Achieving Cooperation through Multi-Agent Reinforcement Learning in the Iterated Prisoner's Dilemma

Authors	Farzaneh, Samira; Zandi, Fereshteh; Salimi Sartakhti, Javad
Source	Computing and Distributed Systems (محاسبات و سامانه‌های توزیع‌شده), 1399 SH (2020/2021), Vol. 3, No. 2, pp. 12-21
Abstract	The prisoner's dilemma is one of the fundamental problems in game theory. The dilemma has a Nash equilibrium, and if the agents behave rationally, they play at that point: to obtain a higher payoff, each agent chooses defection rather than cooperation. Yet there is a better outcome for both agents than the Nash equilibrium, namely that both agents choose to cooperate. Therefore, to increase the agents' level of cooperation, the prisoner's dilemma is treated as an iterated prisoner's dilemma with a reinforcement learning approach. The results show that this approach increases the agents' level of cooperation, and that if one agent adopts cooperation, the other agent also chooses cooperation, and vice versa.
Keywords	Mutual defection, iterated prisoner's dilemma, reinforcement learning, mutual cooperation, LSTM
Address	University of Kashan, Faculty of Electrical and Computer Engineering, Iran (all three authors)
Email	salimi@kashanu.ac.ir
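The equilibrium argument in the abstract can be checked mechanically. This minimal sketch assumes the conventional payoff values T = 5, R = 3, P = 1, S = 0, because the paper's actual payoffs are not given in this record. Any values with T > R > P > S give the same conclusion. The names `PAYOFF`, `is_nash`, and `pareto_dominates` are illustrative and not taken from the paper. The code shows that mutual defection is the only pure Nash equilibrium, while mutual cooperation leaves both agents better off.

```python
from itertools import product

# Hypothetical standard payoffs (T > R > P > S); the paper's values are not stated here.
T, R, P, S = 5, 3, 1, 0
ACTIONS = ("C", "D")

# PAYOFF[(a1, a2)] = (payoff to agent 1, payoff to agent 2)
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}

def is_nash(a1, a2):
    """True if neither agent can gain by unilaterally switching its action."""
    u1, u2 = PAYOFF[(a1, a2)]
    best1 = all(PAYOFF[(b, a2)][0] <= u1 for b in ACTIONS)
    best2 = all(PAYOFF[(a1, b)][1] <= u2 for b in ACTIONS)
    return best1 and best2

def pure_nash_equilibria():
    """All action profiles from which no agent benefits by deviating alone."""
    return [prof for prof in product(ACTIONS, repeat=2) if is_nash(*prof)]

def pareto_dominates(p, q):
    """True if profile p gives every agent at least as much as q, and someone strictly more."""
    up, uq = PAYOFF[p], PAYOFF[q]
    return all(x >= y for x, y in zip(up, uq)) and any(x > y for x, y in zip(up, uq))

if __name__ == "__main__":
    print(pure_nash_equilibria())                    # [('D', 'D')]
    print(pareto_dominates(("C", "C"), ("D", "D")))  # True
```

Defection is each agent's best reply whatever the other does (T > R and P > S). This is why rational agents end at mutual defection, even though R > P makes mutual cooperation better for both.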

Achieving Cooperation through Multi-Agent Reinforcement Learning in the Iterated Prisoner's Dilemma

Authors	Farzaneh, Samira; Zandi, Fereshteh; Salimi Sartakhti, Javad
Abstract	The prisoner's dilemma is one of the fundamental problems in game theory. The dilemma has a Nash equilibrium, and if the agents behave rationally, they play at that point: to obtain a greater payoff, each agent chooses defection rather than cooperation. However, there is a better outcome for the agents than the Nash equilibrium, namely that both agents choose cooperation. Therefore, to increase the agents' rate of cooperation, the prisoner's dilemma is modeled as an iterated prisoner's dilemma with a reinforcement learning approach. The results show that this approach increases the agents' rate of cooperation, and that if one agent chooses cooperation, the other agent also chooses cooperation, and vice versa.
Keywords	Mutual defection, iterated prisoner's dilemma, reinforcement learning, mutual cooperation, LSTM (long short-term memory)
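The iterated setting can be illustrated with two independent learners. The paper's keywords point to an LSTM-based model, but this sketch swaps in plain tabular Q-learning. Each agent's state is simply the previous round's joint action (its own move and its opponent's move). The sketch is therefore only a rough picture of reinforcement learning in the iterated game and is not the authors' method. All of the following are assumptions: the payoff values, the learning rate `alpha`, the discount `gamma`, the exploration rate `epsilon`, and the names `QAgent` and `play`.

```python
import random

# Hypothetical standard payoffs (T > R > P > S); the paper's values are not stated here.
T, R, P, S = 5, 3, 1, 0
C, D = 0, 1
ACTIONS = (C, D)
PAYOFF = {(C, C): (R, R), (C, D): (S, T), (D, C): (T, S), (D, D): (P, P)}

class QAgent:
    """Tabular Q-learner whose state is the previous joint action (own move, opponent move)."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, rng=None):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        # States: None (first round) plus the four possible previous joint actions.
        states = [None] + [(a, b) for a in ACTIONS for b in ACTIONS]
        self.q = {s: [0.0, 0.0] for s in states}

    def act(self, state):
        # Epsilon-greedy selection; ties between equal Q-values go to cooperation.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return C if values[C] >= values[D] else D

    def update(self, state, action, reward, next_state):
        # One-step Q-learning target: r + gamma * max_a' Q(s', a').
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

def play(rounds=20000, seed=0):
    """Run two independent Q-learners in the iterated game and return the
    fraction of cooperative moves over the final 10% of rounds."""
    rng = random.Random(seed)
    a1, a2 = QAgent(rng=rng), QAgent(rng=rng)
    s1 = s2 = None  # each agent observes (own last move, opponent's last move)
    tail = rounds // 10
    coop = 0
    for t in range(rounds):
        m1, m2 = a1.act(s1), a2.act(s2)
        r1, r2 = PAYOFF[(m1, m2)]
        n1, n2 = (m1, m2), (m2, m1)
        a1.update(s1, m1, r1, n1)
        a2.update(s2, m2, r2, n2)
        s1, s2 = n1, n2
        if t >= rounds - tail:
            coop += (m1 == C) + (m2 == C)
    return coop / (2 * tail)

if __name__ == "__main__":
    print(f"cooperation rate (last 10% of rounds): {play():.2f}")
```

Conditioning on the last joint action lets an agent learn reciprocal rules such as "cooperate after the opponent cooperated". Independent Q-learning does not guarantee this, however. Whether the learners settle into mutual cooperation depends on the seed and the hyperparameters, so the printed rate describes one run rather than a result.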