ForSts: Tacit Collusion in the Repeated Non-Cooperative Games Using Forwarding N-Steps Reinforcement Learning Algorithm



Authors	Golzari Hormozi Amin, Khasteh Seyed Hossein, Nikoofard Amir Hossein, Shirmohammadi Zahra
Source	Computational Intelligence in Electrical Engineering - 2022 - Volume: 12 - Issue: 4 - Pages: 1-12
Abstract	In game theory, the well-known solution for obtaining the best possible profit in non-repeated games is the Nash equilibrium. However, in some repeated non-cooperative games, agents can earn more profit than at the Nash equilibrium through tacit collusion. Reinforcement learning is one method for achieving profits above the Nash equilibrium through tacit collusion. However, reinforcement-learning-based methods consider only one step in the learning process. Using more than one step can improve the profit obtained in these games. To this end, this paper proposes a learning-based forwarding n-steps algorithm called Forwarding Steps (ForSts). The main idea behind ForSts is to improve the performance of agents in non-cooperative games by observing the rewards of the last n steps. Because ForSts is used in game theory to learn tacit collusion, it is evaluated on the iterated prisoner's dilemma, an example of a traditional game, and on the Cournot market. The results show that in the iterated prisoner's dilemma, agents using ForSts achieve better profit than agents playing the Nash equilibrium. In the Cournot electricity market, the total profit of agents using ForSts is 3.614% higher than the total profit of agents playing the Nash equilibrium.
Keywords	Cournot, electricity market, Nash equilibrium, non-cooperative repeated games, prisoner's dilemma, reinforcement learning
Affiliations	K. N. Toosi University of Technology, Faculty of Computer Engineering, Iran; K. N. Toosi University of Technology, Faculty of Computer Engineering, Iran; K. N. Toosi University of Technology, Faculty of Electrical Engineering, Iran; Shahid Rajaee Teacher Training University, Faculty of Computer Engineering, Iran
Email	shirmohammadi@sru.ac.ir
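
The profit gap between the Nash equilibrium and collusion that the abstract refers to can be illustrated with a textbook symmetric Cournot duopoly. The sketch below is not the ForSts algorithm and does not reproduce the paper's electricity market or its 3.614% figure. The linear inverse demand P = a - b(q1 + q2), the constant unit cost c, the function name `cournot_profits`, and all numeric parameters are assumptions chosen only for illustration.

```python
def cournot_profits(a, b, c, q1, q2):
    """Return each firm's profit under inverse demand P = a - b*(q1 + q2)
    and constant unit cost c (a hypothetical textbook duopoly)."""
    price = a - b * (q1 + q2)
    return (price - c) * q1, (price - c) * q2


# Hypothetical parameters, not taken from the paper.
a, b, c = 100.0, 1.0, 10.0

# Closed-form outputs for the symmetric linear duopoly.
q_nash = (a - c) / (3 * b)       # each firm's Nash equilibrium output
q_collusive = (a - c) / (4 * b)  # each firm's half of the monopoly output

nash_profit = cournot_profits(a, b, c, q_nash, q_nash)
collusive_profit = cournot_profits(a, b, c, q_collusive, q_collusive)

# One-shot best response to a rival that keeps the collusive output:
# this temptation to deviate is why collusion is not a one-shot equilibrium.
q_deviate = (a - c - b * q_collusive) / (2 * b)
deviation_profit = cournot_profits(a, b, c, q_deviate, q_collusive)

print("Nash profit per firm:     ", nash_profit[0])
print("Collusive profit per firm:", collusive_profit[0])
print("Deviator's profit:        ", deviation_profit[0])
```

Under these assumed parameters each firm earns more when both restrict output than at the Nash equilibrium, while a unilateral deviation pays even more in a single round. Only in the repeated game can learning agents sustain the higher joint profit, which is the setting the abstract describes.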
