Maximizing Score in the Stochastic Match-3 Game Using Deep Reinforcement Learning

Authors

Ali Afrougheh, Mehdy Roayaei Ardakany

Source

Signal and Data Processing - 1402 (Solar Hijri) - No. 4 - pp. 129-140

Abstract

Computer games have played an important role in the advancement of artificial intelligence in recent years. Games have served as a suitable trial-and-error environment for testing various AI ideas and algorithms. Match-3 is a popular genre of mobile-phone games with a stochastic and very large state space, which makes learning in it difficult. This paper presents an intelligent agent based on deep reinforcement learning whose goal is to maximize the score in the match-3 game. The proposed agent uses mappings of the action space and the state space, together with a novel neural-network structure designed for the match-3 environment, so that it is able to learn over a large number of states. A comparison of the proposed method with other existing methods, including policy-based reinforcement learning, value-based reinforcement learning, greedy methods, and a human player, shows the superior performance of the proposed method in the match-3 game.

Keywords

deep reinforcement learning, stochastic game, large state space, match-3

Address

Tarbiat Modares University, Faculty of Electrical and Computer Engineering, Iran; Tarbiat Modares University, Faculty of Electrical and Computer Engineering, Iran

Email

mroayaei@modares.ac.ir
Maximizing Score in Stochastic Match-3 Games Using Reinforcement Learning

Authors

Ali Afrougheh, Mehdy Roayaei Ardakany

Abstract

Computer games have played an important role in the development of artificial intelligence in recent years. Throughout the history of the field, games have served as a suitable test environment for evaluating new AI approaches and algorithms through trial and error. Different methods, including rule-based methods, tree-search methods, and machine-learning methods (both supervised learning and reinforcement learning), have been developed to create intelligent agents for different games. Notable examples include Deep Blue in chess and AlphaGo in Go: AlphaGo was the first computer program to defeat an expert human Go player, and Deep Blue was a chess-playing expert system that became the first computer program to win a match against a world champion.

In this paper, we focus on the match-3 game, a popular game on mobile phones. It has a very large, stochastic state space that makes learning difficult, and its reward function is also random, which makes learning unstable. Much past research has been done on different games, including match-3, generally aiming either to play optimally or to predict the difficulty of stages designed for human players. Predicting stage difficulty helps game developers improve the quality of their games and provide a better experience for users. Based on the approach used, past works can be divided into three main categories: search-based methods, machine-learning methods, and heuristic methods.

This paper presents an intelligent agent based on deep reinforcement learning whose goal is to maximize the score in the match-3 game. Reinforcement learning, a branch of machine learning that has recently received much attention, lets an agent learn an optimal policy for choosing actions through its experience of interacting with the environment; in deep reinforcement learning, reinforcement-learning algorithms are combined with deep neural networks. The proposed method uses different mapping mechanisms for the action space and the state space, as well as a novel neural-network structure for the match-3 game environment that can learn a large state space. The contributions of this article can be summarized as follows: (1) an approach for mapping the action space to a two-dimensional matrix in which valid and invalid actions can easily be separated; (2) an approach for mapping the state space to the input of the deep neural network, which shrinks the input by reducing the depth of the convolutional filters and thus improves the learning process; and (3) a reward function that stabilizes the learning process by separating random rewards from deterministic rewards. The comparison of the proposed method with other existing methods, including PPO, DQN, A3C, a greedy method, and human agents, shows the superior performance of the proposed method in the match-3 game.
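The idea of laying out swap actions in a fixed order and separating valid from invalid ones can be sketched concretely. The snippet below is an illustration only, not the paper's actual encoding: the board representation (a list of lists of integer tile types) and the helper names `enumerate_swap_actions` and `valid_action_mask` are assumptions made for this example. It enumerates all adjacent-swap actions on an h-by-w board and marks a swap invalid when it creates no run of three equal tiles:

```python
def enumerate_swap_actions(h, w):
    """Lay out the match-3 action space in a fixed order:
    horizontal swaps (r, c)<->(r, c+1), then vertical swaps (r, c)<->(r+1, c)."""
    horizontal = [((r, c), (r, c + 1)) for r in range(h) for c in range(w - 1)]
    vertical = [((r, c), (r + 1, c)) for r in range(h - 1) for c in range(w)]
    return horizontal + vertical

def has_match(board):
    """True if the board contains a horizontal or vertical run of 3 equal tiles."""
    h, w = len(board), len(board[0])
    for r in range(h):
        for c in range(w - 2):
            if board[r][c] == board[r][c + 1] == board[r][c + 2]:
                return True
    for r in range(h - 2):
        for c in range(w):
            if board[r][c] == board[r + 1][c] == board[r + 2][c]:
                return True
    return False

def valid_action_mask(board, actions):
    """1 where performing the swap yields at least one match, else 0 (invalid)."""
    mask = []
    for (r1, c1), (r2, c2) in actions:
        swapped = [row[:] for row in board]  # copy, then apply the swap
        swapped[r1][c1], swapped[r2][c2] = swapped[r2][c2], swapped[r1][c1]
        mask.append(1 if has_match(swapped) else 0)
    return mask

board = [[1, 1, 2],
         [3, 3, 1],
         [2, 2, 3]]
actions = enumerate_swap_actions(3, 3)   # 12 candidate swaps on a 3x3 board
mask = valid_action_mask(board, actions) # swapping (0,2)<->(1,2) makes row 0 all 1s
```

A mask like this can be applied to a policy network's output so the agent never selects invalid swaps; the paper's contribution additionally arranges these actions as a two-dimensional matrix aligned with the board, which is what makes the valid/invalid separation easy.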

Keywords

deep reinforcement learning, stochastic game, match-3, large state space