Developing Continuous Reinforcement Learning in Distributed Spatial Problems (Case Study: Intelligent Control of Traffic Signals)


Authors	Aslani Mohammad, Mesgari Mohammad Saadi
Source	Iranian Electrical and Electronics Engineering (مهندسي برق و الكترونيك ايران) - 1399 - Volume: 17 - Issue: 3 - Pages: 63-78
Abstract	Multiagent systems, a branch of artificial intelligence, have in recent years been widely used as an approach for studying and analyzing phenomena that are distributed, complex, bottom-up, and dynamic, in fields such as traffic, transportation, economics, and the environment. The main challenge in multiagent systems is to obtain a suitable behavior for each individual agent so that the system as a whole reaches an optimal high-level behavior. Reinforcement learning, which can automatically and gradually obtain the optimal behavior for all agents through interaction with the environment, is well suited to this challenge. In reinforcement learning, agents learn over time, through interaction with the environment, which actions to take in different situations (states) so as to receive the greatest reward. Conventional reinforcement learning methods perform poorly in real-world problems whose environments have a very large or infinite number of states, because they store a separate value in memory for each state-action pair, and the agent must observe each pair many times to obtain an accurate estimate of its value. The novelty of this research is to address this challenge through continuous reinforcement learning in spatial problems with large, continuous state-action spaces. Continuous reinforcement learning uses generalization to estimate state-action values: the agent does not need direct experience of every state of the environment, and the value of a state is estimated from the values of similar states. To measure similarity, these methods need an encoding of the environment states; this research uses tile coding (partitioning of the state space), which has a low computational cost. Traffic control (specifically, traffic signal management), which is highly dynamic and complex, was chosen as a suitable case study.
Keywords	Multiagent systems, continuous reinforcement learning, tile coding, traffic control
Address	K. N. Toosi University of Technology, Faculty of Geodesy and Geomatics Engineering, Iran; K. N. Toosi University of Technology, Faculty of Geodesy and Geomatics Engineering, Iran
Email	mesgari@kntu.ac.ir

Developing Continuous Reinforcement Learning in Distributed Spatial Problems (Case Study: Adaptive Traffic Control)

Authors	Aslani Mohammad, Mesgari Mohammad Saadi
Abstract	Multiagent systems have proven useful as an efficient approach for modeling, analyzing, and implementing complex, dynamic, and distributed applications such as robotic teams, distributed control, resource management, traffic control, land use planning, crisis management, and forest fire control, to name but a few. The main challenge in multiagent systems is to find a suitable behavior for each agent that maximizes the average utility of the whole system. Moreover, agents sometimes need to learn new behaviors online so that the performance of the whole system gradually improves. A learning mechanism is therefore needed so that agents gradually find the globally optimal solution on their own. In this context, reinforcement learning is a promising approach for training agents: the agent never sees examples of correct behavior but instead receives positive or negative rewards for the actions it tries. It thus allows an agent to determine automatically the ideal behavior within a specific context in order to maximize its performance. Each time the agent performs an action in its environment, a trainer may provide a reward indicating the desirability of the resulting state, and the agent tries to learn a control policy, a mapping from states to actions, that maximizes the expected sum of received rewards. Continuous reinforcement learning algorithms use generalization, the ability of a system to perform accurately on unseen data, and therefore perform well in real-world problems. From a practical point of view, the state space has a natural metric under which nearby states exhibit similar behavior, so agents can deal with states they have never experienced exactly and can learn efficiently by generalizing from similar, previously experienced states.
The success of continuous reinforcement learning algorithms on real-world problems hinges on an effective function approximator, which maps states to values via a parameterized function. Among the many function approximation schemes proposed, this research applies tile coding, which forms a piecewise-constant approximation of the value function and strikes an empirical balance between representational power and computational cost. The focus of this paper is to combine multiagent systems with continuous-state reinforcement learning using tile coding. The proposed approach was validated on traffic signal control, in which traffic lights at intersections can be seen as autonomous agents that learn while interacting with the environment. Traffic signal control poses several challenges, such as a large number of agents, nonstationarity of the multiagent learning problem, the curse of dimensionality, and continuity of the state space, which make it a suitable testbed. The reinforcement learning controller is benchmarked against optimized pretimed control. The results indicate that the reinforcement learning agent achieves 21% less stop time than optimized pretimed control.
Keywords	Multiagent systems, Continuous reinforcement learning, Tile coding, Traffic control
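
The abstract describes tile coding as a piecewise-constant approximation of the value function that lets an agent generalize from nearby states. The following is a minimal sketch of that idea, not the authors' implementation. The class names `TileCoder` and `TileCodedQ`, the two-dimensional state (imagined here as queue lengths on two approaches), the two actions, the Q-learning-style update, and all parameter values are assumptions made only for illustration.

```python
class TileCoder:
    """Maps a continuous state to one active tile per tiling (hypothetical helper)."""

    def __init__(self, lows, highs, tiles_per_dim=8, num_tilings=4):
        self.lows = lows
        self.highs = highs
        self.tiles_per_dim = tiles_per_dim
        self.num_tilings = num_tilings
        # One extra cell per dimension so that shifted tilings still cover the range.
        self.cells = tiles_per_dim + 1
        self.tiling_size = self.cells ** len(lows)
        self.num_features = num_tilings * self.tiling_size

    def active_tiles(self, state):
        indices = []
        for t in range(self.num_tilings):
            # Assumption: every tiling is shifted by the same fraction of a tile
            # along all dimensions (a simple diagonal offset).
            offset = t / self.num_tilings
            index = 0
            for x, lo, hi in zip(state, self.lows, self.highs):
                x = min(max(x, lo), hi)  # clamp to the declared range
                scaled = (x - lo) / (hi - lo) * self.tiles_per_dim
                cell = min(int(scaled + offset), self.cells - 1)
                index = index * self.cells + cell
            indices.append(t * self.tiling_size + index)
        return indices


class TileCodedQ:
    """Linear action-value estimate over tile features (illustrative sketch)."""

    def __init__(self, coder, num_actions, alpha=0.1, gamma=0.95):
        self.coder = coder
        self.num_actions = num_actions
        self.alpha = alpha / coder.num_tilings  # split the step size across tilings
        self.gamma = gamma
        self.weights = [[0.0] * coder.num_features for _ in range(num_actions)]

    def value(self, state, action):
        # The estimate is the sum of the weights of the active tiles,
        # so it is constant inside each intersection of tiles.
        return sum(self.weights[action][i] for i in self.coder.active_tiles(state))

    def update(self, state, action, reward, next_state):
        # One Q-learning-style step (an assumed update rule).
        best_next = max(self.value(next_state, a) for a in range(self.num_actions))
        td_error = reward + self.gamma * best_next - self.value(state, action)
        for i in self.coder.active_tiles(state):
            self.weights[action][i] += self.alpha * td_error
        return td_error


# Usage: one negative reward at a state also changes the estimate of a
# nearby state that shares tiles, while a distant state is unaffected.
coder = TileCoder([0.0, 0.0], [40.0, 40.0])
q = TileCodedQ(coder, num_actions=2)
q.update([10.0, 5.0], 0, -3.0, [11.0, 5.0])
print(q.value([10.5, 5.2], 0), q.value([35.0, 35.0], 0))
```

Because each state activates only one tile per tiling, reading or updating a value touches a handful of weights, which is consistent with the low computational cost the abstract attributes to tile coding.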