Improving Training Speed in Reinforcement Learning Problems Based on Neuro-Fuzzy Knowledge Transfer

Authors	Saadatjoo Fatemeh, Ghandehari Erfan
Source	Journal of Electrical Engineering, University of Tabriz - 1398 (Solar Hijri) - Volume: 49 - Issue: 3 - Pages: 1119-1129
Abstract	This paper addresses transfer learning between environments that share some of their features. The main challenge in this area is how to transfer the knowledge acquired in the source environment to the target environment. The proposed idea takes into account the features that the two environments share in agent space: the action values are first obtained in the source environment, and a neuro-fuzzy network is then used to approximate the action-value function. In the target environment, the action value is formed by combining the neuro-fuzzy network's prediction with the value obtained in that environment itself. In other words, building on the training done in the source environment, the action values in the target environment come from combining the action values approximated by the neuro-fuzzy network with the values produced by the learning algorithm in the target environment. The Q-learning algorithm is used in the environment. The results of the proposed idea indicate a considerable increase in learning speed.
Keywords	Reinforcement learning, knowledge transfer, shared features, neuro-fuzzy network
Address	Science and Arts University, Faculty of Computer Engineering, Iran; Science and Arts University, Faculty of Computer Engineering, Iran
Email	erfan.ghandehari@sau.ac.ir

Improving the learning speed in reinforcement learning issues based on the transfer learning of neuro-fuzzy knowledge

Authors	Saadatjoo F., Ghandehari E.
Abstract	This paper addresses transfer learning between environments that share some of their features. The main challenge is how to transfer the knowledge acquired in the source environment to the target environment. The proposed approach takes into account the features that the two environments share in agent space: action values are first learned in the source environment, and a neuro-fuzzy network is then trained to approximate the action-value function. In the target environment, each action value is obtained by combining the value predicted by the neuro-fuzzy network with the value that the learning algorithm obtains in the target environment itself. Q-learning is used as the learning algorithm in the environment. The results indicate a significant increase in learning speed.
Keywords
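
The pipeline described in the abstract (learn action values in the source environment, fit a neuro-fuzzy approximator over shared agent-space features, then combine its predictions with the values learned in the target environment) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' setup: the two chain environments, the `distance` feature standing in for the shared agent-space feature, the zero-order Takagi-Sugeno model `FuzzyQ` standing in for the neuro-fuzzy network, and the convex combination weighted by a hypothetical `beta`. The abstract does not specify the network architecture or the combination rule.

```python
import math

GAMMA = 0.9
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action, n_states):
    """Deterministic chain; reward 1.0 on reaching the last (goal) state."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def distance(state, n_states):
    """Assumed agent-space feature shared by both chains: steps to the goal."""
    return n_states - 1 - state


def q_learning(n_states, sweeps=200, alpha=1.0):
    """Tabular Q-learning updates; exhaustive sweeps stand in for exploration."""
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(sweeps):
        for s in range(n_states - 1):  # the goal state is terminal
            for a in ACTIONS:
                nxt, r, done = step(s, a, n_states)
                target = r if done else r + GAMMA * max(q[nxt])
                q[s][a] += alpha * (target - q[s][a])
    return q


class FuzzyQ:
    """Zero-order Takagi-Sugeno model with Gaussian memberships (a stand-in)."""

    def __init__(self, centers, width):
        self.centers = list(centers)
        self.width = width
        # One consequent weight per (rule, action).
        self.weights = [[0.0 for _ in ACTIONS] for _ in self.centers]

    def firing(self, x):
        """Normalized rule firing strengths for feature value x."""
        raw = [math.exp(-((x - c) ** 2) / (2 * self.width ** 2)) for c in self.centers]
        total = sum(raw)
        return [r / total for r in raw]

    def predict(self, x, action):
        return sum(f * w[action] for f, w in zip(self.firing(x), self.weights))

    def fit(self, samples, epochs=200, lr=0.5):
        """Gradient updates of the consequents on (feature, action, q) samples."""
        for _ in range(epochs):
            for x, a, q in samples:
                strengths = self.firing(x)
                error = q - sum(f * w[a] for f, w in zip(strengths, self.weights))
                for f, w in zip(strengths, self.weights):
                    w[a] += lr * error * f


def transfer_demo(n_source=6, n_target=10, beta=0.5):
    # 1) Learn action values in the source environment.
    q_src = q_learning(n_source)
    # 2) Fit the fuzzy approximator on (agent-space feature, action, Q) samples.
    samples = [(distance(s, n_source), a, q_src[s][a])
               for s in range(n_source - 1) for a in ACTIONS]
    model = FuzzyQ(centers=range(1, n_source), width=0.5)
    model.fit(samples)

    q_tgt = [[0.0, 0.0] for _ in range(n_target)]  # nothing learned in the target yet

    def combined(s, a):
        # Assumed rule: convex mix of the transferred prediction and the target value.
        return beta * model.predict(distance(s, n_target), a) + (1 - beta) * q_tgt[s][a]

    # 3) Greedy action per non-terminal target state under the combined value.
    greedy = [max(ACTIONS, key=lambda a: combined(s, a)) for s in range(n_target - 1)]
    return q_src, model, greedy
```

In this toy setting, `transfer_demo()` returns a greedy action of 1 (step right, toward the goal) for every non-terminal state of the longer target chain before any learning has taken place there. This is one plausible reading of how transferred action values could shorten early training. Further Q-learning updates to `q_tgt` would then refine the combined estimate.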