Designing an automatic optimizer network inspired by the limited-memory BFGS (L-BFGS) optimization algorithm


Authors	Etesam Mohammad, Sadeghi-Lotfabadi Ashkan, Ghiasi-Shirazi Kamaledin
Source	Signal and Data Processing, 1403 (Iranian calendar), Issue 1, pp. 89-101
Abstract	Although machine learning models now extract features automatically, optimization algorithms are still designed by hand. One goal of meta-learning is to automate the optimization process. Hand-designed gradient-based optimization algorithms are written using only inner products, scalar multiplication, and vector addition on their input vectors. One can therefore say that these algorithms operate in the Hilbert space whose dimension is that of the optimization problem. Building on this observation, we aim to create a space for learning from the inputs that is independent of the input dimensionality. To this end, drawing on the limited-memory BFGS algorithm (L-BFGS) and the LSTM cell, we introduce a new structure called Hilbert LSTM (HLSTM), in which learning is carried out independently of the input dimensionality; in other words, the learning algorithm runs in the Hilbert space of the optimization problem. To achieve this, we use a linear coefficient layer that computes a linear combination of the input vectors, where the coefficients of the combination are obtained from the inner products of the input vectors. Our experiments show that the proposed optimizer achieves considerably better results than hand-designed optimization algorithms.
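The observation that gradient-based methods touch their inputs only through inner products, scalar multiplication, and vector addition can be checked on the standard L-BFGS two-loop recursion. Below is a minimal pure-Python sketch of that textbook recursion (not the authors' HLSTM); the helper names `dot`, `scale`, `axpy`, `lbfgs_direction`, and `rotate` are hypothetical. Every scalar coefficient comes from an inner product and the output is a linear combination of the gradient and the stored pairs, so the procedure does not depend on the dimension and commutes with any orthogonal change of coordinates, which the usage lines check with a rotation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product: the only source of scalar coefficients in the recursion."""
    return sum(a * b for a, b in zip(u, v))


def scale(c: float, u: Sequence[float]) -> Vector:
    """Scalar multiplication c * u."""
    return [c * a for a in u]


def axpy(c: float, u: Sequence[float], v: Sequence[float]) -> Vector:
    """Vector addition v + c * u."""
    return [b + c * a for a, b in zip(u, v)]


def lbfgs_direction(g: Sequence[float],
                    s_hist: Sequence[Sequence[float]],
                    y_hist: Sequence[Sequence[float]]) -> Vector:
    """Standard L-BFGS two-loop recursion, returning the search direction -H_k g.

    s_hist[i] = x_{i+1} - x_i and y_hist[i] = g_{i+1} - g_i, oldest pair first.
    Assumes s_i . y_i > 0 for every stored pair (curvature condition).
    """
    q = list(g)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        alpha = rho * dot(s, q)
        alphas.append(alpha)
        q = axpy(-alpha, y, q)
    # Initial inverse-Hessian approximation H_0 = gamma * I.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = scale(gamma, q)
    # Second loop: oldest pair to newest (alphas were stored newest first).
    for (s, y, rho), alpha in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        beta = rho * dot(y, r)
        r = axpy(alpha - beta, s, r)
    return scale(-1.0, r)


def rotate(theta: float, u: Sequence[float]) -> Vector:
    """Rotate the first two coordinates of u by theta (an orthogonal map)."""
    c, sn = math.cos(theta), math.sin(theta)
    return [c * u[0] - sn * u[1], sn * u[0] + c * u[1]] + list(u[2:])


g = [1.0, -2.0, 0.5]
s_hist = [[0.1, 0.0, 0.2], [0.0, -0.3, 0.1]]   # s_i . y_i > 0 for both pairs
y_hist = [[0.3, 0.1, 0.4], [0.05, -0.6, 0.2]]

d = lbfgs_direction(g, s_hist, y_hist)
d_rot = lbfgs_direction(rotate(0.7, g),
                        [rotate(0.7, s) for s in s_hist],
                        [rotate(0.7, y) for y in y_hist])
# Rotating every input rotates the output in the same way (up to rounding).
max_err = max(abs(a - b) for a, b in zip(rotate(0.7, d), d_rot))
```

With no stored pairs the recursion reduces to steepest descent, and for the newest pair it reproduces the secant condition H_k y = s, so `lbfgs_direction(y_hist[-1], s_hist, y_hist)` returns `-s_hist[-1]` up to rounding.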
Keywords	meta-learning, automatic optimization, Hilbert LSTM, LSTM, L-BFGS
Affiliation	Department of Computer Engineering, Ferdowsi University of Mashhad, Iran (all three authors)
Email	k.ghiasi@um.ac.ir

Designing L-BFGS Inspired Automatic Optimizer Network

Authors	Etesam Mohammad, Sadeghi-Lotfabadi Ashkan, Ghiasi-Shirazi Kamaledin
Abstract	The optimization of deep neural networks on mini-batches is an active area of research, and advances in this field strongly affect how successfully deep neural networks can be applied to practical problems with large datasets. In the last decade, algorithms such as RMSProp, AdaGrad, and Adam have been devised for mini-batch optimization of neural networks and have made training them considerably easier. A common feature of all these methods is that they are applied to each dimension of the gradient vector separately, so each parameter of the neural network is optimized independently of the others. Researchers then tried to learn such algorithms automatically and devised methods for learning to optimize, a form of meta-learning. Learned optimization algorithms use an optimizer network to obtain the update direction at each step. When these methods are used to train a neural network, there are therefore two networks: an optimizee network, whose parameters we want to learn, and an optimizer network, whose parameters are meta-learned. The optimizer network receives the previous parameters and their gradients and proposes a new direction for updating the optimizee network. Like mini-batch optimization algorithms, all learning-to-optimize methods devised so far share the property that they apply the optimization rule to each parameter independently. Mathematical optimization likewise recognizes that the raw gradient is generally not a good search direction: when computation permits, it is corrected using the inverse of the Hessian matrix, as in Newton's method, or an approximation of it, as in quasi-Newton methods. Thus, neural network optimizers and nonlinear mathematical optimization algorithms have in common that neither follows the raw gradient direction.
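The contrast drawn above between coordinate-wise optimizers and Hessian-corrected directions can be illustrated with a small pure-Python sketch. It compares one Adam step, whose update for each coordinate depends only on that coordinate's own gradient history, with a Newton step that premultiplies the gradient by the inverse Hessian and therefore mixes coordinates. The helper names (`adam_step`, `newton_step_2x2`, `grad`), the 2x2 quadratic, and the Adam hyperparameters (the common defaults lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8) are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def adam_step(g: Vector, m: Vector, v: Vector, t: int,
              lr: float = 1e-3, beta1: float = 0.9,
              beta2: float = 0.999, eps: float = 1e-8) -> Tuple[Vector, Vector, Vector]:
    """One Adam update, computed for every coordinate independently.

    Returns (update, new_m, new_v); t is the 1-based step counter.
    """
    new_m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    new_v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    update = []
    for mi, vi in zip(new_m, new_v):
        m_hat = mi / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = vi / (1 - beta2 ** t)   # bias-corrected second moment
        update.append(-lr * m_hat / (v_hat ** 0.5 + eps))
    return update, new_m, new_v


def newton_step_2x2(hess: Matrix, g: Vector) -> Vector:
    """Newton direction -H^{-1} g for a 2x2 Hessian, via the explicit inverse."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return [-(inv[0][0] * g[0] + inv[0][1] * g[1]),
            -(inv[1][0] * g[0] + inv[1][1] * g[1])]


# Quadratic f(x) = 0.5 x^T A x with a non-diagonal A (coupled coordinates).
A = [[3.0, 1.0], [1.0, 2.0]]


def grad(x: Vector) -> Vector:
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]


x0 = [1.0, -1.0]
g0 = grad(x0)
newton_x = [xi + di for xi, di in zip(x0, newton_step_2x2(A, g0))]

# Adam is coordinate-wise: changing g[1] leaves the update of x[0] unchanged.
u1, _, _ = adam_step([g0[0], g0[1]], [0.0, 0.0], [0.0, 0.0], t=1)
u2, _, _ = adam_step([g0[0], 100.0], [0.0, 0.0], [0.0, 0.0], t=1)
```

On this quadratic a single Newton step lands on the minimizer at the origin, while the first Adam step moves each coordinate by roughly `lr` times the sign of its own gradient component, regardless of the coupling term `A[0][1]`.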
Keywords	Hilbert LSTM, LSTM, L-BFGS, meta-learning, automatic optimization