Determining the Optimal Drug Dose for Controlling the Cancer Cell Population in a Melanoma Patient Using the Eligibility Traces Method, Taking the Drug's Harmful Effects into Account



Authors	Kalhor Elnaz, Noori Amin, Saboori Rad Sara, Sadrnia Mohammad Ali
Source	رايانش نرم و فناوري اطلاعات (Soft Computing and Information Technology) - 1400 - Volume 10 - Issue 1 - Pages 72-92
Abstract	The main goal of this paper is to determine the optimal drug dose for reducing the cancer cell population in patients with melanoma. To this end, the eligibility traces method, one of the methods for solving reinforcement learning problems, is used. This method combines the advantages of two conventional reinforcement learning methods, temporal-difference learning and Monte Carlo. Another advantage is that it does not require a mathematical model; however, because implementation on a real system was not possible, a delayed nonlinear mathematical model is used to simulate the behavior of the environment and evaluate the proposed controller. Based on the studies conducted so far, no control method has previously been implemented on this mathematical model, and this is the first time that cancer cell population control has been carried out for it. In optimal drug dose control, the amount of drug must be chosen so that its harmful effects on healthy cells are avoided as far as possible. The simulation results show that the chosen method, by injecting a suboptimal drug dose, was able to control the cancer cell population, reduce it, and drive it to zero, while the body's immune cells increased. Finally, to show the advantage of the chosen method in speeding up the reduction of cancer cells, it is compared with the Q-learning algorithm, another method for solving reinforcement learning problems, and with optimal control. A fault was also applied to the system's sensor, and the performance of the proposed controller in reducing cancer cells in the presence of the fault was examined. To examine one of the advantages of reinforcement learning, namely its adaptability to the environment, cancer cell population control was carried out in five melanoma patients under uncertainty in the system parameters and initial conditions. The convergence speed of both the eligibility traces method and the Q-learning algorithm in reducing cancer cells was also examined for different learning rates.
Keywords	Harmful drug effects, Q-learning algorithm, cancer cell population control, melanoma, reinforcement learning, eligibility traces, optimal control, convergence speed
Address	Sadjad University, Faculty of Electrical and Biomedical Engineering, Iran, Sadjad University, Faculty of Electrical and Biomedical Engineering, Iran, Mashhad University of Medical Sciences, Faculty of Dermatology, Iran, Shahrood University of Technology, Faculty of Electrical Engineering and Robotics, Iran
Email	masadrnia@shahroodut.ac.ir

Using Eligibility Traces Algorithm to Specify the Optimal Dosage for the Purpose of Cancer Cell Population Control in Melanoma Patients with a Consideration of the Side Effects

Authors	Kalhor Elnaz, Noori Amin, Saboori Rad Sara, Sadrnia Mohammad Ali
Abstract	This paper aims to determine the optimal drug dosage for reducing the population of cancer cells in melanoma patients. To do so, the eligibility traces algorithm, a reinforcement learning method, is employed; it offers a compromise between two reinforcement learning algorithms, Monte Carlo and Temporal Difference. Furthermore, this approach does not require a mathematical model. However, as implementation on the real system was not possible, a delayed nonlinear mathematical model is used to simulate the behavior of the environment and to investigate the performance of the proposed controller. Based on the studies conducted so far, no control method has previously been applied to this mathematical model, and this is the first time that population control of cancer cells is applied and tested on it. In optimal dosage control, the drug dose should be chosen so that side effects on healthy cells are avoided as much as possible. According to the obtained results, the eligibility traces algorithm is able to control and reduce the population of cancer cells by injecting a suboptimal drug dose, and this occurs alongside an increase in the body's immune cells. Finally, to demonstrate the advantage of the selected method in speeding up the reduction of cancer cells, it is compared with the Q-learning algorithm and with optimal control. By applying a fault to the sensor, the performance of the proposed controller in reducing cancer cells in the presence of the fault was investigated. The adaptability of the proposed method to changes in the environment is then checked: uncertainty in the system parameters and initial conditions is applied, and the population of cancer cells is controlled in five melanoma patients. Moreover, with noise added to the system, it was shown that the eligibility traces algorithm is able to control the population of cancer cells and drive it to zero. Additionally, the convergence speed of both the eligibility traces algorithm and the Q-learning algorithm in reducing the number of cancer cells was investigated for different learning rates.
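To illustrate the kind of update the abstract refers to, the following is a minimal sketch of tabular Sarsa(lambda) with accumulating eligibility traces. It is not the authors' controller: the environment is a hypothetical toy chain of discretized tumor-burden levels rather than the delayed nonlinear melanoma model, and the names `N_STATES`, `DOSE_COST`, `step`, `sarsa_lambda`, and `greedy_policy`, along with all parameter values, are assumptions made for illustration only.

```python
import random

# Hypothetical toy environment (NOT the paper's delayed nonlinear melanoma model):
# the state is a discretized tumor burden 0..N_STATES-1, where 0 means no cancer cells.
N_STATES = 6
ACTIONS = (0, 1)   # 0 = no dose, 1 = dose (illustrative two-level dosing)
DOSE_COST = 0.1    # assumed stand-in penalty for the drug's harmful effects


def step(state, action):
    """Return (next_state, reward, done) for the toy dynamics."""
    if action == 1:
        next_state = state - 1                     # dosing shrinks the tumor
        reward = -DOSE_COST
    else:
        next_state = min(state + 1, N_STATES - 1)  # untreated tumor grows
        reward = 0.0
    reward -= next_state                           # penalize remaining tumor burden
    return next_state, reward, next_state == 0


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def sarsa_lambda(episodes=500, alpha=0.1, gamma=0.95, lam=0.8,
                 epsilon=0.1, max_steps=50, seed=0):
    """Tabular Sarsa(lambda) with accumulating eligibility traces."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        # Traces are reset at the start of every episode.
        e = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
        state = N_STATES - 1
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(max_steps):
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            target = reward if done else reward + gamma * q[next_state][next_action]
            delta = target - q[state][action]      # one-step TD error
            e[state][action] += 1.0                # mark the visited pair
            for s in range(N_STATES):
                for a in ACTIONS:
                    q[s][a] += alpha * delta * e[s][a]  # credit recent pairs
                    e[s][a] *= gamma * lam              # decay every trace
            if done:
                break
            state, action = next_state, next_action
    return q


def greedy_policy(q):
    """Greedy action for each tumor-burden level."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Calling `greedy_policy(sarsa_lambda())` returns one action per tumor-burden level. With `lam=0` every trace is zeroed after each step, so the update reduces to one-step Sarsa (a temporal-difference method), while values of `lam` close to 1 spread each error over the whole visited trajectory and behave more like a Monte Carlo update.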
Keywords