Detection and mitigation of silent data corruption based on failure occurrence rate prediction without fault injection

Authors	Yakhchi Moona, Fazeli Mahdi, Asghari Toochai Amir
Source	Intelligent Multimedia Processing and Communication Systems - 1401 (Solar Hijri) - Volume: 3 - Issue: 4 - Pages: 1-13
Abstract	Silent data corruption (SDC) seriously endangers the reliability of a system. Current approaches use machine learning to predict the SDC occurrence rate of each instruction. However, most of them lack adequate accuracy and require training datasets, which are hard to obtain because of their high resource consumption. Meanwhile, the occurrence rate of multi-bit faults in semiconductor devices has increased markedly, so identifying vulnerable instructions in the presence of faults has become important. The gap in existing research is the absence of a high-accuracy software method that needs no fault injection and that examines SDC-causing faults originating in both data and instructions. To this end, this study proposes the M5Rule decision tree model, which computes the SDC occurrence rate of each instruction. An error detection method is then applied that duplicates the critical instructions by means of sorting. Finally, the proposed model is evaluated on the MiBench benchmark with multiple test programs. The evaluation results show that, compared with other state-of-the-art methods, the proposed method achieves better detection accuracy, with an overhead of about 99 percent for an SDC coverage rate of 58 percent.
Keywords	silent data corruption, fault injection, soft errors, multi-bit errors, machine learning
Address	Islamic Azad University, Borujerd Branch, Department of Computer, Iran; Halmstad University, School of Information Technology, Department of Computer, Sweden; Kharazmi University of Technology, Faculty of Electrical and Computer Engineering, Department of Computer, Iran
Email	asghari@khu.ac.ir

SDC-causing error detection and mitigation based on failure rate prediction without fault injection

Authors	Yakhchi Moona, Fazeli Mahdi, Asghari Toochai Amir
Abstract	Introduction: As processing components shrink and the probability of failure rises even in ordinary components, maintaining reliability has become a serious challenge for today's computer systems. Soft errors can lead to silent data corruption (SDC), which seriously compromises the reliability of a system. An SDC is a fault that affects running software and leads to incorrect output. Detecting SDCs requires a profile of the SDC-causing instructions in order to decide which instructions to protect. Current approaches use machine learning algorithms to predict the SDC occurrence rate of each instruction, but most of them suffer from inaccuracy. Most current detection techniques also require sufficient fault injection data for training, which is difficult to obtain because of high resource consumption, such as execution time and code size costs. Moreover, as technology scales down toward nanometer sizes, multiple-bit soft errors are emerging as an important reliability challenge. Identifying vulnerable points in the presence of faults has therefore become important.
Method: Traditional redundancy-based solutions are very expensive in terms of chip area, energy consumption, and performance. Consequently, low-cost and efficient approaches to coping with SDCs have received more attention from researchers than ever, and the lack of a high-precision method that works without fault injection is a research challenge. Because fault injection is costly in complex systems, SDCs are identified here with a method based on a machine learning algorithm that does not need to inject faults into all of the software. Multi-bit faults and SDCs originating in instructions are also considered. To this end, we propose the M5Rule decision tree model, which detects SDC-causing errors by calculating the importance of instruction features for vulnerability. We then apply an error detection method that duplicates the critical instructions by means of sorting.
Results: We evaluated our model on the MiBench benchmarks with multiple test programs. The results show an overhead of 58% with an SDC coverage rate of about 99%.
Discussion: We considered not only single-bit faults but also multiple-bit faults, and faults were injected into both instructions and data. The evaluation results show that our method achieves better detection accuracy than other state-of-the-art methods.
Keywords	silent data corruption, fault injection, soft errors, multi-bit fault, machine learning
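
The selection step named in the abstract (rank instructions by predicted SDC rate, then duplicate the most critical ones) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation. The `Instr` record, the `select_for_duplication` function, the use of dynamic execution count as the duplication cost, and the overhead budget are all hypothetical. The SDC rates are hard-coded placeholders standing in for the output of a regression model such as the M5Rule model the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str        # hypothetical instruction label
    sdc_rate: float  # placeholder for a model-predicted SDC rate in [0, 1]
    exec_count: int  # dynamic execution count, used here as the duplication cost


def select_for_duplication(instrs, overhead_budget):
    """Greedily pick instructions to duplicate, highest predicted SDC rate first.

    Returns (chosen, overhead, coverage), where overhead is the fraction of
    dynamic instructions duplicated and coverage is the fraction of the
    expected SDC mass that the chosen instructions account for.
    """
    total_exec = sum(i.exec_count for i in instrs)
    if total_exec == 0:
        return [], 0.0, 0.0
    total_sdc = sum(i.sdc_rate * i.exec_count for i in instrs)

    # Sort so that the most SDC-prone instructions are considered first.
    ranked = sorted(instrs, key=lambda i: i.sdc_rate, reverse=True)

    chosen, used = [], 0
    for ins in ranked:
        # Duplicate only if the assumed overhead budget is not exceeded.
        if (used + ins.exec_count) / total_exec <= overhead_budget:
            chosen.append(ins)
            used += ins.exec_count

    covered = sum(i.sdc_rate * i.exec_count for i in chosen)
    coverage = covered / total_sdc if total_sdc else 0.0
    return chosen, used / total_exec, coverage


if __name__ == "__main__":
    # Made-up profile for illustration only; not data from the paper.
    profile = [
        Instr("load", 0.6, 100),
        Instr("add", 0.1, 300),
        Instr("cmp", 0.4, 100),
        Instr("store", 0.8, 100),
        Instr("mul", 0.2, 400),
    ]
    chosen, overhead, coverage = select_for_duplication(profile, 0.35)
    print([i.name for i in chosen], round(overhead, 3), round(coverage, 3))
```

Sorting by predicted rate alone is the simplest reading of "by means of sorting". Ranking by rate per unit of cost would be an alternative if the budget is tight, but the abstract does not say which criterion the authors used.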