Comparing Logistic Regression and LightGBM in Credit Card Fraud Detection: A Statistical Approach Using Prediction Uncertainty


Author	Mojab, Ramin
Source	Journal of Money and Economy - 2023 - Volume: 18 - Issue: 4 - Pages: 497-509
Abstract	Relying on the area under the curve (AUC) measure, we compare the performance of the logit regression model and the LightGBM algorithm. Although both methods are common in the literature, our study emphasizes the role of statistical inference in evaluating and comparing the results comprehensively. We use the training set of the Vesta (2023) dataset, provided by Vesta, a global fraud prevention company headquartered in the United States that specializes in payment solutions and risk management. Originally released as part of a Kaggle competition on credit card fraud detection, this dataset comprises diverse transaction records and is a rich source for exploring advanced fraud detection methods. Our analysis shows that while the LightGBM algorithm generally yields higher predictive accuracy, the differences between the calculated AUCs of the two methods are not statistically significant. This underscores the importance of using inferential techniques to validate differences in model performance in fraud detection.
Keywords	fraud detection, financial institution, credit card, logit, LightGBM, machine learning
Address	Monetary and Banking Research Institute, Department of Banking, Iran
Email	rmojab@mbri.ac.ir
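
The abstract does not state which inferential procedure the paper uses to compare the two AUCs. As one illustration of the idea, the sketch below applies a paired percentile bootstrap (a swapped-in technique, not necessarily the author's) to synthetic scores. The function names `auc` and `bootstrap_auc_difference`, the synthetic data standing in for logit and LightGBM outputs, and all parameter values are assumptions made for this example only.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    Tied scores receive the average rank of their tie group.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_difference(labels, scores_a, scores_b,
                             n_boot=1000, alpha=0.05, seed=0):
    """Paired percentile bootstrap for AUC(b) - AUC(a) on the same rows.

    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if sum(y) in (0, n):
            continue  # resample is missing one class; draw again
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, b) - auc(y, a))
    diffs.sort()
    lo = diffs[max(0, int((alpha / 2) * n_boot))]
    hi = diffs[max(0, int((1 - alpha / 2) * n_boot) - 1)]
    point = auc(labels, scores_b) - auc(labels, scores_a)
    return point, lo, hi


# Synthetic stand-ins (assumption): roughly 10% "fraud" labels and two noisy
# score vectors; these are NOT outputs of real logit or LightGBM models.
rng = random.Random(42)
labels = [1 if rng.random() < 0.1 else 0 for _ in range(800)]
scores_logit = [0.8 * y + rng.gauss(0, 1) for y in labels]
scores_gbm = [1.0 * y + rng.gauss(0, 1) for y in labels]

point, lo, hi = bootstrap_auc_difference(
    labels, scores_logit, scores_gbm, n_boot=300
)
print(f"AUC difference: {point:.3f}, 95% interval: [{lo:.3f}, {hi:.3f}]")
```

If the resulting interval contains zero, the observed AUC gap cannot be distinguished from sampling noise at the chosen level, which is the kind of conclusion the abstract reports for the two models.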

Persian title (translated)	Comparing Logistic Regression and LightGBM in Credit Card Fraud Detection: A Statistical Approach Using Prediction Uncertainty

Authors	Mojab, Ramin
Abstract (Persian version, translated)	Relying on the area under the curve (AUC) measure, we compare the performance of the logistic regression model and the LightGBM algorithm. Although these methods are common in the literature, our study emphasizes the role of statistical inference in evaluating and comparing the results comprehensively. We use the training set of the Vesta (2018) dataset, provided by Vesta, a global fraud prevention company based in the United States that specializes in payment solutions and risk management. This dataset, originally released as part of a Kaggle competition on credit card fraud detection, contains diverse transaction records and is a rich source for studying advanced fraud detection methods. Our analysis shows that while the LightGBM algorithm generally has higher predictive accuracy, the differences between the calculated AUCs of the two methods are not statistically significant. This highlights the importance of using inferential techniques to validate differences in model performance in fraud detection.
Keywords (Persian version, translated)	fraud detection, financial institution, credit card, discrete choice regression, logit, machine learning