A New Privacy-Preserving Data Publishing Method Aimed at Improving Classification Accuracy on Anonymized Data


Authors	Ebrahimi Atani Reza, Sadeghpour Mehdi
Source	Signal and Data Processing - 1397 - Issue: 3 - Pages: 31-46
Abstract	With the continuing growth of e-government services, individuals' personal information has been stored in databases held by public and private agencies and organizations. In many cases, processing and extracting knowledge from these large and valuable data sources requires publishing them and making the information available to other institutions and companies, which creates security challenges because it can violate individuals' privacy. In this paper, along with a thorough review of prior work on privacy-preserving data publishing, an efficient anonymization method is presented whose goal is to preserve classification accuracy on the anonymized data. Using a decision tree, the method prevents the publication of information that has little effect on the utility of the output data and whose removal provides privacy. One challenge of schemes that use the generalization anonymization operator is the need to build a taxonomy tree for each quasi-identifier, which has mostly been done automatically. The proposed scheme does not need a taxonomy tree. Simulation results and evaluations show that there is little difference between the accuracy of classification algorithms trained on a standard data set anonymized by this method and the accuracy of those trained on the original data set.
Keywords	Privacy preservation, classification, anonymization, decision tree, suppression operator
Address	University of Guilan, Faculty of Engineering, Department of Computer Engineering, Iran; University of Guilan, Faculty of Engineering, Department of Computer Engineering, Iran
Email	mehdi.sadeghpour@live.com
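
The abstract describes using a decision tree to find information that contributes little to classification, and suppressing it instead of generalizing it through a taxonomy tree. The sketch below illustrates that general idea only and is not the authors' algorithm. It assumes information gain (a common decision-tree split criterion) as the utility measure, and the helper names `information_gain`, `least_useful_attribute`, and `suppress` as well as the toy records are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, label):
    """Entropy reduction on `label` obtained by splitting on `attribute`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r[label])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([r[label] for r in records]) - remainder


def least_useful_attribute(records, quasi_identifiers, label):
    """Quasi-identifier whose removal costs the classifier the least (assumed criterion)."""
    return min(quasi_identifiers, key=lambda a: information_gain(records, a, label))


def suppress(records, attribute, mask="*"):
    """Return copies of the records with `attribute` replaced by a mask value."""
    return [{**r, attribute: mask} for r in records]


# Toy data (hypothetical): "income" is the class label, "age" and "zip" are quasi-identifiers.
records = [
    {"age": "young", "zip": "A", "income": "low"},
    {"age": "young", "zip": "B", "income": "low"},
    {"age": "old", "zip": "A", "income": "high"},
    {"age": "old", "zip": "B", "income": "high"},
]
target = least_useful_attribute(records, ["age", "zip"], "income")
published = suppress(records, target)
```

In this toy data set `age` fully determines `income` while `zip` carries no information about it, so `zip` is the attribute chosen for suppression. A real scheme would typically suppress individual values or subtrees and not a whole column.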

A New Privacy Preserving Data Publishing Technique Conserving Accuracy of Classification on Anonymized Data

Authors	Ebrahimi Atani Reza, Sadeghpour Mehdi
Abstract	The growth of electronic services has facilitated data collection and storage, and has led to vast amounts of personal information being recorded in the databases of public and private organizations. These records often include sensitive personal information (such as income and diseases) and must be protected from access by others. In some cases, however, mining the data and extracting knowledge from these valuable sources creates the need to share them with other organizations, which raises security challenges for users' privacy. Privacy is described as sharing information in a controlled way: it determines what type of personal information should be shared and which group or person can access and use it. "Privacy preserving data publishing" is a solution for ensuring the secrecy of sensitive information in a data set after it is published in a hostile environment. This process aims to hide sensitive information while keeping the published data suitable for knowledge discovery techniques. Grouping data set records is a broad approach to data anonymization. It prevents access to the sensitive attributes of a specific record by eliminating the distinction between a number of data set records. A large number of data publishing models and techniques have been proposed so far, but their utility is a concern when a high level of privacy is required. The main goal of this paper is to present a technique that improves the privacy and performance of data publishing techniques. In this work we first review previous privacy preserving data publishing techniques and then present an efficient anonymization method whose goal is to conserve the accuracy of classification on anonymized data. The attack model of this work is based on an adversary who infers a sensitive value from the published data set, with that inference limited to a confidence as high as that of an inference based on public knowledge.
Our privacy model and technique use a decision tree to avoid publishing information whose removal provides privacy and has little effect on the utility of the output data. The idea presented in this paper is an extension of the work presented in [20]. Experimental results show that classifiers trained on the transformed data set achieve accuracy similar to that of classifiers trained on the original data set.
Keywords	Privacy preservation, Data sharing, Anonymization, Classification, Decision tree, Suppression
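
The abstract describes grouping as a way to eliminate the distinction between a number of data set records. As a minimal sketch of that idea, the code below groups records by their quasi-identifier values, checks that every group has at least `k` members (a k-anonymity-style condition, which is an assumption here and not necessarily the paper's privacy model), and suppresses the quasi-identifiers of records in smaller groups. The function names and the sample records are hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier group contains at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


def suppress_small_groups(records, quasi_identifiers, k, mask="*"):
    """Mask the quasi-identifiers of every record whose group is smaller than k."""
    sizes = group_sizes(records, quasi_identifiers)
    published = []
    for r in records:
        key = tuple(r[a] for a in quasi_identifiers)
        if sizes[key] < k:
            r = {**r, **{a: mask for a in quasi_identifiers}}
        published.append(r)
    return published


# Hypothetical records: "disease" is the sensitive attribute.
records = [
    {"age": "30-39", "zip": "413", "disease": "flu"},
    {"age": "30-39", "zip": "413", "disease": "asthma"},
    {"age": "40-49", "zip": "414", "disease": "flu"},
    {"age": "50-59", "zip": "415", "disease": "diabetes"},
]
qi = ["age", "zip"]
published = suppress_small_groups(records, qi, k=2)
```

Here the last two records are unique on (`age`, `zip`), so both are masked and end up in the same group. In general the masked group can itself be smaller than `k`, so a real scheme would need an extra step (for example, dropping or merging such records).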