Steganalysis of compressed audio files using machine learning



Authors	Soleimani Mohsen, Chehel Amirani Mahdi, Kabodian Jahanshah
Source	Signal and Data Processing - 1403 - Issue 2 - Pages 55-66
Abstract	The science of hiding an information-bearing message in a carrier medium is called steganography, and the attempt to detect the presence or absence of a hidden message in the cover object is called steganalysis. Among audio data, the MP3 compression format has been used as a suitable and widespread host for steganography, and various steganographic methods have been designed for it. The aim of this research is to present a steganalysis algorithm specifically for compressed audio files in MP3 format that have been embedded using the MP3Stego software. To prepare the stego dataset, text files containing random text were used. First, the required features are extracted from the side information of the MP3 files, and the audio dataset, which comprises two classes (stego files and non-stego files), is divided into a training set and a test set. Next, a system for distinguishing stego files from clean files is designed using a machine learning method (support vector machine), and finally the performance of the system is measured on the test set. In this paper, a new feature called spectral peakiness (SPK) is extracted from the side information of the MP3 file. The proposed system is tested on a separate test set that includes clean files and stego files with various embedding capacities, and it distinguishes clean files from stego files with 100% accuracy and no errors. The results indicate accurate detection of stego files together with lower computational complexity and higher speed for this type of steganalysis compared to previously devised methods.
Keywords	compressed audio file, audio steganalysis, audio steganography, mp3stego, mp3
Address	Urmia University, Faculty of Electrical and Computer Engineering, Iran; Urmia University, Faculty of Electrical and Computer Engineering, Iran; Razi University, Faculty of Electrical and Computer Engineering, Iran
Email	jkabudian@gmail.com

Steganalysis of compressed audio files based on machine learning

Authors	Soleimani Mohsen, Chehel Amirani Mahdi, Kabodian Jahanshah
Abstract	The science of hiding an information-bearing message in a carrier medium is called steganography, and the attempt to detect the presence or absence of a hidden message in a cover medium is called steganalysis. Among audio data, the MP3 compression format has been used as a suitable and widespread host for steganography, and various steganographic methods have been designed for it. The aim of this research is to present a steganalysis algorithm specifically for compressed audio files in MP3 format in which data has been embedded using the MP3Stego software. To prepare the stego dataset, text files containing random text were used. First, the necessary features are extracted from the side information of the MP3 files, and the audio dataset, which comprises two classes (stego files and clean files), is divided into training data and test data. Then, using a machine learning technique (support vector machine), a system for distinguishing stego files from clean files is designed, and finally the performance of the system is measured on the test data. In this paper, a new feature called spectral peakiness (SPK) is extracted from the side information of the MP3 file. The proposed system was tested on separate test data, which includes clean files and stego files with various embedding capacities, and it distinguished clean files from stego files with 100% accuracy and without error. The results indicate perfect classification of stego and clean files while reducing the computational complexity and increasing the speed of steganalysis compared to other methods. Instead of using the audio signal information stored in the MP3 file, the proposed method uses the side information of the MP3 file, which is less dependent on the audio content of the file.
In this method, the MDB side information in the compressed audio file is treated as a sequence, and a new frequency-domain feature called spectral peakiness is computed from it. This simple yet powerful feature is combined with the temporal average and the spectral average of the MDB sequence to form a low-dimensional (three-dimensional) feature vector. A support vector machine (SVM) classifier then labels this feature vector as a suspicious file or a normal file. The feature extraction method is simple, requires very little computation, and achieves 100% accuracy (detection without any error) for MP3 files, even when the amount of hidden information in the audio file is very low.
Keywords	compressed audio file, audio steganography, audio steganalysis, mp3, mp3stego
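
The three-dimensional feature vector described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions here are assumptions made for illustration only: the spectral average is taken as the mean of the DFT magnitude spectrum of the mean-removed sequence, and spectral peakiness is taken as the ratio of the largest spectral magnitude to that mean. The function names `magnitude_spectrum` and `feature_vector` are hypothetical, the input is a plain list of numbers standing in for the MDB sequence, and neither the parsing of MP3 side information nor the SVM classifier is shown.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def magnitude_spectrum(seq: Sequence[float]) -> List[float]:
    """Return DFT magnitudes for bins 1..N//2 of the mean-removed sequence.

    A direct O(N^2) DFT is used to keep the sketch dependency-free.
    """
    n = len(seq)
    mean = sum(seq) / n
    centered = [x - mean for x in seq]
    mags = []
    for k in range(1, n // 2 + 1):
        acc = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        mags.append(abs(acc))
    return mags


def feature_vector(mdb: Sequence[float]) -> Tuple[float, float, float]:
    """Return (temporal average, spectral average, spectral peakiness).

    ASSUMPTION: peakiness is defined here as max(|X[k]|) / mean(|X[k]|);
    the paper's actual definition is not stated in the abstract.
    """
    if len(mdb) < 2:
        raise ValueError("sequence must contain at least two values")
    temporal_avg = sum(mdb) / len(mdb)
    mags = magnitude_spectrum(mdb)
    spectral_avg = sum(mags) / len(mags)
    # A flat (all-zero) spectrum has no peak; report 0.0 instead of dividing by zero.
    spk = max(mags) / spectral_avg if spectral_avg > 0 else 0.0
    return temporal_avg, spectral_avg, spk


if __name__ == "__main__":
    # Toy input: a strongly periodic sequence concentrates energy in one bin.
    print(feature_vector([0.0, 4.0] * 8))
```

Under these assumed definitions, a sequence whose energy sits in a single frequency bin yields a high peakiness value, while a sequence with a flat spectrum yields a value close to 1. The resulting three-element tuples could be passed to any SVM implementation for training and testing.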