Speech Enhancement Using Data-Based Dictionary Learning



Authors	Mavaddati Samira, Ahadi Mohammad
Source	Signal and Data Processing - 1399 - Issue: 1 - Pages: 99-116
Abstract	Speech enhancement is one of the most widely used areas of speech processing. This paper examines a speech enhancement method based on the principles of sparse representation. Sparse representation makes it possible to model most of the information needed to represent a signal using far fewer dimensions than the original basis space. The learning method in this paper is a modification of the data-based greedy adaptive algorithm, in which the dictionary is trained directly from the data signal, using a norm-based sparsity index, to achieve a closer match between the atoms and the structure of the data. This paper proposes a new sparsity index based on the Gini measure. In addition, the range of the sparsity parameter for noisy segments is determined from the initial frames of the speech and is used, through a proposed procedure, in constructing the dictionary. The enhancement results show that, when data frames are selected according to the introduced measure, the proposed method outperforms the norm-based sparsity index and other baseline algorithms in this context under various noise conditions.
Keywords	Speech enhancement, sparse representation, dictionary learning, data-based, greedy adaptive, Gini sparsity index
Address	University of Mazandaran, Faculty of Engineering and Technology, Iran; Amirkabir University of Technology, Faculty of Electrical Engineering, Iran
Email	sma@aut.ac.ir

Speech Enhancement using Adaptive Data-Based Dictionary Learning

Authors	Mavaddati Samira, Ahadi Mohammad
Abstract	This paper presents a speech enhancement method based on the sparse representation of data frames. Speech enhancement is one of the most widely applied areas of signal processing. The objective of a speech enhancement system is to improve the intelligibility or the quality of speech signals. This is done with speech signal processing techniques that attenuate the background noise without distorting the speech signal. We focus on single-channel enhancement of speech corrupted by additive Gaussian noise. In recent years, there has been increasing interest in employing sparse representation techniques for speech enhancement. Sparse representation makes it possible to capture most of the information in the speech signal using fewer dimensions than the original basis space. The capability of a sparse decomposition method depends on the learned dictionary and on how well the dictionary atoms match the signal features. An overcomplete dictionary is obtained in two main steps: dictionary learning and sparse coding. In the dictionary selection step, a predefined dictionary such as the Fourier basis, a wavelet basis, or the discrete cosine basis can be employed. Alternatively, a redundant dictionary can be constructed through a learning process, which is often based on alternating optimization strategies. In the sparse coding step, the dictionary is fixed and a sparse coefficient matrix with low approximation error is obtained. The goal of this paper is to investigate the role of data-based dictionary learning in speech enhancement in the presence of white Gaussian noise. The dictionary learning method in this paper is based on the greedy adaptive algorithm, a data-based technique for dictionary learning.
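The sparse coding step described above (fixed dictionary, few nonzero coefficients, low approximation error) can be illustrated with a minimal sketch. The abstract does not say which sparse coding algorithm the authors use; orthogonal matching pursuit (OMP) is swapped in here purely as a familiar example, and the function name `omp` and its parameters are hypothetical.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit (illustrative, not the paper's method).

    D         : (frame_len, n_atoms) dictionary, columns assumed unit-norm
    x         : (frame_len,) signal frame to approximate
    n_nonzero : maximum number of atoms allowed in the representation
    Returns a length-n_atoms coefficient vector with at most n_nonzero nonzeros.
    """
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []              # indices of the atoms selected so far
    sol = np.zeros(0)         # least-squares coefficients on the support
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break             # no new atom helps; stop early
        support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs = np.zeros(D.shape[1])
    coeffs[support] = sol
    return coeffs
```

With a dictionary `D` and a frame `x`, `D @ omp(D, x, k)` gives the k-sparse approximation of the frame.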
The proposed algorithm learns the dictionary atoms from data frames taken from the speech signals, so the atoms capture the structure of the input frames. In this approach, the atoms are learned directly from the training data using a norm-based sparsity measure to obtain a closer match between the data frames and the dictionary atoms. The sparsity measure proposed in this paper is based on the Gini parameter: we present a new sparsity index that uses Gini coefficients in the greedy adaptive dictionary learning algorithm. These coefficients are set to find atoms that are sparser than those found with other sparsity indices defined on the norm of the speech frames. The proposed learning method iteratively extracts the speech frames with the minimum sparsity index according to these measures and adds the extracted atoms to the dictionary matrix. In addition, the range of the sparsity parameter is selected based on the initial silent frames of the speech signal in order to build the desired dictionary. That is, a speech frame of the input data matrix can be added to the first columns of the overcomplete dictionary when its structure is not similar to that of the noise frames. The data-based dictionary learning process makes the algorithm faster than other dictionary learning methods, for example K-singular value decomposition (K-SVD), the method of optimal directions (MOD), and other optimization-based strategies. The sparsity of an input frame is measured using the Gini-based index, which takes smaller values for speech frames because of their sparse content. In contrast, a frame with a Gaussian noise structure yields high values of this parameter. The performance of the proposed method is evaluated using several measures: improvement in signal-to-noise ratio (ISNR), the time-frequency representation of the atoms, and PESQ scores.
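The frame selection loop described above can be sketched roughly as follows. The abstract does not give the exact Gini-based index, so several pieces are assumptions: `gini_index` is the standard Gini index of the sorted magnitudes (which grows with sparsity), `sparsity_score` is a hypothetical stand-in defined as one minus that value so that sparser frames score lower, as the abstract describes, and the residual deflation step follows the greedy adaptive dictionary scheme as it is commonly described rather than anything stated in this record. The noise-based range for the sparsity parameter is not modeled.

```python
import numpy as np

def gini_index(x):
    """Standard Gini index of |x|: 0 for a flat vector, near 1 for a sparse one."""
    c = np.sort(np.abs(np.asarray(x, dtype=float)))   # ascending magnitudes
    n = c.size
    total = c.sum()
    if total == 0.0:
        return 0.0
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / total) * (n - k + 0.5) / n)

def sparsity_score(x):
    """Hypothetical index: smaller means sparser (assumed, not the paper's formula)."""
    return 1.0 - gini_index(x)

def greedy_dictionary(frames, n_atoms):
    """Pick atoms greedily from the data frames themselves.

    frames : (frame_len, n_frames) matrix, one frame per column
    Returns a (frame_len, m) matrix of unit-norm atoms with m <= n_atoms.
    """
    residual = np.array(frames, dtype=float)
    atoms = []
    for _ in range(n_atoms):
        norms = np.linalg.norm(residual, axis=0)
        # Score every residual column; exhausted (near-zero) columns are skipped.
        scores = np.array([
            sparsity_score(residual[:, j]) if norms[j] > 1e-12 else np.inf
            for j in range(residual.shape[1])
        ])
        j = int(np.argmin(scores))
        if not np.isfinite(scores[j]):
            break                      # nothing left to extract
        atom = residual[:, j] / norms[j]
        atoms.append(atom)
        # Assumed deflation: remove the new atom's component from all columns.
        residual -= np.outer(atom, atom @ residual)
    if not atoms:
        return np.empty((residual.shape[0], 0))
    return np.column_stack(atoms)
```

Because each new atom is taken from a residual that has already been projected away from the earlier atoms, the atoms returned by this sketch are mutually orthogonal.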
The proposed approach significantly reduces the background noise compared with other dictionary learning methods such as principal component analysis (PCA) and the norm-based learning method, which are traditional procedures in this context. The proposed speech enhancement method also gives good results in terms of the reconstruction error of the signal approximations. In addition, the proposed approach has a reasonable computation time, which is a prominent factor in dictionary learning methods.
Keywords	Speech enhancement, Sparse representation, Dictionary learning, Data-based learning, Greedy adaptive
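The ISNR measure named in the abstract can be computed as in the sketch below. The record does not state the formula, so this assumes the common definition: the output SNR of the enhanced signal minus the input SNR of the noisy signal, both measured against the clean reference. The function names are hypothetical.

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR in dB of an estimate relative to the clean reference signal."""
    clean = np.asarray(clean, dtype=float)
    error = clean - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))

def isnr_db(clean, noisy, enhanced):
    """Improvement in SNR (assumed definition): output SNR minus input SNR, in dB."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)
```

A positive value means the enhanced signal is closer to the clean reference than the noisy input was. For instance, scaling the noise amplitude down by a factor of ten corresponds to an ISNR of 20 dB.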