A New Speech Enhancement Method Based on Incoherent Model Learning Using Wavelet Transform Coefficients



Author	Mavaddati Samira
Source	Signal and Data Processing - 1399 - Issue: 3 - Pages: 17-36
Abstract	Speech enhancement is a widely used area of signal processing with applications in many fields. In this paper, the concepts of sparse representation and dictionary learning are used to remove noise from the speech signal in the wavelet transform feature space. With sparse representation, each component of the signal can be represented by a small number of learned atoms. To achieve good enhancement results, an incoherent dictionary learning procedure is employed. The wavelet transform coefficients decompose the signal into different subbands that carry detailed information about the signal content. In the proposed method, two scenarios, supervised and semi-supervised, are examined, and for each scenario a voice activity detector algorithm is proposed according to conditions defined on the dictionaries learned in the training step. Using the output of the proposed detector, the estimated speech signal is then obtained through an enhancement procedure in the next step. The results reported with different performance evaluation measures confirm the ability of this method to reduce noise in the speech signal. The proposed methods are highly effective at reducing nonstationary noise, especially at low signal-to-noise ratios.
Keywords	Speech enhancement, sparse representation, incoherent dictionary, wavelet transform, voice activity detector
Address	University of Mazandaran, Faculty of Engineering and Technology, Iran
Email	s.mavaddati@umz.ac.ir

A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

Authors	Mavaddati Samira
Abstract	The quality of a speech signal degrades significantly in the presence of environmental noise, which impairs the performance of hearing aid devices, automatic speech recognition systems, and mobile phones. This paper considers single-channel enhancement of speech signals corrupted by additive noise. A dictionary-based algorithm is proposed to train speech and noise models for each subband of the wavelet decomposition based on a coherence criterion. The presented learning method minimizes the self-coherence between the atoms of each dictionary and the mutual coherence between the atoms of the speech and noise dictionaries, which yields a lower sparse reconstruction error. To reduce computation time, a composite dictionary is used that contains only the speech dictionary and the one noise dictionary selected to match the noise condition of the test environment. The speech enhancement algorithm is introduced in two scenarios, supervised and semi-supervised. In each scenario, a voice activity detector (VAD) is employed that is based on the energy of the sparse coefficient matrices obtained when the observed data are coded over the relevant dictionary. The enhancement scheme differs between the two scenarios. In the proposed supervised scenario, a domain adaptation technique transforms a learned noise dictionary into a dictionary adapted to the noise conditions of the test environment. With this step, the observed data are sparsely coded with low approximation error under the current conditions of the noisy environment. This technique plays a prominent role in obtaining better enhancement results, particularly when the noise is nonstationary.
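As an illustration of the coherence criterion mentioned above, the following minimal sketch computes the usual self-coherence of one dictionary and the mutual coherence between two dictionaries as the largest absolute inner product between unit-norm atoms. It assumes atoms are given as plain lists of floats; the function names and the toy atoms are hypothetical and are not taken from the paper, whose exact learning objective is not stated in this abstract.

```python
import math


def normalize(atom):
    """Scale an atom (list of floats) to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in atom))
    if norm == 0.0:
        raise ValueError("a zero atom cannot be normalized")
    return [x / norm for x in atom]


def inner(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def self_coherence(atoms):
    """Largest |inner product| between two distinct unit-norm atoms."""
    unit = [normalize(a) for a in atoms]
    best = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            best = max(best, abs(inner(unit[i], unit[j])))
    return best


def mutual_coherence(atoms_a, atoms_b):
    """Largest |inner product| between an atom of A and an atom of B."""
    unit_a = [normalize(a) for a in atoms_a]
    unit_b = [normalize(b) for b in atoms_b]
    return max(abs(inner(a, b)) for a in unit_a for b in unit_b)


# Toy dictionaries (hypothetical values, for illustration only).
speech_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
noise_atoms = [[0.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

print("self-coherence of speech dictionary:", self_coherence(speech_atoms))
print("speech/noise mutual coherence:", mutual_coherence(speech_atoms, noise_atoms))
```

Lower values of both quantities mean the atoms are closer to orthogonal, which is the property an incoherent dictionary learning procedure tries to encourage.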
In the proposed semi-supervised scenario, adaptive thresholding of the wavelet coefficients is carried out based on the variance of the noise estimated for each frame in the different subbands. Both schemes are implemented, in the training and test steps, under speaker-dependent and speaker-independent conditions. Several measures are applied to evaluate the performance of the presented enhancement procedures, and a statistical test is used for a more precise comparison of the considered methods under the various noisy conditions. The experimental results show that the presented supervised enhancement scheme leads to much better results than the baseline enhancement methods, learning-based approaches, and earlier wavelet-based algorithms. These results were obtained for an extensive range of noise types, including structured, unstructured, and periodic noise, at different SNR values.
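To make the idea of noise-adaptive wavelet thresholding concrete, here is a minimal per-frame sketch. It rests on assumptions that are not stated in the abstract: a one-level Haar transform, a noise standard deviation estimated from the median absolute value of the detail coefficients, and the universal soft threshold. The paper's actual wavelet, noise estimator, and threshold rule may differ, and all function names here are hypothetical.

```python
import math
import statistics


def haar_dwt(frame):
    """One-level Haar transform of an even-length frame -> (approx, detail)."""
    s = math.sqrt(2.0)
    approx = [(frame[i] + frame[i + 1]) / s for i in range(0, len(frame), 2)]
    detail = [(frame[i] - frame[i + 1]) / s for i in range(0, len(frame), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the frame from both subbands."""
    s = math.sqrt(2.0)
    frame = []
    for a, d in zip(approx, detail):
        frame.append((a + d) / s)
        frame.append((a - d) / s)
    return frame


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr, zeroing the small ones."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise_frame(frame):
    """Threshold the detail subband using a per-frame noise estimate."""
    approx, detail = haar_dwt(frame)
    # Assumed estimator: median absolute detail coefficient / 0.6745.
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    # Assumed rule: universal threshold sigma * sqrt(2 ln N).
    thr = sigma * math.sqrt(2.0 * math.log(len(frame)))
    return haar_idwt(approx, soft_threshold(detail, thr))


# Hypothetical noisy frame, for illustration only.
noisy = [1.0, 1.2, 0.9, 1.1, 4.0, 4.1, 3.9, 4.2]
print(denoise_frame(noisy))
```

Because the threshold is recomputed from each frame's own detail coefficients, it rises and falls with the estimated noise level, which is the sense in which the thresholding is adaptive in this sketch.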
Keywords	Speech enhancement, Dictionary learning, Sparse representation, Domain adaptation, Voice activity detector, Wavelet transform