Single-Channel Speech Enhancement Using a Combination of an Exponential Deterministic Model and a t Location-Scale Stochastic Model



Authors	Amini Zahra, Faraji Neda
Source	Computational Intelligence in Electrical Engineering - 1399 (Solar Hijri) - Volume: 11 - Issue: 1 - Pages: 63-80
Abstract	Most speech enhancement methods offer estimators that rely entirely on a stochastic model of speech. In this paper, a minimum mean-square error estimator is proposed under a deterministic-stochastic model, in which a heavy-tailed distribution called t location-scale (tls) is used to model the discrete Fourier transform coefficients of clean speech, and exponential and sinusoidal models serve as the deterministic model. In the exponential model used, the frequency and damping factor are estimated with the matrix pencil method. Moreover, previous studies have taken the number of exponential components in the deterministic model for speech enhancement to be one, whereas in this paper the exponential model is extended to an arbitrary number of exponential components. The implementations are carried out in three combined modes, exponential-Gaussian (the first proposed method), exponential-tls (the second proposed method), and sinusoidal-Gaussian, and are compared with the existing exponential-Gaussian method (with only one exponential component) and with the Wiener and tls-based stochastic estimators. Results in the presence of six noise types from the NOISEX-92 noise dataset show that, compared with methods based on a purely stochastic model, the two proposed methods improve the segmental signal-to-noise ratio and perform roughly on par in the perceptual evaluation of speech quality.
Keywords	Speech enhancement, t location-scale probability density function, Wiener filter, minimum mean-square error, exponential deterministic model, sinusoidal model
Address	Imam Khomeini International University, Department of Electrical Engineering, Iran, Imam Khomeini International University, Department of Electrical Engineering, Iran
Email	nfaraji@eng.ikiu.ac.ir
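
For reference, the t location-scale distribution named in the abstract has the standard density below, written for a real variable x with location \mu, scale \sigma > 0, and shape \nu > 0. The symbols are chosen here for illustration; the abstract does not state the paper's notation or how the density is applied to complex DFT coefficients.

```latex
f(x \mid \mu, \sigma, \nu) =
  \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}
       {\sigma \sqrt{\nu\pi}\; \Gamma\!\left(\frac{\nu}{2}\right)}
  \left[ 1 + \frac{1}{\nu} \left( \frac{x-\mu}{\sigma} \right)^{2} \right]^{-\frac{\nu+1}{2}}
```

Smaller values of \nu give heavier tails, and the density approaches a Gaussian with mean \mu and standard deviation \sigma as \nu grows.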

Single-Channel Speech Enhancement Using the Combination of an Exponential Deterministic Model and a t Location-Scale Stochastic Model

Authors	Amini Zahra, Faraji Neda
Abstract	Most speech enhancement algorithms derive an estimator that relies entirely on a stochastic model of speech. In this paper, a minimum mean-square error (MMSE) estimator under a stochastic-deterministic model is proposed, in which a heavy-tailed distribution called t location-scale (tls) is used to model the discrete Fourier transform coefficients of clean speech signals, and exponential and sinusoidal models are employed as deterministic models. In the exponential model, the frequency and damping coefficient are estimated using the matrix pencil method. In previous studies, the number of exponential components in the deterministic model of stochastic-deterministic speech enhancement algorithms has been fixed at one. In this paper, the exponential model is extended to an arbitrary number of exponential components. The speech enhancement experiments are performed in three modes: exponential-Gaussian (the first proposed method), exponential-tls (the second proposed method), and sinusoidal-Gaussian. Comparisons are made with the existing exponential-Gaussian method (with only one exponential component), as well as with the Wiener and tls stochastic estimators. Results in the presence of six noise types from the NOISEX-92 dataset show that the two proposed methods improve segmental SNR (segSNR) and achieve PESQ scores quite similar to those of the purely stochastic speech enhancement methods.
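
The abstract says that frequency and damping are estimated with the matrix pencil method. A minimal sketch of that general technique is given below, assuming a frame modeled as a sum of complex damped exponentials, x[n] = sum_k a_k * exp((-alpha_k + j*omega_k) * n). It is not the authors' implementation: the function name `matrix_pencil`, its parameters, and the default pencil length are hypothetical choices made for illustration, and the SVD-based variant shown is one common formulation.

```python
import numpy as np


def matrix_pencil(x, num_components, pencil=None):
    """Estimate poles of a sum of damped complex exponentials (illustrative sketch).

    x              : 1-D array of samples x[0..N-1]
    num_components : assumed number M of exponential components
    pencil         : pencil parameter L with M <= L <= N - M (default N // 2)
    Returns (poles, damping, freq, amplitudes), one entry per component.
    """
    x = np.asarray(x, dtype=complex)
    n = x.size
    if pencil is None:
        pencil = n // 2
    if not num_components <= pencil <= n - num_components:
        raise ValueError("pencil parameter must satisfy M <= L <= N - M")

    # Hankel data matrix Y[i, j] = x[i + j], shape (N - L, L + 1).
    idx = np.arange(n - pencil)[:, None] + np.arange(pencil + 1)[None, :]
    data = x[idx]

    # Keep the M dominant right singular vectors (signal subspace).
    _, _, vh = np.linalg.svd(data, full_matrices=False)
    vh = vh[:num_components]

    # Shifted sub-blocks: drop the last column, then the first column.
    v1 = vh[:, :-1]
    v2 = vh[:, 1:]

    # The eigenvalues of v2 @ pinv(v1) are the poles z_k = exp(-alpha_k + j*omega_k).
    poles = np.linalg.eigvals(v2 @ np.linalg.pinv(v1))

    damping = -np.log(np.abs(poles))  # alpha_k, per sample
    freq = np.angle(poles)            # omega_k, radians per sample

    # Complex amplitudes by least squares on the Vandermonde system x[n] = sum_k a_k z_k^n.
    vander = np.vander(poles, n, increasing=True).T
    amplitudes, *_ = np.linalg.lstsq(vander, x, rcond=None)
    return poles, damping, freq, amplitudes
```

For a real-valued frame, each damped sinusoid appears as a complex-conjugate pair of poles, so `num_components` would be twice the number of sinusoids. With noisy data, truncating the SVD to `num_components` singular vectors acts as a simple denoising step before the poles are computed.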