NMF-based Improvement of DNN and LSTM Pre-Training for Speech Enhancement

Year of publication: 1402 SH (2023)
Document type: journal article
Language: English
Views: 62

The full text of this article is available as a 13-page PDF.

National scientific document ID:

JR_ITRC-15-3_006

Date indexed: 19 Aban 1402 (10 November 2023)

Abstract:

A novel pre-training method is proposed to improve deep neural network (DNN) and long short-term memory (LSTM) performance and to mitigate the local-minimum problem in speech enhancement. We propose initializing the last-layer weights of the DNN and LSTM with the transposed non-negative matrix factorization (NMF) basis values instead of random weights. Because NMF can extract speech features even in the presence of non-stationary noise, it makes network convergence faster and more successful than previous pre-training methods. We also propose using the NMF basis matrix in the first layer alongside another pre-training method. To achieve better results, we further propose training an individual model for each noise type based on a noise-classification strategy. Evaluation of the proposed method on TIMIT data shows that it significantly outperforms the baselines in terms of perceptual evaluation of speech quality (PESQ) and other objective measures. Our method improves PESQ over the baselines by up to 0.17, an improvement of 3.4%.
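The core idea of the abstract can be illustrated with a minimal sketch: learn an NMF basis from a non-negative, spectrogram-like matrix and use its transpose to initialize a network's last-layer weights instead of drawing them at random. All dimensions, variable names, and the use of scikit-learn's `NMF` here are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_freq, n_frames, n_basis = 64, 200, 32  # assumed sizes for illustration

# Stand-in for a clean-speech magnitude spectrogram (freq x time);
# NMF requires non-negative input.
V = np.abs(rng.standard_normal((n_freq, n_frames)))

# Factorize V ~ W @ H; columns of W are spectral basis vectors.
nmf = NMF(n_components=n_basis, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(V)  # shape: (n_freq, n_basis)

# Initialize the last-layer weight matrix with the transposed NMF basis
# (mapping a hidden vector of size n_basis to a spectral output of size
# n_freq) rather than with random values.
last_layer_weights = W.T  # shape: (n_basis, n_freq)

def last_layer(h):
    """Linear output layer using the NMF-initialized weights."""
    return h @ last_layer_weights  # (batch, n_basis) -> (batch, n_freq)

h = np.abs(rng.standard_normal((4, n_basis)))
out = last_layer(h)
```

In a full training setup these weights would serve only as the starting point and would then be fine-tuned by backpropagation together with the rest of the network.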

Keywords:

pre-training, deep neural networks (DNN), long short-term memory (LSTM), non-negative matrix factorization (NMF), speech enhancement, basis matrix, noise classification

Authors

Razieh Safari Dehnavi

Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran

Sanaz Seyedin

Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran