NMF-based Improvement of DNN and LSTM Pre-Training for Speech Enhancement
Publication year: 1402 (Solar Hijri)
Document type: Journal article
Language: English
The full text of this article is available as a 13-page PDF.
National scientific document ID: JR_ITRC-15-3_006
Indexing date: 19 Aban 1402
Abstract:
A novel pre-training method is proposed to improve deep neural network (DNN) and long short-term memory (LSTM) performance and to reduce the local-minimum problem in speech enhancement. We propose initializing the last-layer weights of the DNN and LSTM with the transposed non-negative matrix factorization (NMF) basis values instead of random weights. Owing to its ability to extract speech features even in the presence of non-stationary noises, NMF yields faster and more reliable network convergence than previous pre-training methods. We also propose using the NMF basis matrix in the first layer together with another pre-training method. To achieve better results, we further propose training an individual model for each noise type based on a noise classification strategy. Evaluation of the proposed method on TIMIT data shows that it significantly outperforms the baselines in terms of perceptual evaluation of speech quality (PESQ) and other objective measures. Our method outperforms the baselines in terms of PESQ by up to 0.17, an improvement of 3.4%.
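The abstract's core idea, initializing a network's last-layer weights with the transposed NMF basis rather than random values, can be sketched as follows. This is a minimal illustration using scikit-learn's `NMF`; the spectrogram data, layer sizes, and variable names are assumptions, not the paper's actual configuration.

```python
# Hedged sketch: derive last-layer initial weights from an NMF basis.
# All shapes and data here are toy/illustrative, not from the paper.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Toy non-negative "magnitude spectrogram": (frames, freq_bins)
S = np.abs(rng.standard_normal((200, 64)))

n_hidden = 32  # assumed last hidden layer size == number of NMF basis vectors
nmf = NMF(n_components=n_hidden, init="nndsvda", max_iter=300, random_state=0)
nmf.fit(S)
B = nmf.components_  # NMF basis matrix, shape (n_hidden, freq_bins)

# Last-layer weight init: transposed basis, mapping hidden units -> output bins
# (for an output layer computing y = W_last @ h with h of length n_hidden).
W_last = B.T         # shape (freq_bins, n_hidden)

print(W_last.shape)  # (64, 32)
```

In a training framework, `W_last` would simply replace the random initialization of the final layer's weight matrix before fine-tuning; the non-negative basis vectors act as spectral building blocks of speech, which is the intuition behind the faster convergence claimed in the abstract.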
Keywords:
pre-training, deep neural networks (DNN), long short-term memory (LSTM), non-negative matrix factorization (NMF), speech enhancement, basis matrix, noise classification
Authors
Razieh Safari Dehnavi
Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Sanaz Seyedin
Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran