Efficient Incorporation of PLSA and LDA Semantic Knowledge in Statistical Language Model Adaptation for Persian ASR Systems

Publication year: 1393 SH (2014–2015)
Document type: Journal article
Language: English
Views: 424

The full paper is available as a 6-page PDF.


National document ID: JR_IJOCIT-2-4_003

Indexing date: 16 Farvardin 1395 (April 2016)

Abstract:

Language models (LMs) are important tools, especially in ASR systems, for improving recognition performance. Developing a robust spoken language model ideally relies on the availability of large amounts of data, preferably in the target domain and language. More often than not, however, speech developers must cope with very little or no in-domain data, typically relying on data from a different domain. Language models are brittle when moved from one domain to another. Language model adaptation addresses this by combining a generic LM with a topic-specific model that is more relevant to the target domain. We review two major topic-based generative language modeling techniques (PLSA and LDA) designed to capture semantic knowledge of text. We show that applying a tf-idf-related per-word confidence metric, and using unigram rescaling rather than linear interpolation with N-grams, produces a more robust language model with significantly higher accuracy on the FARSDAT test set than a baseline N-gram model.
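The abstract contrasts unigram rescaling with linear interpolation but does not give its exact formulation. Below is a minimal sketch of unigram rescaling as it is commonly defined in topic-based LM adaptation: each N-gram probability is multiplied by the ratio of the topic-model unigram probability to the background unigram probability, then renormalized. The function name, the `beta` exponent (which a per-word confidence metric could modulate), and the toy vocabulary are illustrative assumptions, not the paper's implementation.

```python
def unigram_rescale(ngram_probs, topic_unigram, background_unigram, beta=1.0):
    """Rescale N-gram probabilities by the topic-to-background unigram ratio.

    ngram_probs: {word: P(word | history)} from the generic N-gram LM
    topic_unigram: {word: P(word)} from the topic model (e.g. PLSA/LDA)
    background_unigram: {word: P(word)} from the background corpus
    beta: scaling exponent (illustrative; a per-word confidence weight
          could be applied here instead of a single global exponent)
    """
    scaled = {
        w: p * (topic_unigram[w] / background_unigram[w]) ** beta
        for w, p in ngram_probs.items()
    }
    z = sum(scaled.values())  # renormalize so probabilities sum to 1
    return {w: s / z for w, s in scaled.items()}

# Toy example: the topic model favors "speech" over "music",
# so the adapted LM shifts mass toward "speech".
ngram = {"speech": 0.5, "music": 0.5}
topic = {"speech": 0.08, "music": 0.02}
background = {"speech": 0.05, "music": 0.05}
adapted = unigram_rescale(ngram, topic, background)
```

Unlike linear interpolation, which mixes the two distributions additively, this multiplicative combination lets the topic model sharpen or suppress individual words while the N-gram model continues to supply the local syntactic context.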

Keywords:

Speech Recognition, Statistical Language Model Adaptation, Corpus

Authors

Seyed Mahdi Hoseini

Computer Department, Shafagh University, Tonekabon

Behrouz Minaei

Computer Department, Iran University of Science & Technology, Tehran