An Efficient Set of Parts of Speech in Persian Information Retrieval Systems

Even though the ultimate aim of any information retrieval system is to fulfill its users’ expectations, reducing index storage size and improving system performance sometimes take priority, especially for small companies with limited hardware resources. For such companies, it is essential to remove noninformative terms from their indexes. Selecting a proper set of terms makes it possible to reduce the index storage size and consequently improve retrieval performance. In this paper, using part-of-speech tagging, we show how to reduce the index storage size without losing precision. Through an experimental process on the Hamshahri corpora, we identify the most effective parts of speech in the Persian language. The results demonstrate improvements in both the response time and the precision of retrieval.
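The idea of indexing only terms with selected parts of speech can be illustrated with a minimal sketch. This is not the authors' implementation. The tag names, the `KEPT_TAGS` set, the helper functions, and the toy English documents are all hypothetical placeholders, and the input is assumed to be already tagged by some part-of-speech tagger. The abstract does not state which parts of speech the paper found most effective.

```python
from collections import defaultdict

# Hypothetical set of tags to keep; the paper's actual set is not given in the abstract.
KEPT_TAGS = {"N", "ADJ"}


def build_index(tagged_docs, kept_tags=None):
    """Build an inverted index (term -> set of document ids).

    tagged_docs maps a document id to a list of (token, tag) pairs.
    If kept_tags is None, every token is indexed; otherwise only tokens
    whose tag is in kept_tags are indexed.
    """
    index = defaultdict(set)
    for doc_id, tagged_tokens in tagged_docs.items():
        for token, tag in tagged_tokens:
            if kept_tags is None or tag in kept_tags:
                index[token].add(doc_id)
    return dict(index)


def posting_count(index):
    """Total number of (term, document) postings, a rough proxy for index size."""
    return sum(len(postings) for postings in index.values())


# Toy pre-tagged documents (placeholders, not taken from the Hamshahri corpora).
TAGGED_DOCS = {
    1: [("the", "DET"), ("index", "N"), ("is", "V"), ("small", "ADJ")],
    2: [("the", "DET"), ("small", "ADJ"), ("corpus", "N"), ("is", "V"), ("tagged", "V")],
}

full_index = build_index(TAGGED_DOCS)
reduced_index = build_index(TAGGED_DOCS, KEPT_TAGS)

print("postings without filtering:", posting_count(full_index))
print("postings with POS filtering:", posting_count(reduced_index))
```

In this toy setting the filtered index drops determiners and verbs, so it holds fewer postings. Whether precision is preserved on real data depends on which tags are kept, which is the question the paper studies experimentally.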

Keywords:

Information Retrieval, Natural Language Processing, Part of Speech, Index storage reduction, Term selection, Stop-word detection

Authors

Mohammad Ali Yaghoub Zadeh Fard

Iran University of Science and Technology

Saeed Rahmani

Shiraz University

Omid Kashefi

Iran University of Science and Technology

Behrouz Minaei idgoli

Iran University of Science and Technology


This paper is classified under the following subject areas:

Artificial Intelligence > Natural Language Processing

Permanent link to this paper:

https://civilica.com/doc/451039

National scientific document ID:

ITCC01_251

Indexing date: 9 Farvardin 1395 (28 March 2016)

How to cite this paper:

To reference this paper in your own research, you can use the following entry in your reference list:

Yaghoub Zadeh Fard, Mohammad Ali and Rahmani, Saeed and Kashefi, Omid and Minaei idgoli, Behrouz, 1394, An Efficient Set of Parts of Speech in Persian Information Retrieval Systems, National Conference on Information Technology, Computer & Communication, Torbat Heydarieh, https://civilica.com/doc/451039

