Persian Word Sense Disambiguation Corpus Extraction Based on Web Crawler Method

Mohamadreza Mahmoodvand; Maryam Hourali

Published in: International Journal of Advances in Computer Science, Volume 4, Issue 5

Year of publication: 1394 (Solar Hijri)

Document type: Journal article

Language: English

The full text of this paper is available as a 6-page PDF.

Permanent link to this paper:

https://civilica.com/doc/405252

National scientific document ID:

JR_ACSIJ-4-5_015

Indexing date: 7 Azar 1394

Abstract:

Finding an appropriate dataset for natural language processing applications is one of the main challenges for researchers in this field. The problem is more acute for non-Latin languages, especially Persian. Access to an appropriate dataset that can be used to develop practical language-processing programs helps validate the obtained results and makes it feasible to compare and precisely analyze research studies in this field. This paper presents a procedure for extracting a standard dataset in Persian. The dataset can only be used for research on word sense disambiguation in Persian. The required documents, which contain the ambiguous words of interest, are collected by a crawling robot; these words are then processed and registered in a Persian dataset of ambiguous words. In this research, three prevalent Persian ambiguous words are used to extract appropriate phrases containing them. Finally, a framework for creating the proper configuration for application to word sense disambiguation problems is presented. This method offers a solution to the absence of a suitable word sense disambiguation corpus for Persian.
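The extraction step described in the abstract (keeping phrases that contain an ambiguous word of interest) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the target word "شیر", the `ContextRecord` type, and the `extract_contexts` function are hypothetical names chosen for the example, sentence splitting is a naive punctuation split, and the crawling itself is omitted, so the input is assumed to be plain text already fetched from a page.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical target list: the abstract does not name the three words used.
TARGET_WORDS = {"شیر"}

# Naive sentence boundary: Latin punctuation plus the Arabic question mark.
SENTENCE_END = re.compile(r"[.!?؟]+")
# Unicode-aware word tokens (Persian letters match \w in Python 3).
TOKEN = re.compile(r"\w+")


@dataclass
class ContextRecord:
    """One corpus entry: an ambiguous word and a sentence containing it."""
    word: str
    sentence: str
    sense: Optional[str] = None  # left empty here; filled in by later annotation


def extract_contexts(text, targets=TARGET_WORDS):
    """Return one record per (sentence, target word) pair found in text."""
    records = []
    for sentence in SENTENCE_END.split(text):
        sentence = sentence.strip()
        if not sentence:
            continue
        # Whole-token matching, so a longer word that merely starts
        # with a target string is not counted as an occurrence.
        tokens = set(TOKEN.findall(sentence))
        for word in sorted(targets & tokens):
            records.append(ContextRecord(word, sentence))
    return records
```

Matching on whole tokens rather than substrings is a design choice made for this sketch; a real pipeline for Persian would likely also need text normalization and a proper tokenizer.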

Keywords:

Natural language processing, Word sense disambiguation, Information extraction, Corpus, Machine learning

Authors

Mohamadreza Mahmoodvand

MSc Student in Artificial Intelligence, ICT Department, Malek-e-Ashtar University of Technology, Tehran, Iran

Maryam Hourali

Assistant Professor, ICT Department, Malek-e-Ashtar University of Technology, Tehran, Iran