CIVILICA We Respect the Science
(Specialized publisher of national conference proceedings / Publication license number from the Ministry of Culture and Islamic Guidance: 8971)

Improving Persian POS Tagging Using the Maximum Entropy Model

Paper title: Improving Persian POS Tagging Using the Maximum Entropy Model
National paper ID: ICS12_231
Published in the 12th National Conference on Intelligent Systems of Iran, 1392 (Solar Hijri calendar)
Authors:

Ahmad A. Kardan - Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, Iran
Maryam Bahojb Imani - Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, Iran

Abstract:
Part-of-speech (POS) tagging is a fundamental step in many speech and text processing applications. It is the process of assigning each word in an input sentence its category according to the word's contextual and grammatical properties. In addition to the general difficulties of POS tagging, such as disambiguating multi-category words and handling unknown words, the Persian language, unlike English, has free word order and its own particular characteristics. These challenges can greatly affect the quality of the tagging process. Efficient POS taggers have been developed for some languages, especially English, but only a few studies have addressed Persian. To address these issues and achieve high POS tagging accuracy, we chose features that capture the important characteristics of words in a sentence and used maximum entropy as the machine learning classifier. Experimental results show that the proposed Persian POS tagging system outperforms the other state-of-the-art Persian taggers.
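To illustrate the general approach the abstract describes (word-level features fed to a maximum entropy classifier), the following is a minimal sketch, not the authors' system. The feature set in `extract_features`, the tag names, the transliterated toy sentences, and the training settings are all assumptions made for illustration; the abstract does not specify them. The classifier is a plain multinomial logistic regression (the standard formulation of a maximum entropy model) trained by stochastic gradient ascent on the log-likelihood.

```python
import math
from collections import defaultdict


def extract_features(words, i):
    """Hypothetical contextual features for the word at position i."""
    word = words[i]
    return [
        "bias",
        "word=" + word,
        "suffix2=" + word[-2:],
        "prev=" + (words[i - 1] if i > 0 else "<s>"),
        "next=" + (words[i + 1] if i < len(words) - 1 else "</s>"),
    ]


class MaxEntTagger:
    """Maximum entropy (multinomial logistic regression) tagger sketch."""

    def __init__(self):
        self.weights = defaultdict(float)  # (feature, tag) -> weight
        self.tags = []

    def _probs(self, feats):
        # Softmax over per-tag scores; subtract the max for stability.
        scores = {t: sum(self.weights[(f, t)] for f in feats) for t in self.tags}
        top = max(scores.values())
        exps = {t: math.exp(s - top) for t, s in scores.items()}
        z = sum(exps.values())
        return {t: e / z for t, e in exps.items()}

    def train(self, sentences, epochs=50, lr=0.5):
        self.tags = sorted({tag for sent in sentences for _, tag in sent})
        data = []
        for sent in sentences:
            words = [w for w, _ in sent]
            for i, (_, tag) in enumerate(sent):
                data.append((extract_features(words, i), tag))
        for _ in range(epochs):
            for feats, gold in data:
                probs = self._probs(feats)
                for t in self.tags:
                    # Gradient of the log-likelihood: observed minus expected.
                    grad = (1.0 if t == gold else 0.0) - probs[t]
                    for f in feats:
                        self.weights[(f, t)] += lr * grad

    def tag(self, words):
        result = []
        for i in range(len(words)):
            probs = self._probs(extract_features(words, i))
            result.append(max(probs, key=probs.get))
        return result


# Toy, transliterated training data with made-up tag labels (assumption).
train_data = [
    [("man", "PRO"), ("ketab", "N"), ("ra", "POSTP"), ("khandam", "V")],
    [("u", "PRO"), ("nameh", "N"), ("ra", "POSTP"), ("nevesht", "V")],
    [("man", "PRO"), ("nameh", "N"), ("ra", "POSTP"), ("didam", "V")],
]

tagger = MaxEntTagger()
tagger.train(train_data)
print(tagger.tag(["man", "ketab", "ra", "khandam"]))
```

Each word is tagged independently here from its local context; a fuller tagger would typically also condition on previously predicted tags and add regularization, which this sketch omits for brevity.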

Keywords:
Natural Language Processing; Part of Speech Tagging; Persian Part of Speech Tagging; Maximum Entropy

Paper page and full-text download: https://civilica.com/doc/276310/