Persian News Classification Using Bag-of-Concepts

Publication year: 1397 (Solar Hijri)
Document type: Conference paper
Language: English

This paper is available as a 6-page PDF file.

National scientific document ID:

IDS03_074

Indexing date: 31 Ordibehesht 1398

Abstract:

Text classification is the task of automatically assigning documents to a predefined set of classes or topics. The representation of a document has a strong impact on the performance of classification algorithms. A common document representation is Bag-of-Words (BOW), which represents a document as a vector of its word frequencies. However, this method suffers from the curse of dimensionality: as the number of unique words increases, the classifier fails to maintain acceptable accuracy. In this paper, the Bag-of-Concepts (BOC) method is employed to overcome this weakness of BOW. The method groups semantically similar words in order to reduce the high dimensionality that arises in BOW. Its advantage over other approaches is that it effectively incorporates the impact of semantically similar words on preserving document proximity.
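The contrast between the two representations can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how semantically similar words are grouped, so the word-to-concept mapping below is a hand-written, hypothetical stand-in for whatever grouping step the paper uses, and the function names, vocabulary, and toy documents are all assumptions made for illustration.

```python
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Bag-of-Words: one dimension per vocabulary word, valued by frequency."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def boc_vector(tokens, word_to_concept, num_concepts):
    """Bag-of-Concepts: one dimension per concept (group of similar words).

    Each token adds 1 to the dimension of the concept it belongs to;
    tokens with no known concept are ignored.
    """
    vector = [0] * num_concepts
    for token in tokens:
        concept = word_to_concept.get(token)
        if concept is not None:
            vector[concept] += 1
    return vector


# Hypothetical grouping of words into two concepts (0: sports, 1: politics).
# In practice this grouping would come from some semantic similarity step.
word_to_concept = {
    "football": 0, "soccer": 0, "goal": 0,
    "election": 1, "vote": 1, "parliament": 1,
}
vocabulary = sorted(word_to_concept)  # six distinct words

# Two toy documents about the same topic that share only one word.
doc_a = "football goal goal".split()
doc_b = "soccer goal".split()

bow_a = bow_vector(doc_a, vocabulary)
bow_b = bow_vector(doc_b, vocabulary)
boc_a = boc_vector(doc_a, word_to_concept, num_concepts=2)
boc_b = boc_vector(doc_b, word_to_concept, num_concepts=2)

print("BOW:", bow_a, bow_b)  # six dimensions each
print("BOC:", boc_a, boc_b)  # two dimensions each
```

In this toy setting the BOW vectors have one dimension per distinct word and overlap only on "goal", while the BOC vectors have one dimension per concept and place both documents on the same concept axis, which is the kind of dimensionality reduction and proximity preservation the abstract describes.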

Authors

Asma Faraji Dizaji

School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran

Hedieh Sajedi

School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran

Arian Hedayati Far

School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran