CIVILICA We Respect the Science
(Specialized publisher of the country's conferences / Publishing license no. 8971 from the Ministry of Culture and Islamic Guidance)

Two-stage text feature selection method using fuzzy entropy measure and ant colony optimization

Paper title: Two-stage text feature selection method using fuzzy entropy measure and ant colony optimization
National paper ID: ICEE20_360
Published in: 20th Iranian Conference on Electrical Engineering, 1391 SH (2012)
Authors:

Fardin Ahmadizar - University of Kurdistan
Majid Hemmati
Ahmad Rabanimotlagh

Abstract:
Text categorization is widely used to organize documents in digital form. As the number of digital documents grows, automated text categorization has emerged as an appropriate tool for classifying documents into predefined categories. High dimensionality of the feature space is a common problem in text categorization: most features are irrelevant or redundant, which degrades classifier performance. Feature selection is therefore used to reduce the feature space and thereby improve classifier performance. In this paper, a two-stage method is proposed for text feature selection. In the first stage, a filtering technique based on the fuzzy entropy measure is applied: features are ranked by their fuzzy entropy values, and those with values higher than a threshold are removed from the feature set. In the second stage, an ant colony optimization approach is employed to select features from the reduced feature space produced by the first stage. The proposed method is evaluated using the k-nearest neighbor classifier on the top 10 Reuters-21578 categories. The experimental results show the efficiency of the proposed method.
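
The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact fuzzy entropy definition, membership function, threshold, or ant colony update rules, so everything below is an assumption made for illustration: feature values scaled to [0, 1] are treated as fuzzy memberships, the entropy is computed over each class's share of the total membership, ants pick features in proportion to pheromone only, and subsets are scored by leave-one-out k-nearest-neighbor accuracy. The names fuzzy_entropy, filter_stage, knn_loo_accuracy, and aco_stage are hypothetical, as is the toy data.

```python
import math
import random

def fuzzy_entropy(values, labels):
    """Assumed measure: entropy of each class's share of the total membership."""
    total = sum(values)
    classes = set(labels)
    if total == 0:
        return math.log(len(classes))  # feature never fires: treat as maximally uncertain
    entropy = 0.0
    for c in classes:
        share = sum(v for v, lab in zip(values, labels) if lab == c) / total
        if share > 0:
            entropy -= share * math.log(share)
    return entropy

def filter_stage(X, y, threshold):
    """Stage 1: drop features whose fuzzy entropy is higher than the threshold."""
    return [j for j in range(len(X[0]))
            if fuzzy_entropy([row[j] for row in X], y) <= threshold]

def knn_loo_accuracy(X, y, subset, k=1):
    """Leave-one-out accuracy of a k-NN classifier restricted to `subset`."""
    correct = 0
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in subset), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in neighbours[:k]]
        if max(set(votes), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)

def aco_stage(X, y, candidates, subset_size, n_ants=10, n_iters=20, rho=0.2, seed=0):
    """Stage 2 (simplified): ants sample subsets guided by pheromone only."""
    rng = random.Random(seed)
    pheromone = {j: 1.0 for j in candidates}
    best_subset, best_score = [], -1.0
    for _ in range(n_iters):
        for _ in range(n_ants):
            pool, subset = list(candidates), []
            for _ in range(min(subset_size, len(pool))):
                weights = [pheromone[j] for j in pool]
                pick = rng.choices(pool, weights=weights, k=1)[0]
                pool.remove(pick)
                subset.append(pick)
            score = knn_loo_accuracy(X, y, subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        for j in pheromone:            # evaporation
            pheromone[j] *= 1.0 - rho
        for j in best_subset:          # reinforce the best subset found so far
            pheromone[j] += best_score
    return best_subset, best_score

# Toy data (hypothetical): features 0 and 1 are class-specific, 2 and 3 are not.
X = [
    [1.0, 0.0, 0.5, 0.5],
    [0.9, 0.0, 0.5, 0.4],
    [0.8, 0.1, 0.5, 0.6],
    [0.0, 1.0, 0.5, 0.5],
    [0.1, 0.9, 0.5, 0.4],
    [0.0, 0.8, 0.5, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

kept = filter_stage(X, y, threshold=0.5)
subset, score = aco_stage(X, y, kept, subset_size=1)
```

On the toy data, the filter keeps only the two class-specific features, and the second stage then searches within that reduced set. A fuller implementation would typically add a heuristic desirability term to the ants' selection rule and evaluate subsets on held-out data.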

Keywords:
Text categorization; Feature selection; Ant colony optimization; Fuzzy measures

Paper page and full-text download: https://civilica.com/doc/154572/