Improving classification accuracy of imbalanced data by Forest Algorithm

Publication year: 1399 (Solar Hijri calendar)
Document type: Conference paper
Language: English

The full text of this paper is 13 pages long and is available in PDF and WORD formats.

National scientific document ID:

TETSCONF02_029

Indexing date: 27 Khordad 1399

Abstract:

Imbalanced data is data in which the two classes are not equally represented: one class has fewer samples than the other. Unfortunately, most real-world datasets used to train systems such as adult-page filters, disease diagnosis tools and intrusion detectors are imbalanced, and this imbalance reduces the training quality of supervised learning methods. The forest algorithm is a recently proposed optimization method that has not yet been applied to data balancing. In this paper, the forest algorithm is used to balance data. The proposed method was evaluated with four classifiers: Naive Bayes, artificial neural network, decision tree and nearest neighbor. It was also compared with other data balancing methods, namely RS, SRAND, BRC and BRC+RS. According to the results, the average detection rate of the proposed method was 5.7% higher than that obtained on the imbalanced data, 3.1% higher than the RS method, 3.6% higher than the SRAND method, 5.3% higher than the BRC method and 2.4% higher than the BRC+RS method. The highest detection rate, 98%, was achieved with the Naive Bayes classifier.
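To make the terms in the abstract concrete, the following is a minimal sketch of how class imbalance and a detection rate might be measured, together with a generic random-oversampling baseline. It does not reproduce the paper's forest-algorithm method or any of the compared methods. The function names (`imbalance_ratio`, `random_oversample`, `detection_rate`) are hypothetical, and treating "detection rate" as recall on the positive class is an assumption, since the abstract does not define the metric.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class matches the size of the largest class.

    Generic baseline for illustration only; NOT the paper's method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def detection_rate(y_true, y_pred, positive):
    """Fraction of true `positive` samples predicted as `positive`
    (assumed here to be what the abstract calls detection rate)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


# Toy data: 8 majority samples (label 0) and 2 minority samples (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(imbalance_ratio(y))      # 4.0
print(imbalance_ratio(y_bal))  # 1.0
print(detection_rate([1, 1, 0, 0], [1, 0, 0, 0], positive=1))  # 0.5
```

In this toy run the minority class is duplicated from 2 to 8 samples, so the balanced set has 16 samples in total. A balancing method such as the one the paper proposes would replace `random_oversample` in a comparison of this kind.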

Authors

Zahra Vahedinia

Department of Computer Engineering, University of Tabriz, Tabriz, Iran

Mohammad-Reza Feizi-Derakhshi

Department of Computer Engineering, University of Tabriz, Tabriz, Iran