Machine Learning Algorithms Capable of Type 2 Diabetes Mellitus Early Diagnosis using Explored Important Features

Publication year: 1401 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

IBIS11_066

Indexing date: 19 Azar 1402 (Solar Hijri)

Abstract:

Diabetes mellitus is a chronic metabolic disease that, according to the World Health Organization (WHO), is the cause of death of more than 1.6 million people. In this study, the Random Forest algorithm was used to determine the six most important of the twenty features in a public dataset of type 2 diabetes patients. Using the extracted features, six machine learning algorithms were developed: Logistic Regression (LR), Support Vector Machine (SVM), K Nearest Neighbors (KNN), Decision Tree (DT), Extremely Randomized Trees (ERT), and XGBoost. The algorithms were trained and tested for diagnosing diabetes using 4-fold cross-validation and hold-out approaches (with 25% of the data excluded from the training process for testing). The accuracies of the LR, SVM, KNN, DT, ERT, and XGBoost algorithms were 92.31%, 90.77%, 96.15%, 95.38%, 95.92%, and 96.92%, respectively, with XGBoost outperforming the rest of the algorithms. On the F1-Score metric, the LR, SVM, KNN, DT, ERT, and XGBoost algorithms achieved 93.72%, 92.21%, 96.77%, 96.15%, 96.44%, and 97.44%, respectively, confirming the ranking of XGBoost obtained with the accuracy metric. In addition to these results, acquired with the 4-fold cross-validation approach, XGBoost also offers the best accuracy and F1-Score under the hold-out approach. With the hold-out approach, the accuracies of the LR, SVM, KNN, DT, ERT, and XGBoost algorithms were 92.31%, 93.08%, 95.38%, 95.38%, 94.62%, and 96.15%, and their F1-Scores were 93.90%, 94.34%, 96.30%, 96.20%, 95.65%, and 96.86%, respectively. Using the most important features (age, gender, polyuria, polydipsia, sudden weight loss, and partial paresis), XGBoost diagnosed type 2 diabetes more accurately than the other algorithms evaluated in this study, as validated by both the 4-fold cross-validation and hold-out methods. This algorithm can act as a supplementary tool for the faster and earlier diagnosis of type 2 diabetes.
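
The abstract reports accuracy and F1-Score under two evaluation protocols, 4-fold cross-validation and a 25% hold-out split. The following is a minimal, standard-library-only sketch of how such metrics and splits are commonly computed. It is not the authors' code: the function names, the random seed, the lack of stratification, and the toy labels are all assumptions made for illustration, and no classifier is trained here.

```python
import random


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the diabetic class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # hypothetical fixed seed for reproducibility
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]


def hold_out_indices(n_samples, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with `test_fraction` of the samples held out."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]


# Toy labels (made up, not taken from the study's dataset).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2%}  F1={f1:.2%}")
```

In a full pipeline, a model would be fitted on each `train_idx` and scored on the matching `test_idx`, with the cross-validation figures taken as the average over the four folds. Whether the authors stratified their splits or averaged in this way is not stated in the abstract.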

Authors

Samin Babaei Rikan

Urmia University

Ali Ghafari

Tehran University of Medical Sciences

Reza Ghafari

Urmia University of Medical Sciences

Amir Sorayaie Azar

Urmia University