Comparative study of machine learning algorithms for diabetes detection using binary data
DOI: https://doi.org/10.17721/1812-5409.2024/1.23
Keywords: random forest, machine learning, classification accuracy, diabetes prediction
Abstract
The prevalence of diabetes is steadily increasing, and timely diagnosis of this disease is extremely important for the health care system. Alongside advanced medical technologies, it is now possible to process large volumes of information, medical information in particular, quickly and efficiently. Patients with diabetes, even in the early stages, share certain symptoms and patterns, which makes it possible to diagnose the disease effectively from easily obtained clinical data. In this work, we compare the accuracy of different machine learning algorithms for diagnosing diabetes on binary data that do not include laboratory tests and can be obtained from patient questionnaires. All the classification models give good results and show that it is feasible to identify individuals who probably have undiagnosed diabetes from questionnaire data. The random forest classifier provides the best results. To improve classification accuracy, the relationships between the variables of the dataset were analyzed, which made it possible to reduce the dimensionality of the model by removing variables that carry no useful information. In addition, the dataset was balanced. The resulting models demonstrate higher efficiency than the classification models built on the base dataset.
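The two preprocessing steps named in the abstract, removing uninformative variables and balancing the dataset, can be illustrated with a minimal sketch. The abstract does not say which relationship measure or balancing method was used, so the choices here are assumptions: the phi coefficient (Pearson correlation for two binary variables) with an arbitrary threshold of 0.1, and random oversampling of the minority class. The feature names and the toy data are hypothetical and are not taken from the paper's dataset.

```python
import random
from collections import Counter


def phi_coefficient(x, y):
    """Phi (Pearson) correlation between two binary 0/1 sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = len(x) - n11 - n10 - n01
    denom = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    # A constant column makes the denominator zero; treat it as uninformative.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def select_features(rows, labels, threshold=0.1):
    """Return indices of features whose |phi| with the label >= threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    keep = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if abs(phi_coefficient(column, labels)) >= threshold:
            keep.append(j)
    return keep


def oversample(rows, labels, seed=0):
    """Random oversampling: duplicate minority rows until classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical questionnaire answers (1 = yes, 0 = no).
names = ["symptom_a", "always_yes", "unrelated"]
rows = [
    [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0],
    [0, 1, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0],
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # 6 positive, 4 negative

kept = select_features(rows, labels)            # only "symptom_a" survives
reduced = [[row[j] for j in kept] for row in rows]
bal_rows, bal_labels = oversample(reduced, labels)

print([names[j] for j in kept])
print(Counter(bal_labels))
```

In practice, balancing would normally be applied to the training split only, so that duplicated rows do not leak into the data used to estimate classification accuracy.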
Pages of the article in the issue: 119 - 127
Language of the article: English
License
Copyright (c) 2024 Hanna Livinska, Daria Skrypnyk, José Luis Galán-García
This work is licensed under a Creative Commons Attribution 4.0 International License.