Statistical and Machine Learning Models for Diabetes Diagnosis

Authors

  • Sathvik Kommireddy, Rancho Cucamonga High School, Rancho Cucamonga, CA

Keywords:

diabetes, diagnosis, binary logistic regression, binary probit regression, binary complementary log-log regression, random forest, gradient boosting, support vector machine, k-nearest neighbor classifier, naive Bayes classifier, artificial neural network

Abstract

This study evaluates logistic regression and supervised machine learning models for predicting a diabetes diagnosis from demographic factors, medical history, and blood test results. Logistic regression is also used for feature selection, to identify key risk indicators. The performance of several machine learning binary classification algorithms is then compared using multiple performance metrics. The results highlight the added value of laboratory data and ensemble methods in improving diagnostic performance.
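The keywords list three binary regression models (logistic, probit, and complementary log-log) that differ only in the link function mapping a linear predictor to a probability. The abstract also compares classifiers on several metrics without naming them. The following is a minimal Python sketch under stated assumptions. It assumes a linear predictor has already been fitted, and it uses accuracy, precision, recall, specificity, and F1 as illustrative metrics, which may not be the ones used in the study. `inverse_link` and `classification_metrics` are hypothetical helper names, not code from the paper.

```python
import math
from typing import Dict, Sequence


def inverse_link(eta: float, link: str) -> float:
    """Map a linear predictor eta = b0 + b1*x1 + ... to P(diagnosis = 1).

    Hypothetical helper: the three links match the binary models named in
    the keywords; the coefficients behind eta are assumed to be fitted elsewhere.
    """
    if link == "logit":
        # Logistic CDF, written in a numerically stable form for negative eta
        if eta >= 0:
            return 1.0 / (1.0 + math.exp(-eta))
        e = math.exp(eta)
        return e / (1.0 + e)
    if link == "probit":
        # Standard normal CDF expressed with the error function
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    if link == "cloglog":
        # Inverse complementary log-log: 1 - exp(-exp(eta)), asymmetric around 0.5
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError(f"unknown link: {link}")


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for 0/1 labels (illustrative choice of metrics)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Toy labels, not data from the study
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
    for link in ("logit", "probit", "cloglog"):
        print(link, inverse_link(0.0, link))
```

At `eta = 0` the logit and probit links both give 0.5, while the complementary log-log link gives about 0.632. This asymmetry is one reason to fit all three and compare them. Predicted probabilities would then be thresholded (0.5 is a common but assumed default) before computing the confusion-matrix metrics.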


Published

2025-07-31