Clinical and Genomic Biomarkers for Progression-Free Survival in Non-Small Cell Lung Cancer: A Machine Learning Approach

Authors

  • Owen Zhuang Sun, California Academy of Mathematics and Science

Keywords:

non-small cell lung cancer, progression-free survival, survival analysis with censoring, Cox proportional hazards model, random survival forests, gradient boosting survival analysis

Abstract

Lung cancer is the leading cause of cancer mortality in the United States, and non-small cell lung cancer (NSCLC) accounts for approximately 85% of cases. This study aims to identify clinical and genomic risk factors associated with progression-free survival (PFS) in patients with advanced NSCLC. A cohort of 218 U.S. patients from the MSK MIND dataset was analyzed using three survival analysis models implemented in Python. EGFR and STK11 driver mutations and an elevated derived neutrophil-to-lymphocyte ratio (dNLR) were associated with increased hazard, whereas albumin levels were associated with a significant decrease in hazard. PD-L1 expression and tumor mutational burden (TMB) showed relatively modest protective effects. The Gradient Boosted Machine (GBM), a machine learning model for survival analysis, demonstrated the highest predictive capability, with a C-index of 0.701 on the testing dataset (a C-index of 0.5 corresponds to random ranking). These findings highlight the critical role of specific clinical and genomic biomarkers in NSCLC survival and their value for improving the accuracy of survival predictions.
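
The C-index quoted in the abstract can be read as the fraction of comparable patient pairs that a model ranks in the correct order of progression. The block below is a minimal sketch of Harrell's concordance index for right-censored data, written in plain Python. The function name `concordance_index` and the toy values are illustrative assumptions and are not taken from the study, which does not state which implementation was used.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times:       observed time for each subject (progression or censoring)
    events:      1 if progression was observed, 0 if the subject was censored
    risk_scores: model output, where a higher score means higher predicted hazard
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified version
        if not events[a]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk progressed first: correct ranking
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy, made-up values for illustration only (not from the MSK MIND cohort).
toy_times = [2.0, 5.0, 7.0, 9.0, 12.0]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.6, 0.3, 0.1]

# 7 of the 8 comparable pairs are ranked correctly, giving 0.875.
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 matches random ranking, which is the baseline against which the reported 0.701 is judged.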

Published

2024-11-04