[1901.03896] Personalized Colorectal Cancer Survivability Prediction with Machine Learning Methods

Technique: AdaBoost, random forest classifier, logistic regression
In this work, we investigate the importance of ethnicity in colorectal cancer survivability prediction using machine learning techniques and the SEER cancer incidence database. We compare model performance on 2-year survivability prediction, along with feature importance rankings, across Hispanic, White, and mixed patient populations. Our models consistently perform better on single-ethnicity populations and produce different feature importance rankings when trained on different populations. Additionally, we show that our models achieve a higher Area Under the Curve (AUC) score than the best reported in the literature. We also apply imbalanced classification techniques to improve classification performance when far more patients have survived colorectal cancer than have not. These results provide evidence in favor of increased consideration of patient ethnicity in cancer survivability prediction, and of more personalized medicine in general.
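To make the AUC metric and the class-imbalance concern concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the function name `roc_auc`, the toy labels, and the scores are all hypothetical and chosen only for illustration. It computes AUC as the probability that a randomly chosen survivor receives a higher score than a randomly chosen non-survivor, and contrasts it with accuracy on an imbalanced toy cohort.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort (not SEER data): 1 = survived two years, 0 = did not.
# Eight survivors versus two non-survivors, so the classes are imbalanced.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.85, 0.7, 0.95, 0.6, 0.75, 0.4, 0.5, 0.3]

# A trivial "model" that gives every patient the same survival score.
always_survive = [1.0] * len(labels)

# Accuracy of predicting "survived" for everyone equals the survivor fraction.
accuracy_trivial = sum(1 for y in labels if y == 1) / len(labels)
auc_trivial = roc_auc(labels, always_survive)
auc_model = roc_auc(labels, scores)

print(f"trivial accuracy: {accuracy_trivial:.2f}")  # 0.80
print(f"trivial AUC:      {auc_trivial:.2f}")       # 0.50
print(f"model AUC:        {auc_model:.4f}")         # 0.9375
```

The trivial predictor reaches 80% accuracy simply because survivors dominate the cohort, yet its AUC is 0.5. This is one reason a ranking metric such as AUC is a common choice when the survived class is much larger than the other.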