Predicting Opioid Use Disorder Using Machine Learning

Abstract

Opioid Use Disorder (OUD), defined as physical or psychological reliance on opioids, is rapidly becoming a public health epidemic. This project demonstrates the potential of supervised machine learning to predict which adults are at risk for OUD. It does so by considering, in an integrated manner, the interactions among demographic, socioeconomic, physical, and psychological features. A labeled data set was built from responses to the 2016 National Survey on Drug Use and Health (NSDUH), conducted by the Substance Abuse and Mental Health Services Administration (SAMHSA). This data set was used to train a random forest classifier while accounting for class imbalance. Random forest was chosen as the classification technique for two reasons. First, it is robust to correlated features. Second, it can identify the relative importance of the different features in predicting OUD.
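The abstract does not say how class imbalance was handled. As a minimal sketch, assuming the common "balanced" reweighting scheme (the same heuristic behind the class_weight="balanced" option of scikit-learn's RandomForestClassifier), each class can be weighted by n_samples / (n_classes × class_count). Under this scheme the rare OUD class carries as much total weight as the majority class. The labels below are hypothetical toy data, not NSDUH responses, and the helper names are illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumed scheme (not confirmed by the report): the minority class
    (here, OUD) ends up with the same total weight as the majority class.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def per_sample_weights(labels):
    """Expand class weights into one weight per example for a weighted learner."""
    weights = balanced_class_weights(labels)
    return [weights[y] for y in labels]


# Hypothetical toy labels: 1 = OUD, 0 = no OUD, with a 9:1 imbalance.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.5555555555555556, 1: 5.0}
```

In scikit-learn the equivalent is passing class_weight="balanced" (or "balanced_subsample", which recomputes the weights for each bootstrap sample) to RandomForestClassifier. Undersampling the majority class is another common option.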

The random forest classifier identifies adults at risk for OUD accurately, with an average area under the ROC curve (AUC) above 0.85. Early initiation of marijuana use (before 18 years of age) emerges as the strongest predictor of developing OUD in adult life. This is surprising because it ranks higher than both mental illness and disability, which are often comorbid with substance use disorders. The key takeaway is that curbing early initiation of marijuana use is the best prevention strategy. This highlights the crucial role that educators, counselors, and parents can play in alleviating the United States’ opioid overdose crisis.
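The abstract reports an average AUC but not how the average was formed. Assuming it is a mean over cross-validation folds, the sketch below computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as half), then averages across folds. It also shows permutation importance, a model-agnostic way to rank predictors by how much AUC drops when one feature is shuffled. This technique is swapped in for illustration only. A random forest's built-in importances are usually impurity-based, and the report may have used those instead. The folds, rows, and toy predict function are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC = P(score of random positive > score of random negative); ties count 0.5.

    This pairwise form equals the normalized Mann-Whitney U statistic.
    It is O(P * N), fine for a sketch but slow for large surveys.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over (labels, scores) pairs, one pair per held-out fold (assumed)."""
    aucs = [roc_auc(fold_labels, fold_scores) for fold_labels, fold_scores in folds]
    return sum(aucs) / len(aucs)


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Mean drop in AUC when each feature column is shuffled; bigger drop = more important."""
    rng = random.Random(seed)
    baseline = roc_auc(labels, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Replace column j with its shuffled values, leaving other features intact.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - roc_auc(labels, [predict(r) for r in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical held-out folds: (true labels, predicted OUD probabilities).
folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),  # AUC 0.75
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),   # AUC 1.0
]
print(mean_auc(folds))  # 0.875

# Hypothetical model that only looks at feature 0, so shuffling feature 1 changes nothing.
rows = [[0.9, 0.3], [0.8, 0.7], [0.2, 0.6], [0.1, 0.2]]
labels = [1, 1, 0, 0]
importances = permutation_importance(lambda r: r[0], rows, labels)
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
print(importances, ranking)
```

For real work, scikit-learn provides tested versions of these pieces: sklearn.metrics.roc_auc_score, cross_val_score with scoring="roc_auc", and sklearn.inspection.permutation_importance.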
