Machine learning shows promise in identifying patients with nmCRPC

Advanced machine learning and natural language processing approaches were combined to identify patients with non-metastatic castration-resistant prostate cancer from electronic health record data.

By combining machine learning and rule-based natural language processing (NLP), researchers developed an algorithm to leverage electronic health records (EHRs) to identify patients with non-metastatic castration-resistant prostate cancer (nmCRPC).1

Using nationwide Department of Veterans Affairs EHR data from 2006 to 2020, the researchers started from 654,148 patients with prostate cancer and identified 13,199 patients in their final nmCRPC cohort. Of the patients with prostate cancer identified by the algorithm, 26,506 were castration-resistant, and 8,297 patients were excluded from the nmCRPC cohort because of evidence of metastatic disease.

The machine learning algorithm had an accuracy of 86%. The NLP component, which classified patients with metastatic disease, showed 96% accuracy, 99% accuracy, and 98% sensitivity. Furthermore, there was an accuracy of 86% within 3 months of the patient’s diagnosis in predicting whether they will progress to nmCRPC.
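For readers less familiar with these metrics, accuracy and sensitivity are both derived from a classifier's confusion matrix. The sketch below shows the standard formulas in Python. The counts are made up for illustration and are not taken from the study, and specificity is included only as the usual companion to sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard metrics from confusion-matrix counts.

    tp/fn: patients with the condition who were flagged / missed.
    tn/fp: patients without the condition who were cleared / wrongly flagged.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all calls that were correct
        "sensitivity": tp / (tp + fn),      # share of true cases that were caught
        "specificity": tn / (tn + fp),      # share of non-cases correctly cleared
    }


# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```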

“It is important to be able to identify complex disease states from increasingly accessible EHR data,” researchers from the University of Utah Huntsman Cancer Institute wrote in a poster of their study. “We combine advanced machine learning and NLP approaches to identify [patients with] nmCRPC from EHR data including a variety of elements from multiple sources.”

The researchers used an extreme gradient boosting machine learning approach that was previously trained on a similar cohort of prostate cancer patients identified within Veterans Affairs cancer registries. International Classification of Diseases (ICD)-9 and ICD-10 codes were grouped into 7-day intervals, and the codes within each interval served as a set of predictive features for patients who progressed.
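The 7-day grouping of ICD codes can be pictured with a small Python sketch. This is an illustration under stated assumptions rather than the researchers' implementation: the function name `bin_icd_codes`, the choice of an index date, the decision to skip codes recorded before that date, and the sample codes are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def bin_icd_codes(events, index_date, interval_days=7):
    """Group (date, code) pairs into fixed-width intervals counted from index_date.

    Returns {interval_index: set_of_codes}. Interval 0 covers the first
    `interval_days` days starting at index_date, interval 1 the next, and so on.
    """
    bins = defaultdict(set)
    for event_date, code in events:
        offset = (event_date - index_date).days
        if offset < 0:
            continue  # assumption: ignore codes recorded before the index date
        bins[offset // interval_days].add(code)
    return dict(bins)


# Hypothetical coded encounters for one patient (illustrative strings only)
events = [
    (date(2015, 1, 2), "C61"),
    (date(2015, 1, 5), "185"),
    (date(2015, 1, 12), "C61"),
    (date(2015, 1, 20), "N40.1"),
]
features = bin_icd_codes(events, index_date=date(2015, 1, 1))
print(features)
```

Each interval's set of codes could then be encoded as columns for a gradient boosting model. How the study encoded the sets is not described in the source.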

This also allowed the researchers to exclude patients without prostate cancer who might have been in the EHR data they reviewed. Training patients were incorporated into the algorithm to teach it how to categorize patients. The process first checked whether a patient was experiencing urinary symptoms. If so, it checked whether the patient had an ICD code for bladder cancer or urinary tract infection, and if so again, the patient was designated as not having prostate cancer. Patients with ICD codes for prostate cancer were given a value of +2, which allowed the model to be weighted properly before going on to predict patient progression.

To further classify patients, those with evidence of prior surgical castration, current androgen deprivation therapy (ADT), or a testosterone level consistent with medical castration (50 ng/dL or lower; ≤ 2.0 nmol/L) were considered castrated. These patients were then removed from the cohort. In addition, patients with nmCRPC were defined as those who had castration-resistant prostate cancer, indicated by 2 consecutive increases in prostate-specific antigen (PSA) while castrated, and no evidence of metastatic disease on radiological report.
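These criteria amount to a short chain of rules, sketched below in Python. The sketch rests on several assumptions that are not confirmed by the source: the castrate threshold is read as a testosterone level at or below 50 ng/dL, "2 consecutive increases" is read as three successively rising PSA values, nmCRPC is read as castration resistance with no evidence of metastasis, and all function names are hypothetical.

```python
CASTRATE_TESTOSTERONE_NG_DL = 50.0  # assumption: at or below this level counts as castrate


def is_castrated(surgical_castration, on_adt, testosterone_ng_dl=None):
    """Treat any one of the three criteria as sufficient evidence of castration."""
    low_testosterone = (
        testosterone_ng_dl is not None
        and testosterone_ng_dl <= CASTRATE_TESTOSTERONE_NG_DL
    )
    return bool(surgical_castration or on_adt or low_testosterone)


def has_two_consecutive_psa_rises(psa_values):
    """True if PSA rises at two successive measurements (three increasing values)."""
    triples = zip(psa_values, psa_values[1:], psa_values[2:])
    return any(a < b < c for a, b, c in triples)


def is_nmcrpc(castrated, psa_while_castrated, metastatic_evidence):
    """Castration-resistant (rising PSA while castrated) and no metastatic evidence."""
    resistant = castrated and has_two_consecutive_psa_rises(psa_while_castrated)
    return bool(resistant and not metastatic_evidence)


# Hypothetical patient: on ADT, PSA rising, radiology reports clear
castrated = is_castrated(surgical_castration=False, on_adt=True)
print(is_nmcrpc(castrated, [0.2, 0.5, 1.1], metastatic_evidence=False))
```

Keeping each criterion in its own function makes it straightforward to swap in a different threshold or PSA rule.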

To identify patients with metastatic disease, patient data was fed through the NLP to find non-negated mentions of metastatic disease in radiology reports. The algorithm then used the Unified Medical Language System to identify metastatic vocabulary and patterns of metastatic disease, but it still required human review. Once this was done, these patients were given a score that triggered identification within the larger algorithm, which screens thousands of prostate cancer patients.
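Finding mentions of metastatic disease that are not negated is a common rule-based NLP task. The Python sketch below uses a deliberately simplified heuristic in the spirit of NegEx-style negation detection. It is not the researchers' pipeline: the term pattern, the list of negation cues, and the function name `non_negated_mentions` are assumptions made for illustration.

```python
import re

# Assumption: a tiny vocabulary; a real system would draw terms from a curated lexicon
METASTATIC_TERMS = re.compile(r"\bmetasta\w*", re.IGNORECASE)
NEGATION_CUES = re.compile(
    r"\b(no|not|without|negative for|free of)\b", re.IGNORECASE
)


def non_negated_mentions(report):
    """Return metastatic terms that have no negation cue earlier in the same sentence."""
    mentions = []
    # Split on whitespace that follows sentence-ending punctuation
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        for match in METASTATIC_TERMS.finditer(sentence):
            preceding_text = sentence[: match.start()]
            if not NEGATION_CUES.search(preceding_text):
                mentions.append(match.group(0))
    return mentions


# Hypothetical radiology report text
report = (
    "No evidence of metastatic disease in the pelvis. "
    "New sclerotic focus in L3 is suspicious for metastasis."
)
print(non_negated_mentions(report))
```

Because any cue earlier in the sentence suppresses the mention, this heuristic will miss cases such as "no fracture, but metastasis is present", which is one reason human review remains useful.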

According to the researchers, a patient transitions to nmCRPC when the disease progresses despite castrate levels of testosterone and there are no signs of metastatic disease. This typically occurs after a patient initially responds to ADT but then becomes resistant to therapies that inhibit androgen binding to the androgen receptor, which limits the benefit of the treatment. Identifying these patients is important so that treatment can be adjusted and disease progression monitored.

“This approach classifies cancer diagnosis and date of diagnosis with reasonable accuracy,” the researchers concluded.

Reference
1. Patil V, Rasmussen K, Morreall D, et al. RWD140 Using machine learning to identify patients with nonmetastatic castration-resistant prostate cancer (NMCRPC) from electronic medical record data. Value Health. 2022;25(suppl 7):S603. doi.org/10.1016/j.jval.2022.04.1663
