Chinese Journal of Oncology Prevention and Treatment ›› 2022, Vol. 14 ›› Issue (1): 70-75. doi: 10.3969/j.issn.1674-5671.2022.01.12

• Clinical Research •

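Predicting the pathological complete response of breast cancer patients to neoadjuvant chemotherapy based on machine learning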

  

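  1. Department of Cardiothoracic Surgery, Qingdao Municipal Hospital
  • Online: 2022-02-25 Published: 2022-03-11
  • Corresponding author: Jin Feng (金凤) E-mail: 408705521@qq.com
  • Funding:
    Shandong Province Medical and Health Science and Technology Development Plan Project (2017DX0212)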

Abstract: Objective To develop a machine learning model that predicts pathological complete response (pCR) after neoadjuvant chemotherapy (NAC), using clinical and pathological data collected from a breast cancer electronic medical record system. Methods Clinical information was retrospectively collected for breast cancer patients who received NAC and curative surgery at Qingdao Municipal Hospital from January 2015 to December 2020. The patients were randomly divided into a training set and a validation set at a ratio of 7:3. Five machine learning models were built on the training set: logistic regression (LR), artificial neural network (ANN), naive Bayes (NB), random forest (RF), and XGBoost. The area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity were used to evaluate the predictive ability of the models. Results A total of 742 patients were included in the analysis, 533 in the training set and 209 in the validation set. After feature engineering, features including age, CA-15-3, ER status, PR status, HER2 status, Ki-67, T stage, N stage, and NAC regimen were selected to construct the prediction models. Among the five models, XGBoost performed best, with AUCs of 0.850 in the training set and 0.834 in the validation set. Conclusions The XGBoost model, built from pre-treatment clinical and pathological features, performs well in predicting pCR after NAC in breast cancer patients and can provide a basis for planning subsequent treatment.
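
The split and evaluation steps named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the fixed random seed, the 0.5 probability cutoff, and the toy labels and scores are all assumptions made for illustration. It shows a random 7:3 split and how AUC, accuracy, sensitivity, and specificity could be computed from true pCR labels and predicted probabilities. The abstract does not describe the randomization procedure, so the set sizes produced here need not match the reported 533 and 209.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle records and cut them into training and validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(y_true, y_score):
    """AUC as the fraction of (pCR, non-pCR) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one pCR and one non-pCR case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a hypothetical probability cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy data (invented): 1 = pCR, 0 = no pCR, with made-up predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]

train_ids, validation_ids = split_train_validation(range(742))
auc = roc_auc(y_true, y_score)
metrics = threshold_metrics(y_true, y_score)
print(len(train_ids), len(validation_ids), round(auc, 3), metrics)
```

AUC is threshold-free, while accuracy, sensitivity, and specificity depend on the chosen cutoff, so a comparison across the five models would need the same cutoff rule applied to each.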

Key words: Breast cancer, Neoadjuvant chemotherapy, Pathological complete response, Machine learning, XGBoost

CLC Number:

  • R737.9