Chinese Journal of Oncology Prevention and Treatment ›› 2025, Vol. 17 ›› Issue (3): 335-340. doi: 10.3969/j.issn.1674-5671.2025.03.11

• Other Digestive System Tumors Column •

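Machine learning integrating tumor markers and clinical laboratory data to build a model for discriminating pancreatic cancer from benign pancreatic lesions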

  

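  • Publication date: 2025-06-25 Release date: 2025-07-10
  • Corresponding author: Zhou Fan (周凡) E-mail: nczhoufan@126.com
  • Funding:
    National Natural Science Foundation of China (81960609); Science and Technology Program of the Health Commission of Jiangxi Province (20204356)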

A discrimination model for differentiating pancreatic cancer from benign pancreatic lesions by integrating tumor biomarkers with clinical test data using machine learning

  • Online: 2025-06-25 Published: 2025-07-10

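Abstract: Objective To develop, using machine learning methods, a model that discriminates pancreatic cancer from benign pancreatic lesions. Methods The study subjects were 251 patients with pancreatic diseases who were treated at the Second Affiliated Hospital of Nanchang University from January 2018 to December 2023. Six machine learning models were built to distinguish pancreatic cancer from benign pancreatic lesions: logistic regression, random forest, eXtreme gradient boosting (XGBoost), support vector machine, multilayer perceptron, and Gaussian naive Bayes. Receiver operating characteristic (ROC) curves were used to evaluate the discriminative ability of each model, calibration curves to evaluate model consistency, and decision curves to evaluate clinical applicability; the models were interpreted with the SHapley Additive exPlanations (SHAP) method. Results Of the 251 patients with pancreatic diseases, 100 were diagnosed with pancreatic cancer and 151 with benign pancreatic lesions. All six machine learning models were successfully built. The area under the ROC curve (AUC) of the random forest, XGBoost, support vector machine, and multilayer perceptron models was higher than that of carbohydrate antigen 19-9 (CA19-9) (all P<0.05). In particular, the XGBoost model had the highest AUC (AUC=0.886), and decision curve and calibration curve analyses further confirmed its substantial net clinical benefit and good consistency. SHAP analysis showed that CA19-9 was the most important contributor to the XGBoost model. Conclusions The XGBoost model developed from tumor markers and clinical test data significantly improved the discrimination of pancreatic cancer from benign pancreatic lesions, showing potential for clinical application.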

Key words: Pancreatic neoplasms, Benign pancreatic lesions, Machine learning, Discrimination model

Abstract: Objective To develop a model for discriminating pancreatic cancer from benign pancreatic lesions using machine learning methods. Methods The study included 251 patients with pancreatic diseases treated at the Second Affiliated Hospital of Nanchang University between January 2018 and December 2023. Six machine learning models were built to distinguish pancreatic cancer from benign pancreatic lesions: logistic regression, random forest, eXtreme gradient boosting (XGBoost), support vector machine, multilayer perceptron, and Gaussian naive Bayes. Discrimination was assessed with receiver operating characteristic (ROC) curves, model consistency with calibration curves, and clinical applicability with decision curves; the models were interpreted with the SHapley Additive exPlanations (SHAP) method. Results Of the 251 patients with pancreatic diseases, 100 were diagnosed with pancreatic cancer and 151 with benign pancreatic lesions. All six machine learning models were successfully built. The area under the ROC curve (AUC) of the random forest, XGBoost, support vector machine, and multilayer perceptron models exceeded that of the carbohydrate antigen 19-9 (CA19-9) marker (all P<0.05). Notably, the XGBoost model had the highest AUC (AUC=0.886), and decision curve and calibration curve analyses further confirmed its substantial net clinical benefit and good consistency. SHAP analysis identified CA19-9 as the most important contributor to the XGBoost model. Conclusions The XGBoost model developed from tumor markers and clinical test data significantly improves discrimination between pancreatic cancer and benign pancreatic lesions, indicating potential for clinical application.

Key words: Pancreatic cancer, Benign pancreatic lesions, Machine learning, Discrimination model
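
The abstract evaluates the models by AUC and by decision curves. As a rough illustration of what these two quantities measure, the following minimal sketch computes the AUC as the probability that a randomly chosen cancer case receives a higher score than a randomly chosen benign case, and the decision-curve net benefit at a threshold probability pt as TP/n - FP/n × pt/(1 - pt). The function names, the toy labels, and the scores are hypothetical and are not the study's data or code; the paper's own implementation is not described in this abstract.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Net benefit at threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    # A case is called positive when its predicted probability reaches pt.
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = pancreatic cancer, 0 = benign lesion.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]

print(f"AUC = {roc_auc(labels, model_scores):.3f}")
print(f"net benefit (model, pt=0.5) = {net_benefit(labels, model_scores, 0.5):.3f}")
# "Treat all" reference strategy: every patient is called positive.
treat_all = [1.0] * len(labels)
print(f"net benefit (treat all, pt=0.5) = {net_benefit(labels, treat_all, 0.5):.3f}")
```

A decision curve plots this net benefit over a range of thresholds for the model and for the "treat all" and "treat none" (net benefit 0) reference strategies; a model is clinically useful at thresholds where its curve lies above both.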

CLC number:

  • R735.9