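Article Abstract
Zhou Liang (周亮); Wang Yue (王月); Tian Lingjia (田凌嘉); Liu Lifang (刘丽芳); Zhang Qin (张琴); Liang Lixin (梁黎昕). Effect Evaluation of Different Machine Algorithms in Breast Cancer Risk Prediction Model [J]. Information on Traditional Chinese Medicine (中医药信息), 2023, 40(8): 23-28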
Effect Evaluation of Different Machine Algorithms in Breast Cancer Risk Prediction Model
Submitted: 2022-11-12  Accepted: 2023-01-11
DOI:10.19656/j.cnki.1002-2406.20230804
Keywords: Machine algorithm  Logistic regression  Random forest  DT  GBDT  XGboost  SVC  Breast cancer  Risk prediction
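Funding: [1] Hunan Provincial Health Commission Scientific Research Project (202112070480): design and study of a breast cancer risk assessment model for the Hunan region based on real-event big data. [2] Hunan Provincial Traditional Chinese Medicine Scientific Research Program (C2022034): research on passing on the experience of renowned TCM physicians using big data and artificial intelligence, covering clinical observation and mechanism of action of TCM treatment in the consolidation phase of breast cancer. [3] Hunan Provincial Clinical Medical Technology Innovation Guidance Project (2021SK51417): mechanism of the qi-replenishing and detoxifying method, guided by the Huchang-Chuanshe (护场-传舍) theory, in treating breast cancer, studied through TGF-β1-induced EMT in breast cancer cells. Zhou Liang, female, MD, chief physician; research focus: prevention and treatment of breast diseases. △ Correspondence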
Author  Affiliation
Zhou Liang (周亮); Wang Yue (王月); Tian Lingjia (田凌嘉); Liu Lifang (刘丽芳); Zhang Qin (张琴); Liang Lixin (梁黎昕)  
Abstract views: 101
Full-text downloads: 91
Abstract:
      Objective: To develop breast cancer risk prediction models using different machine learning algorithms. Methods: The patient database of the Breast Department of the First Affiliated Hospital of Hunan University of Chinese Medicine served as the data source. Based on known breast cancer risk factors, age at menarche, number of abortions, childbearing history, menstrual and breastfeeding status, family history of breast cancer, daily routine, dietary habits, and TCM syndrome characteristics were selected as candidate modeling variables, and demographic characteristics, vital signs, and pathological examination data were extracted. Six machine learning algorithms were used to develop models, and their performance in the prediction model was evaluated. Results: Combining the feature-importance results that two calculation methods produced for two modeling algorithms, the variables age, dryness-heat, number of abortions, history of mastitis, regular exercise, and age at menarche may play an important role in predicting breast cancer. The random forest algorithm performed best, with an accuracy of 0.86 and an AUC of 0.89. XGboost and GBDT both reached an accuracy of 0.85 and an AUC of 0.85, followed by logistic regression with an accuracy and an AUC of 0.84. SVC and DT had accuracies of 0.83 and 0.79 and AUCs of 0.82 and 0.71, respectively. These results indicate that random forest, owing to its ensemble learning nature, is more precise than a typical single algorithm and yields more accurate predictions. Conclusion: A breast cancer risk prediction model based on the random forest algorithm is of value in helping clinicians guide patients in breast cancer risk prevention.
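The abstract compares models by accuracy and AUC. As a minimal sketch of what these two metrics measure, the pure-Python example below computes both on a small hypothetical toy set (the labels, scores, and the 0.5 decision threshold are assumptions for illustration, not data or settings from the study), and then orders the six algorithms by the AUC values reported in the abstract. It does not reproduce the authors' modeling pipeline.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = case, 0 = non-case.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
# Assumed decision threshold of 0.5 to turn risk scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)

# (accuracy, AUC) pairs as reported in the abstract.
reported = {
    "Random forest": (0.86, 0.89),
    "XGboost": (0.85, 0.85),
    "GBDT": (0.85, 0.85),
    "Logistic regression": (0.84, 0.84),
    "SVC": (0.83, 0.82),
    "DT": (0.79, 0.71),
}
# Order algorithms from highest to lowest reported AUC.
ranking = sorted(reported, key=lambda name: reported[name][1], reverse=True)

print(f"toy accuracy = {toy_accuracy:.2f}, toy AUC = {toy_auc:.4f}")
for name in ranking:
    acc, auc = reported[name]
    print(f"{name}: accuracy {acc:.2f}, AUC {auc:.2f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores rank cases above non-cases across all thresholds, which is one reason the two numbers can diverge, as they do for DT in the reported results.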