Article Abstract
Named Entity Recognition of TCM Case Records Based on a Pre-trained Model and Conditional Random Field
Submitted: 2022-12-27  Accepted: 2023-01-11
DOI:
Keywords: Named Entity Recognition (NER)  Pre-trained Model  Conditional Random Field (CRF)  TCM Case Records
Funding:
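Author  Affiliation  Postal code
吴佳泽 (Wu Jiaze)  Beijing University of Chinese Medicine  102488
李坤宁 (Li Kunning)  Beijing University of Chinese Medicine  102488
陈明 (Chen Ming)  Beijing University of Chinese Medicine  102488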
Abstract:
      Objective: Named entity recognition (NER) in traditional Chinese medicine (TCM) case records is a prerequisite for high-value data mining of those records. To address the limited efficiency of current NER on TCM case records, this paper proposes a neural network based on a pre-trained model and a Conditional Random Field (CRF). Methods: Ten types of named entities were manually annotated in the selected TCM case records to form the training and validation sets, and neural networks based on BERT, RoBERTa, ALBERT, and CRF were constructed to identify the best pre-trained model for NER on TCM case records and to measure how much the CRF contributes to it. Results: The neural network based on RoBERTa-CRF performed best on the TCM case record NER task, with an overall accuracy of 99.33%, a precision of 98.24%, a recall of 98.51%, and an F1 score of 98.38%. Conclusion: The neural network based on RoBERTa-CRF can effectively perform NER on TCM case records and addresses the efficiency problem. With an appropriately set layer-wise (hierarchical) learning rate, the CRF can effectively handle the dependencies between named entity labels. This lays a solid foundation for high-value data mining of TCM case records.
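The claim that a CRF handles dependencies between named entity labels can be illustrated with a small decoding sketch. The abstract does not state the tagging scheme, the entity types, or any model scores, so everything below is an assumption made for illustration: a BIO scheme, a single hypothetical entity type `SYM`, hand-set transition scores (a trained CRF would learn them), and made-up per-token emission scores. The sketch compares picking the best label per token independently with Viterbi decoding over the whole sequence.

```python
import math

# Hypothetical BIO label set for one entity type; the paper's 10 entity
# types and its tagging scheme are not given in the abstract.
LABELS = ["O", "B-SYM", "I-SYM"]
NEG_INF = -math.inf


def transition_score(prev, curr):
    """Hand-set transition scores (assumption): forbid I-SYM right after O.

    A trained CRF learns these scores from data; they are fixed here.
    """
    if curr == "I-SYM" and prev == "O":
        return NEG_INF
    return 0.0


def greedy(emissions):
    """Pick the best label for each token independently (no CRF)."""
    return [max(LABELS, key=lambda lab: scores[lab]) for scores in emissions]


def viterbi(emissions):
    """Return the label sequence with the highest total score.

    emissions: list of dicts mapping each label to a per-token score.
    """
    # A sequence may not start inside an entity.
    best = {
        lab: (NEG_INF if lab == "I-SYM" else emissions[0][lab])
        for lab in LABELS
    }
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for curr in LABELS:
            # Best previous label given the transition into `curr`.
            prev = max(LABELS, key=lambda p: best[p] + transition_score(p, curr))
            new_best[curr] = best[prev] + transition_score(prev, curr) + scores[curr]
            pointers[curr] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda lab: best[lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up emission scores for a three-token input.
emissions = [
    {"O": 2.0, "B-SYM": 1.5, "I-SYM": 0.0},
    {"O": 0.0, "B-SYM": 1.0, "I-SYM": 2.0},
    {"O": 0.0, "B-SYM": 0.0, "I-SYM": 2.0},
]

print(greedy(emissions))   # ['O', 'I-SYM', 'I-SYM']  (I-SYM follows O: invalid)
print(viterbi(emissions))  # ['B-SYM', 'I-SYM', 'I-SYM']
```

In this toy input the per-token choice yields an entity that starts with `I-SYM`, while sequence-level decoding trades a slightly lower score on the first token for a well-formed entity. This is only a sketch of the general CRF decoding idea, not the authors' implementation.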