

    Named entity recognition of Chinese electronic medical records based on multifeature embedding and attention mechanism

    GONG Dun-wei, ZHANG Yong-kai, GUO Yi-nan, WANG Bin, FAN Kuan-lu, HUO Yan

    Citation: GONG Dun-wei, ZHANG Yong-kai, GUO Yi-nan, WANG Bin, FAN Kuan-lu, HUO Yan. Named entity recognition of Chinese electronic medical records based on multifeature embedding and attention mechanism[J]. Chinese Journal of Engineering, 2021, 43(9): 1190-1196. doi: 10.13374/j.issn2095-9389.2021.01.12.006

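    Funding: National Natural Science Foundation of China (61973305, 61773384); Fundamental Research Funds for the Central Universities, China University of Mining and Technology (2020ZDPY0302)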
    Corresponding author e-mail: nanfly@126.com

    • CLC number: TP391.1

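    • Abstract: Chinese electronic medical record text contains many nested entities, has complex sentence grammar, and tends toward short sentences. To recognize its medical entities effectively, a named entity recognition algorithm that fuses multifeature embedding with an attention mechanism is proposed. The input representation layer fuses features at three granularities: character, word, and glyph. An attention mechanism is introduced into the hidden layer of a bidirectional long short-term memory (BiLSTM) network so that, when capturing features, the algorithm focuses more on characters related to medical entities. The algorithm then produces the optimal labeling for five entity types in Chinese electronic medical records: disease, body part, symptom, drug, and operation. In experiments on open-source and self-built diabetes datasets, the precision, recall, and F1 score of the proposed algorithm all exceed 97%, indicating that it recognizes the various entity types in Chinese electronic medical records more effectively.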
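    The two ideas named in the abstract, feature fusion at the input layer and attention over hidden states, can be illustrated with a minimal sketch. The abstract does not say how the three features are combined or which attention form is used, so this sketch assumes plain concatenation and scaled dot-product self-attention. The function names `fuse_embeddings` and `self_attention` and the toy vectors are hypothetical, and the fused vectors stand in for BiLSTM hidden states, which are omitted for brevity.

```python
import math


def fuse_embeddings(char_vec, word_vec, glyph_vec):
    """Fuse three per-character feature vectors by concatenation (an assumption)."""
    return list(char_vec) + list(word_vec) + list(glyph_vec)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden):
    """Scaled dot-product self-attention over a sequence of hidden vectors.

    Each output vector is a weighted average of all hidden vectors, with
    weights given by the softmax of scaled dot products with the query.
    """
    dim = len(hidden[0])
    attended = []
    for query in hidden:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in hidden
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * h[j] for w, h in zip(weights, hidden)) for j in range(dim)]
        )
    return attended


# Toy two-character sequence with made-up feature values.
char_feats = [[0.1, 0.2], [0.3, 0.4]]
word_feats = [[0.5], [0.5]]
glyph_feats = [[1.0, 0.0], [0.0, 1.0]]

fused = [
    fuse_embeddings(c, w, g)
    for c, w, g in zip(char_feats, word_feats, glyph_feats)
]
attended = self_attention(fused)
```

    In a full model the fused vectors would first pass through a BiLSTM, and a CRF layer would decode the attended states into tags. Both are left out here.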

       

    • Figure 1. MFBAC framework

      Figure 2. Comparison of the F1 values of different NER models

      Table 1. Types of named entities

      | Entity class | Identifier | Definition |
      | --- | --- | --- |
      | Diseases | B-diseases, I-diseases | Terms of various diseases |
      | Symptom | B-symptom, I-symptom | Abnormal physical manifestations |
      | Body | B-body, I-body | Various parts of the human body |
      | Drug | B-drug, I-drug | The names of various medicines |
      | Test | B-test, I-test | Various physical examinations |
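      The identifiers in Table 1 follow a B-/I- prefix scheme, where B- marks the first character of an entity and I- marks its continuation. A minimal sketch of turning such a tag sequence back into entity spans is shown here. The function name `decode_bio`, the example sentence, and the "O" tag for non-entity characters are assumptions for illustration; Table 1 lists only the B- and I- tags.

```python
def decode_bio(chars, tags):
    """Group characters into (entity_type, text) pairs from B-/I- tags."""
    entities = []
    current_type, current_chars = None, []

    def flush():
        # Emit the entity collected so far, if any.
        if current_type is not None:
            entities.append((current_type, "".join(current_chars)))

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_type, current_chars = tag[2:], [ch]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_chars.append(ch)
        else:
            # "O" (assumed) or a mismatched I- tag closes any open entity.
            flush()
            current_type, current_chars = None, []
    flush()
    return entities


# Hypothetical example sentence, tagged by hand.
sentence = list("糖尿病患者服用二甲双胍")
tags = [
    "B-diseases", "I-diseases", "I-diseases",
    "O", "O", "O", "O",
    "B-drug", "I-drug", "I-drug", "I-drug",
]
spans = decode_bio(sentence, tags)
```

      Here `spans` contains one `diseases` entity and one `drug` entity. A mismatched I- tag is dropped rather than repaired, which is one of several reasonable conventions.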

      Table 2. Distribution of medical entities in the training and test datasets

      | Entity class | Training data | Test data |
      | --- | --- | --- |
      | Diseases | 856 | 382 |
      | Symptom | 3845 | 1526 |
      | Body | 563 | 214 |
      | Drug | 657 | 289 |
      | Test | 3426 | 1647 |
      | Total | 9347 | 4058 |

      Table 3. NER performance with different feature embeddings

      | Model | P/% | R/% | F1/% |
      | --- | --- | --- | --- |
      | Font embedding-BiLSTM-CRF | 79.51 | 80.35 | 79.72 |
      | Char embedding-BiLSTM-CRF | 88.61 | 87.43 | 87.96 |
      | Word embedding-BiLSTM-CRF | 85.82 | 86.87 | 86.32 |
      | CW embedding-BiLSTM-CRF | 86.58 | 87.23 | 87.62 |
      | CWF embedding-BiLSTM-CRF | 96.24 | 97.25 | 96.94 |
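      The P, R, and F1 columns are precision, recall, and their harmonic mean. A minimal sketch of computing them at the entity level follows. It assumes exact-match scoring, where a predicted entity counts as correct only if its start, end, and type all match a gold entity; the page does not state which matching or averaging convention the authors used. The function name `precision_recall_f1` and the example spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall, and F1 under exact-match scoring.

    Each entity is a (start, end, type) tuple. Exact match is an assumption.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: two exact matches, one boundary error, one spurious entity.
gold = [(0, 3, "diseases"), (7, 11, "drug"), (12, 14, "symptom")]
predicted = [
    (0, 3, "diseases"),
    (7, 11, "drug"),
    (12, 13, "symptom"),
    (20, 22, "body"),
]
p, r, f1 = precision_recall_f1(gold, predicted)
```

      With two of four predictions correct and two of three gold entities found, precision is 1/2, recall is 2/3, and F1 is 4/7.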

      Table 4. NER performance with the attention mechanism for different feature embeddings

      | Model | P/% | R/% | F1/% |
      | --- | --- | --- | --- |
      | Font embedding-BiLSTM-Att-CRF | 92.46 | 93.12 | 92.68 |
      | Char embedding-BiLSTM-Att-CRF | 93.41 | 93.56 | 93.49 |
      | Word embedding-BiLSTM-Att-CRF | 96.36 | 96.18 | 96.21 |
      | CW embedding-BiLSTM-Att-CRF | 96.52 | 96.18 | 96.45 |
      | CWF embedding-BiLSTM-Att-CRF | 97.21 | 97.83 | 97.54 |
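      To see how much the attention mechanism changes each embedding variant, the F1 values reported in Tables 3 and 4 can be subtracted row by row. The short sketch below does only that arithmetic, using the reported figures; the dictionary names and the shortened row labels are hypothetical.

```python
# F1/% as reported in Table 3 (without attention) and Table 4 (with attention).
f1_without_attention = {
    "Font": 79.72, "Char": 87.96, "Word": 86.32, "CW": 87.62, "CWF": 96.94,
}
f1_with_attention = {
    "Font": 92.68, "Char": 93.49, "Word": 96.21, "CW": 96.45, "CWF": 97.54,
}

# Gain in F1 percentage points for each embedding variant.
gains = {
    name: round(f1_with_attention[name] - f1_without_attention[name], 2)
    for name in f1_without_attention
}

largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```

      By these figures the gain is largest for the Font embedding and smallest for the CWF embedding, which already scores highest without attention.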

      Table 5. Comparison of the performance of different NER models

      | Model | P/% | R/% | F1/% | Loading time/s | Testing time/s |
      | --- | --- | --- | --- | --- | --- |
      | Transformer | 85.46 | 86.32 | 85.68 | 4.33 | 12.6 |
      | BiGRU-CRF | 85.87 | 86.23 | 86.14 | 2.95 | 9.4 |
      | BiLSTM-CRF | 88.61 | 87.43 | 95.16 | 3.21 | 9.81 |
      | Attention-BiLSTM-CRF | 94.52 | 96.18 | 96.45 | 3.56 | 10.56 |
      | Transformer-CRF | 95.32 | 94.62 | 94.14 | 5.32 | 13.57 |
      | MFBAC | 97.21 | 97.83 | 97.54 | 4.34 | 11.68 |
    Publication history
    • Received: 2021-01-12
    • Published online: 2021-03-02
    • Issue date: 2021-09-18
