• Volume 43 Issue 9
    Sep.  2021
    GONG Dun-wei, ZHANG Yong-kai, GUO Yi-nan, WANG Bin, FAN Kuan-lu, HUO Yan. Named entity recognition of Chinese electronic medical records based on multifeature embedding and attention mechanism[J]. Chinese Journal of Engineering, 2021, 43(9): 1190-1196. doi: 10.13374/j.issn2095-9389.2021.01.12.006

    Named entity recognition of Chinese electronic medical records based on multifeature embedding and attention mechanism

    doi: 10.13374/j.issn2095-9389.2021.01.12.006
    • Corresponding author: E-mail: nanfly@126.com
    • Received Date: 2021-01-12
      Available Online: 2021-03-02
    • Publish Date: 2021-09-18
    • Medical records are an essential part of residents' health care records and hold all the information about a patient's clinical treatment. They were traditionally written by doctors on paper. With the development of information technology, electronic medical records, which are easier to store and manage, have gradually replaced paper ones. Intelligent auxiliary diagnosis, patient portrait construction, and disease prediction based on medical reports have become research hotspots in intelligent medical care. An efficient named entity recognition algorithm is the key to discovering the hidden relationships between symptoms and diseases in the documents stored in electronic medical records. Although several studies have addressed named entity recognition, relatively little research deals with information extraction from Chinese electronic medical records. The documents in Chinese electronic medical records contain a large number of nested named entities and short sentences, and the logical connection between sentences is weak, which makes the syntactic structure complex. To recognize medical entities effectively, a novel named entity recognition method based on multifeature embedding and an attention mechanism is proposed. Three types of features, derived from characters, words, and glyphs, are embedded in the input representation layer, and an attention mechanism is introduced into the hidden layer of a bidirectional long short-term memory network so that the model focuses on the characters related to medical entities. Finally, the optimal labels are obtained for five types of entities in Chinese electronic medical records: diseases, body parts, symptoms, drugs, and operations. Experimental results on both open and self-built Chinese electronic medical records show that the recognition accuracy, recall rate, and F1 value of the proposed algorithm all exceed 97%, which indicates that the algorithm can effectively identify the various entities in Chinese electronic medical records.
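
The two central ideas of the abstract, joining character, word, and glyph features per character and letting an attention step reweight the sequence, can be illustrated with a small sketch. It is not the authors' implementation: the vector sizes, the random placeholder features, the sample sentence, and the helper names (`concat_features`, `self_attention`) are all assumptions made for illustration. The bidirectional LSTM is omitted, so attention is applied directly to the concatenated inputs, and scaled dot-product attention stands in for whatever attention form the paper actually uses.

```python
import math
import random


def concat_features(char_vec, word_vec, glyph_vec):
    """Join the three per-character feature vectors into one input vector."""
    return char_vec + word_vec + glyph_vec


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(hidden):
    """Scaled dot-product self-attention over a list of state vectors.

    Assumption: this generic form replaces the paper's attention layer,
    whose exact formulation is not given in the abstract.
    """
    dim = len(hidden[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in hidden:
        row = softmax([dot(query, key) / scale for key in hidden])
        context = [sum(w * h[j] for w, h in zip(row, hidden)) for j in range(dim)]
        weights.append(row)
        contexts.append(context)
    return contexts, weights


def random_vec(size):
    """Placeholder for a learned embedding lookup (hypothetical)."""
    return [random.uniform(-1.0, 1.0) for _ in range(size)]


random.seed(0)
sentence = "患者头痛"  # hypothetical input: "the patient has a headache"
CHAR_DIM, WORD_DIM, GLYPH_DIM = 4, 3, 2  # assumed sizes, not from the paper

# One concatenated feature vector per character.
inputs = [
    concat_features(random_vec(CHAR_DIM), random_vec(WORD_DIM), random_vec(GLYPH_DIM))
    for _ in sentence
]

# In the described model these vectors would first pass through a BiLSTM;
# here attention is applied to them directly to keep the sketch short.
contexts, weights = self_attention(inputs)
```

Each row of `weights` shows how strongly one character attends to every character in the sentence. In the described model, the attended states would then feed a label decoder for the five entity types.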

       


      Figures(2)  / Tables(5)
