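• Source journal of the Engineering Index (EI)
    • Chinese Core Journal
    • Statistical source journal for Chinese scientific and technical papers
    • Source journal of the Chinese Science Citation Database (CSCD)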

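    Clinical named entity recognition from Chinese electronic medical records using a double-layer annotation model combining a domain dictionary with CRF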

    GONG Le-jun, ZHANG Zhi-fei

    Citation: GONG Le-jun, ZHANG Zhi-fei. Clinical named entity recognition from Chinese electronic medical records using a double-layer annotation model combining a domain dictionary with CRF[J]. Chinese Journal of Engineering, 2020, 42(4): 469-475. doi: 10.13374/j.issn2095-9389.2019.09.04.004

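    Funding: National Natural Science Foundation of China (61502243, 61502247, 61572263); Zhejiang Provincial Engineering Technology Research Center for Smart Healthcare (2016E10011); China Postdoctoral Science Foundation (2018M632349); Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (16KJB520003)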
      Corresponding author E-mail: glj98226@163.com

    • CLC number: TP391.1

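    • Abstract: Medical entity recognition is a fundamental task in extracting information from electronic medical record (EMR) text. Chinese EMR text contains many compound and long entities, frequently omits sentence constituents, and has unclear entity boundaries; moreover, annotated corpora are difficult to obtain. To address these problems, this paper proposes a double-layer annotation model based on a domain dictionary and conditional random fields (CRF). The model builds a medical domain dictionary through statistical analysis of external resources and then, together with a CRF, performs two annotation passes at different granularities, combining the accuracy of dictionary-based recognition with the automation of machine learning. It recognizes four types of medical entities in Chinese EMR text: diseases, symptoms, drugs, and operations. On the test data, the model achieves a macro-precision of 96.7%, a macro-recall of 97.7%, and a macro-F1 score of 97.2%. A deep neural network with an attention mechanism was also evaluated for comparison; limited by the size of the domain dataset, it performed poorly on this test set. The experimental results demonstrate the effectiveness of the double-layer annotation model for Chinese medical entity recognition.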
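The abstract does not say how dictionary entries are matched against the record text in the first annotation pass. A minimal sketch, assuming forward maximum matching over characters and BIO-style tags, could look like the following; the lexicon contents, the function name `dict_bio_tags`, and the type labels DIS, SYM and DRUG are hypothetical and not taken from the paper.

```python
# Hypothetical lexicon: term -> entity type (DIS, SYM, DRUG are assumed labels).
LEXICON = {"高血壓": "DIS", "頭痛": "SYM", "阿司匹林": "DRUG"}


def dict_bio_tags(text, lexicon):
    """First-layer annotation (assumed scheme): tag each character with the
    B-/I- labels of the longest lexicon entry starting at that position
    (forward maximum matching), or "O" where no entry matches."""
    max_len = max((len(term) for term in lexicon), default=1)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        hit = None
        # Try the longest window first, then shrink it one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                hit = (j, lexicon[text[i:j]])
                break
        if hit is None:
            i += 1  # no entry starts here; the character stays "O"
            continue
        end, etype = hit
        tags[i] = "B-" + etype
        for k in range(i + 1, end):
            tags[k] = "I-" + etype
        i = end  # resume after the matched term, so matches never overlap
    return tags


tags = dict_bio_tags("高血壓伴頭痛,予阿司匹林", LEXICON)
```

Greedy longest-first matching is a simple way to favor the long compound entities the abstract mentions, but it cannot resolve ambiguous boundaries on its own, which is where a learned second layer can help.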

       

    • Figure 1.  Double-layer annotation model (DLAM) combining a domain dictionary with CRF
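Figure 1 combines the dictionary layer with a CRF. One plausible way to pass the first layer's output to the second, sketched here under assumptions not stated on this page, is to add each character's dictionary tag to its CRF feature set alongside a small character window. The feature names and the helpers `char_features` and `sentence_features` are hypothetical.

```python
def char_features(text, dict_tags, i):
    """Hypothetical features for character i: the character, a +/-1 window,
    and the tag assigned by the dictionary layer at each position."""
    feats = {"bias": 1.0, "char": text[i], "dict_tag": dict_tags[i]}
    if i > 0:
        feats["prev_char"] = text[i - 1]
        feats["prev_dict_tag"] = dict_tags[i - 1]
        feats["bigram_-1"] = text[i - 1 : i + 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(text) - 1:
        feats["next_char"] = text[i + 1]
        feats["next_dict_tag"] = dict_tags[i + 1]
        feats["bigram_+1"] = text[i : i + 2]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(text, dict_tags):
    """One feature dict per character; requires one dictionary tag per character."""
    if len(text) != len(dict_tags):
        raise ValueError("one dictionary tag per character is required")
    return [char_features(text, dict_tags, i) for i in range(len(text))]


feats = sentence_features("頭痛", ["B-SYM", "I-SYM"])
```

Each sentence becomes a list of feature dicts, which is the input format the third-party sklearn-crfsuite package accepts in `CRF.fit(X, y)` (with `y` the gold BIO tags per sentence). That package is named only as one possible CRF backend, not as the one the authors used.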

      Figure 2.  Entity-level precision of DLAM and BiLSTM-Attention-CRF

      Figure 3.  Entity-level recall of DLAM and BiLSTM-Attention-CRF

      Table 1.  Distribution of entities in the training set and the test set

      Dataset       Diseases  Symptoms  Drugs  Operations  Total
      Training set  701       2648      546    2138        6033
      Test set      273       1043      208    918         2442

      Table 2.  Composition of the domain dictionary

      Type  Diseases  Symptoms  Operations  Drugs  Keywords  Organs  Location  Privative
      Amount1212934611777303511612
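Table 2 lists the dictionary's composition, and the abstract says it was built through statistical analysis of external resources without naming the statistic. Since the reference list includes a TF-IDF survey [21], the sketch below assumes candidate terms are ranked by TF-IDF over pre-tokenized documents; the tokenization, the max-over-documents aggregation, and the names `tfidf_scores` and `top_candidates` are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """docs: list of token lists. Returns one {term: tf-idf} dict per document,
    with tf = count / document length and idf = log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({term: (c / total) * math.log(n_docs / df[term])
                       for term, c in counts.items()})
    return scores


def top_candidates(docs, k):
    """Rank terms by their highest TF-IDF in any document (ties broken by the
    term string) and keep the top k as dictionary candidates."""
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, s in doc_scores.items():
            best[term] = max(best.get(term, 0.0), s)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [["患者", "頭痛", "頭痛"], ["患者", "咳嗽"], ["患者", "發熱"]]
candidates = top_candidates(docs, 2)
```

A term that appears in every document, such as 患者 ("patient") here, gets an IDF of zero and drops to the bottom of the ranking; candidates would still need manual or rule-based typing into the categories of Table 2.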

      Table 3.  CRF comparison results (%)

      Model                        Macro-P  Macro-R  Macro-F1
      Baseline (single-layer CRF)  83.3     68.1     68.1
      DLAM                         96.7     97.7     97.2
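Tables 3 to 5 report macro-averaged precision, recall, and F1, and Figures 2 and 3 break precision and recall down by entity type. A minimal sketch of one common definition follows, assuming exact-match entity spans given as (type, start, end) tuples and an unweighted mean over entity types; the page does not spell out which variant it uses, and the helpers `prf` and `macro_prf` are hypothetical.

```python
def prf(tp, n_pred, n_gold):
    """Precision, recall and F1 from counts; 0.0 when a denominator is zero."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def macro_prf(gold, pred):
    """Entity-level scores. gold and pred are sets of (type, start, end) spans;
    a prediction is correct only if its type and both boundaries match exactly.
    Macro scores are unweighted means of the per-type (P, R, F1) triples."""
    types = sorted({e[0] for e in gold} | {e[0] for e in pred})
    per_type = {}
    for t in types:
        g = {e for e in gold if e[0] == t}
        p = {e for e in pred if e[0] == t}
        per_type[t] = prf(len(g & p), len(p), len(g))
    if not per_type:
        return per_type, (0.0, 0.0, 0.0)
    n = len(per_type)
    macro = tuple(sum(s[i] for s in per_type.values()) / n for i in range(3))
    return per_type, macro


gold = {("DIS", 0, 3), ("SYM", 4, 6), ("DRUG", 8, 12)}
pred = {("DIS", 0, 3), ("SYM", 4, 5), ("DRUG", 8, 12)}  # SYM boundary is wrong
per_type, macro = macro_prf(gold, pred)
```

Averaging per-type F1 values, as done here, does not in general equal the harmonic mean of macro-P and macro-R, so the two ways of reporting macro-F1 can give different numbers.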

      Table 4.  BiLSTM-Attention-CRF comparison results with different character embeddings (%)

      Character embedding             Macro-P  Macro-R  Macro-F1
      Randomly initialized embedding  69.52    69.70    69.38
      50-dimensional embedding        53.42    54.31    53.74
      150-dimensional embedding       73.43    77.85    75.54
      300-dimensional embedding       55.36    61.03    57.88

      Table 5.  Comparison of DLAM with existing models (%)

      Model                   Macro-P  Macro-R  Macro-F1
      CRF_multi-features[27]  92.03    87.09    89.49
      BiLSTM-CRF[27]          91.12    89.74    90.43
      DLAM                    96.70    97.70    97.20
    • [1] Zhang L B. Word Segmentation and Named Entity Mining Based on Semi-Supervised Learning for Chinese EMR[Dissertation]. Harbin: Harbin Institute of Technology, 2014
      [2] Huang Z H, Xu W, Yu K. Bidirectional LSTM-CRF Models for Sequence Tagging[J/OL]. arXiv preprint. (2015-08-09) [2019-09-04]. https://arxiv.org/abs/1508.01991
      [3] Wang Y Q, Yu Z H, Chen L, et al. Supervised methods for symptom name recognition in free-text clinical records of traditional Chinese medicine: an empirical study. J Biomed Inf, 2014, 47: 91 doi: 10.1016/j.jbi.2013.09.008
      [4] Xu Y, Wang Y N, Liu T R, et al. Joint segmentation and named entity recognition using dual decomposition in Chinese discharge summaries. J Am Med Inf Assoc, 2014, 21(e1): e84 doi: 10.1136/amiajnl-2013-001806
      [5] Lei J B, Tang B Z, Lu X Q, et al. A comprehensive study of named entity recognition in Chinese clinical text. J Am Med Inf Assoc, 2014, 21(5): 808 doi: 10.1136/amiajnl-2013-002381
      [6] Xu Y, Ge Y Q, Wang Q, et al. Medical name entity recognition and application in Chinese admission record of stroke patients based on CRF and RUTA rule. J Sun Yat-sen Univ Med Sci, 2018, 39(3): 455
      [7] Zhang X W, Li Z. Chinese electronic medical record named entity recognition based on multi-feature fusion. Softw Guide, 2017, 16(2): 128
      [8] Yu L, Jin L Z, Wang M F, et al. Recognition of human hypoxic state based on deep learning. Chin J Eng, 2019, 41(6): 817
      [9] Xia Y B, Zheng J L, Zhao Y F, et al. Deep learning based named entity recognition of electronic medical record. Electron Sci Technol, 2018, 31(11): 31
      [10] Li F, Zhang M S, Tian B, et al. Recognizing irregular entities in biomedical text via deep neural networks. Pattern Recognit Lett, 2018, 105: 105 doi: 10.1016/j.patrec.2017.06.009
      [11] Liu Z J, Yang M, Wang X L, et al. Entity recognition from clinical texts via recurrent neural networks. BMC Med Inf Decis Making, 2017, 17(Suppl 2): 67
      [12] Chowdhury S, Dong X S, Qian L J, et al. A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records. BMC Bioinf, 2018, 19(Suppl 17): 499
      [13] Shen Z. Named Entity Recognition for Chinese Electronic Record with Neural Network[Dissertation]. Beijing: Beijing University of Posts and Telecommunications, 2018
      [14] Wei Q K, Chen T, Xu R F, et al. Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks. Database, 2016, 2016: baw140 doi: 10.1093/database/baw140
      [15] Wu Y H, Yang X, Bian J, et al. Combine factual medical knowledge and distributed word representation to improve clinical named entity recognition. AMIA Annu Symp Proc, 2018, 2018: 1110
      [16] Jagannatha A N, Yu H. Bidirectional RNN for medical event detection in electronic health records // Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. California, 2016: 473
      [17] Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records[J/OL]. arXiv preprint. (2018-05-11) [2019-09-04]. https://arxiv.org/abs/1801.07860
      [18] Wang Y, Wang L, Rastegar-Mojarad M, et al. Clinical information extraction applications: a literature review. J Biomed Inf, 2018, 77: 34 doi: 10.1016/j.jbi.2017.11.011
      [19] Gligic L, Kormilitzin A, Goldberg P, et al. Named entity recognition in electronic health records using transfer learning bootstrapped neural networks[J/OL]. arXiv preprint. (2019-07-29) [2019-09-04]. https://arxiv.org/abs/1901.01592
      [20] Li W, Zhao D Z, Li B, et al. Combining CRF and rule based medical named entity recognition. Appl Res Comput, 2015, 32(4): 1082 doi: 10.3969/j.issn.1001-3695.2015.04.029
      [21] Shi C Y, Xu Z J, Yang X J. Study of TFIDF algorithm. J Comput Appl, 2009, 29(Suppl 1): 167
      [22] Li H. Statistical Learning Methods. Beijing: Tsinghua University Press, 2012
      [23] Yang J F, Guan Y, He B, et al. Corpus construction for named entities and entity relations on Chinese electronic medical records. J Softw, 2016, 27(11): 2725
      [24] Uzuner O, South B R, Shen S Y, et al. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inf Assoc, 2011, 18(5): 552 doi: 10.1136/amiajnl-2011-000203
      [25] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J/OL]. arXiv preprint. (2017-12-06) [2019-09-04]. https://arxiv.org/abs/1706.03762
      [26] Luo L, Yang Z, Yang P, et al. An attention-based BiLSTM-CRF approach to document level chemical named entity recognition. Bioinformatics, 2018, 34(8): 1381 doi: 10.1093/bioinformatics/btx761
      [27] Zhang Y, Wang X W, Hou Z, et al. Clinical named entity recognition from Chinese electronic health records via machine learning methods. JMIR Med Inf, 2018, 6(4): e50 doi: 10.2196/medinform.9965
    Publication history
    • Received: 2019-09-04
    • Published: 2020-04-01
