• Source journal of Ei Compendex (The Engineering Index)
    • Chinese Core Journal
    • Source journal for Chinese Science and Technology Paper Statistics
    • Source journal of the Chinese Science Citation Database


Physiological curve extraction of the human ear based on the improved YOLACT instance segmentation network

YUAN Li, XIA Tong, ZHANG Xiao-shuang

Citation: YUAN Li, XIA Tong, ZHANG Xiao-shuang. Physiological curve extraction of the human ear based on the improved YOLACT[J]. Chinese Journal of Engineering, 2022, 44(8): 1386-1395. doi: 10.13374/j.issn2095-9389.2021.01.11.005

    doi: 10.13374/j.issn2095-9389.2021.01.11.005
    Fund project: National Natural Science Foundation of China (61472031)
      Corresponding author: E-mail: lyuan@ustb.edu.cn

    • CLC number: TP391.41

• Abstract: In related work such as ear-shape clustering, 3D ear modeling, and personalized earphone customization, obtaining the accurate positions of key physiological curves and key points of the human ear is very important. Traditional edge-extraction methods are highly sensitive to variations in illumination and pose. This paper proposes an improved YOLACT instance segmentation network based on ResNeSt and a mask-screening strategy, improving the original YOLACT algorithm in both localization and segmentation. By annotating a human-ear dataset, training the improved YOLACT model, and applying the improved mask-screening strategy at the prediction stage, the different regions of the human ear can be segmented accurately and the key physiological curves extracted. Compared with other methods, the proposed method achieves better segmentation accuracy on the test image set and shows a degree of robustness to changes in ear pose.
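The mask-screening idea described in the abstract (and illustrated in Figure 5) can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation: instead of simply cropping a predicted mask to its detection box, the mask is split into connected regions and only the region consistent with the detection is kept (here, simply the largest region).

```python
import numpy as np
from collections import deque

def connected_regions(mask):
    """4-connected components of a binary mask; returns a list of pixel lists."""
    mask = mask.astype(bool)
    visited = np.zeros_like(mask, dtype=bool)
    regions = []
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not visited[sy, sx]:
                q = deque([(sy, sx)])
                visited[sy, sx] = True
                pix = []
                while q:
                    y, x = q.popleft()
                    pix.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            q.append((ny, nx))
                regions.append(pix)
    return regions

def screen_mask(mask):
    """Keep only the largest connected region (a stand-in for the screening criterion)."""
    regions = connected_regions(mask)
    out = np.zeros_like(mask)
    if regions:
        for y, x in max(regions, key=len):
            out[y, x] = 1
    return out

# A toy mask with one ear-like region and one spurious fragment.
m = np.zeros((8, 8), dtype=np.uint8)
m[1:5, 1:5] = 1   # main region, 16 pixels
m[6, 6] = 1       # spurious fragment
cleaned = screen_mask(m)
print(cleaned.sum())  # → 16 (the spurious pixel is removed)
```

Cropping, by contrast, would keep any fragment that happens to fall inside the predicted box; screening by region is what removes stray blobs outside the true ear contour.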

       

• Figure  1.  System block diagram of the improved YOLACT model for extracting the key physiological curves of the human ear

      Figure  2.  Split attention module structure[19]: (a) entire frame; (b) cardinal internal structure

      Figure  3.  Prototype mask generation module

      Figure  4.  Object detection module

      Figure  5.  Mask processing: (a) original image; (b) prediction of boxes and masks; (c) segmentation result with the cropping mask strategy; (d) bounding boxes of different regions; (e) segmentation result with the screening mask strategy

      Figure  6.  Image dataset: (a) original image; (b) key curves; (c) annotation examples

      Figure  7.  Loss curves: (a) box loss; (b) class loss; (c) mask loss

Figure  8.  Segmentation results for different human ears: (a) cropping mask results; (b) screening mask results

      Figure  9.  Segmentation effect of three methods for different ears: (a) original image; (b) improved YOLACT; (c) DeepLabV3+; (d) traditional contour estimation

      Figure  10.  Two-stage convolutional neural network for extracting six key points of the human ear

Table  1.   Training hyperparameters

      max_size    lr_steps                 max_iter    batch_size
      550         (30000, 60000, 90000)    120000      8

Table  2.   Segmentation accuracy of different YOLACT models

      Model                       mIOU      Dice coefficient
      YOLACT-ResNet101-crop       0.9514    0.9943
      YOLACT-ResNet101-select     0.9518    0.9943
      YOLACT-ResNeSt101-crop      0.9539    0.9950
      YOLACT-ResNeSt101-select    0.9544    0.9950
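The mIOU and Dice coefficient reported here follow the standard definitions for binary masks. A minimal sketch (my own illustration, not the authors' evaluation code); mIOU is then the mean of the per-image/per-class IOU values:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    """Dice coefficient: 2*|A and B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total else 1.0

pred = np.array([[1, 1, 0], [1, 0, 0]])
gt   = np.array([[1, 1, 0], [0, 0, 0]])
print(iou(pred, gt))   # → 2/3 ≈ 0.667
print(dice(pred, gt))  # → 0.8
```

Note that Dice weights the intersection twice, so for the same masks it is always at least as large as IOU, which matches the pattern in the table (Dice ≈ 0.99 vs. mIOU ≈ 0.95).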

Table  3.   Accuracy of the YOLACT-ResNeSt101 model (%)

      YOLACT-ResNeSt101    mAP_all    mAP50    mAP70    mAP90
      Box                  95.14      100      100      96.63
      Mask                 98.13      100      100      97.98

Table  4.   Comparison of curve extraction accuracy before and after model improvement

      Model                       Accuracy
      YOLACT-ResNet101-crop       308/410
      YOLACT-ResNet101-select     381/410
      YOLACT-ResNeSt101-crop      344/410
      YOLACT-ResNeSt101-select    395/410

Table  5.   Real-time performance before and after model improvement

      Model                       FPS     Time/s
      YOLACT-ResNet101-crop       24.6    16.6
      YOLACT-ResNet101-select     24.8    16.5
      YOLACT-ResNeSt101-crop      16.6    24.6
      YOLACT-ResNeSt101-select    16.8    24.4
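The FPS and time columns of Table 5 are mutually consistent if the times are read as total time over the 410-image test set of Table 4, i.e. Time ≈ 410 / FPS. A quick check under that assumption (the test-set size of 410 is inferred from Table 4, not stated alongside Table 5):

```python
# Hypothetical consistency check: total time ≈ number of test images / throughput.
N_IMAGES = 410  # test-set size, taken from the denominators in Table 4

rows = {
    "YOLACT-ResNet101-crop":    (24.6, 16.6),
    "YOLACT-ResNet101-select":  (24.8, 16.5),
    "YOLACT-ResNeSt101-crop":   (16.6, 24.6),
    "YOLACT-ResNeSt101-select": (16.8, 24.4),
}
for model, (fps, seconds) in rows.items():
    predicted = N_IMAGES / fps
    print(f"{model}: {predicted:.1f} s predicted vs {seconds} s reported")
```

All four rows agree to within rounding, which also makes the trade-off explicit: the ResNeSt101 backbone improves accuracy at roughly a one-third reduction in throughput.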

Table  6.   Accuracy comparison of different segmentation models

      Model               mIOU
      DeepLabV3+          0.8853
      Improved YOLACT     0.9544
    • [1] Yang Y R, Wu H B. Anatomical study of auricle. Chin J Anat, 1988, 11(1): 56
      [2] Qi N, Li L, Zhao W. Morphometry and classification of Chinese adult's auricles. Tech Acoust, 2010, 29(5): 518
      [3] Azaria R, Adler N, Silfen R, et al. Morphometry of the adult human earlobe: A study of 547 subjects and clinical application. Plast Reconstr Surg, 2003, 111(7): 2398 doi: 10.1097/01.PRS.0000060995.99380.DE
      [4] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation // Lecture Notes in Computer Science. Munich, 2015: 234
      [5] Milletari F, Navab N, Ahmadi S A. V-net: Fully convolutional neural networks for volumetric medical image segmentation // 2016 Fourth International Conference on 3D Vision (3DV). Stanford, 2016: 565
      [6] Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation // 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, 2015: 1520
      [7] Zhao H S, Shi J P, Qi X J, et al. Pyramid scene parsing network // 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, 2017: 6230
      [8] Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2017: 2536
      [9] Wang Z M, Liu Z H, Huang Y K, et al. Efficient wagon number recognition based on deep learning. Chin J Eng, 2020, 42(11): 1525
      [10] Chen X L, Girshick R, He K M, et al. TensorMask: A foundation for dense object segmentation // 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, 2019: 2061
      [11] Dai J F, He K M, Sun J. Instance-aware semantic segmentation via multi-task network cascades // 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, 2016: 3150
      [12] Li Y, Qi H Z, Dai J F, et al. Fully convolutional instance-aware semantic segmentation // 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, 2017: 4438
      [13] Bolya D, Zhou C, Xiao F Y, et al. YOLACT: real-time instance segmentation // 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, 2019: 9156
      [14] He K M, Gkioxari G, Dollár P, et al. Mask R-CNN // 2017 IEEE International Conference on Computer Vision (ICCV). Venice, 2017: 2980
      [15] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137 doi: 10.1109/TPAMI.2016.2577031
      [16] Cai Z W, Vasconcelos N. Cascade R-CNN: Delving into high quality object detection // 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 6154
      [17] Qin Z, Li Z M, Zhang Z N, et al. ThunderNet: towards real-time generic object detection on mobile devices // 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, 2019: 6717
      [18] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition // 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, 2016: 770
[19] Zhang H, Wu C R, Zhang Z Y, et al. ResNeSt: Split-attention networks [J/OL]. ArXiv Preprint (2020-04-19) [2020-12-31]. https://arxiv.org/abs/2004.08955
      [20] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection // 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, 2017: 936
      [21] Xie S N, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks // 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, 2017: 5987
      [22] Hu J, Shen L, Sun G. Squeeze-and-excitation networks // 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018: 7132
      [23] Li X, Wang W H, Hu X L, et al. Selective kernel networks // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, 2019: 510
      [24] He T, Zhang Z, Zhang H, et al. Bag of tricks for image classification with convolutional neural networks // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, 2019: 558
      [25] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39(4): 640 doi: 10.1109/TPAMI.2016.2572683
      [26] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector // Computer Vision – ECCV 2016. Amsterdam, 2016: 21
      [27] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: common objects in context // Computer Vision – ECCV 2014. Zurich, 2014: 740
      [28] Zhang Y, Mu Z C, Yuan L, et al. USTB-helloear: A large database of ear images photographed under uncontrolled conditions // International Conference on Image and Graphics. Shanghai, 2017: 405
      [29] Yuan L, Zhao H N, Zhang Y, et al. Ear alignment based on convolutional neural network // Chinese Conference on Biometric Recognition. Urumqi, 2018: 562
    Figures (10) / Tables (6)
    Publication history
    • Received: 2021-01-11
    • Available online: 2021-06-18
    • Published in issue: 2022-07-06
