    Resampling algorithm for imbalanced data based on their neighbor relationship

    LI Rui-feng, LI Wen-hai, SUN Yan-li, WU Yang-yong

    Citation: LI Rui-feng, LI Wen-hai, SUN Yan-li, WU Yang-yong. Resampling algorithm for imbalanced data based on their neighbor relationship[J]. Chinese Journal of Engineering, 2021, 43(6): 862-869. doi: 10.13374/j.issn2095-9389.2020.04.05.002

    doi: 10.13374/j.issn2095-9389.2020.04.05.002
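    Funding: Supported by the military research project "Research on Key Testing Technologies for New-Generation Avionics Equipment" (4172122113R)
    Corresponding author e-mail: dongzhi1110@foxmail.com

    • CLC number: TP206.1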

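    • Abstract: To improve classification accuracy on imbalanced data sets, a resampling algorithm based on the spatial neighbor relationships of samples is proposed. The method first evaluates the safety level of each minority-class sample from its spatial neighbor relationships and uses that safety level to guide oversampling with the synthetic minority oversampling technique (SMOTE). It then computes the local density of each majority-class sample from its spatial neighbor relationships and undersamples the regions where majority-class samples are dense. Together, these two steps balance the data set and limit its size to prevent overfitting, so that the two classes are classified in a balanced way. Training and test sets are generated by ten-fold cross-validation; after the training set is resampled, a kernel extreme learning machine is trained as the classifier and validated on the test set. Experiments on imbalanced UCI data sets and on measured circuit fault diagnosis data show that the proposed method outperforms the other resampling algorithms overall.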
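The two resampling stages described in the abstract can be sketched as follows. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the safety level is taken as the number of minority samples among a sample's k nearest neighbors (in the style of Safe-Level-SMOTE), samples with safety level 0 are skipped as noise, safer samples are chosen as seeds more often, and local density is taken as the inverse of the mean distance to the k nearest majority neighbors. The names `nearest`, `oversample`, and `undersample` are hypothetical and do not come from the paper.

```python
import math
import random


def nearest(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def oversample(minority, majority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    points = minority + majority
    n_min = len(minority)
    # Assumed safety level: minority samples among the k nearest neighbors.
    levels = [sum(j < n_min for j in nearest(i, points, k)) for i in range(n_min)]
    safe = [minority[i] for i in range(n_min) if levels[i] > 0]
    weights = [lv for lv in levels if lv > 0]
    if len(safe) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        # Safer seeds are picked more often (an assumed way to "guide" SMOTE).
        i = rng.choices(range(len(safe)), weights=weights)[0]
        j = rng.choice(nearest(i, safe, k))
        gap = rng.random()
        # Interpolate between the seed and one of its safe minority neighbors.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(safe[i], safe[j])))
    return synthetic


def undersample(majority, n_keep, k=5):
    """Keep the n_keep majority samples lying in the sparsest regions."""
    def density(i):
        nbrs = nearest(i, majority, k)
        if not nbrs:
            return 0.0
        mean_dist = sum(math.dist(majority[i], majority[j]) for j in nbrs) / len(nbrs)
        return 1.0 / (mean_dist + 1e-12)  # assumed density: inverse mean k-NN distance

    order = sorted(range(len(majority)), key=density)  # sparsest first
    return [majority[i] for i in sorted(order[:n_keep])]
```

Samples are plain tuples of floats here. In this sketch a minority point surrounded only by majority points gets safety level 0 and is never used as a seed or as an interpolation target, and an isolated majority point survives undersampling while points in a tight cluster are thinned first.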

       

    • Figure 1. Flowchart of the RBNR algorithm

      Figure 2. Series voltage regulator circuit

      Figure 3. Test environment

      Figure 4. Parameter analysis of BMS: (a) analysis of RC; (b) analysis of F-value; (c) analysis of G-mean

      Figure 5. Bar charts comparing the results: (a) comparison of RC; (b) comparison of F-value; (c) comparison of G-mean

      Table 1. Confusion matrix

      Category    Classified as minority    Classified as majority
      Minority    TP                        FN
      Majority    FP                        TN
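The scores used in the comparison (RC, F-value, G-mean) can be computed from the confusion-matrix counts in Table 1, as sketched here. This assumes the usual definitions, which this page does not spell out: RC is the recall of the minority class, F-value is the F-measure (β = 1 by default), and G-mean is the geometric mean of minority recall and majority recall. The function name `imbalance_metrics` is hypothetical.

```python
import math


def imbalance_metrics(tp, fn, fp, tn, beta=1.0):
    """Return RC, F-value and G-mean for the minority (positive) class."""
    recall = tp / (tp + fn) if tp + fn else 0.0          # minority recall (RC)
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0     # majority recall
    denom = beta ** 2 * precision + recall
    f_value = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"RC": recall, "F-value": f_value, "G-mean": g_mean}
```

Unlike plain accuracy, G-mean drops to zero when either class is entirely misclassified, which is why it is a common choice for imbalanced problems.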

      Table 2. UCI data sets used

      Data set    Dimension    Minority/majority    Imbalance ratio
      CTG         21           176/1655             1:9.403
      Diabetes    8            268/500              1:1.866
      Glass       9            42/172               1:4.095
      Wine        13           48/130               1:2.708

      Table 3. Measured circuit data (excerpt)

      ID     V1_max (V)  V1_min (V)  V2 (V)   V3 (V)   V4 (V)   V5 (V)   V6 (V)   V7 (V)   V8 (V)   Attribute
      1      ?7.730      ?6.360      ?6.923   ?6.928   ?6.281   ?2.811   ?2.981   ?5.579   ?0.140   normal
      2      ?7.794      ?6.337      ?6.953   ?6.955   ?6.297   ?2.781   ?2.969   ?5.603   ?0.134   normal
      …
      188    ?7.706      ?6.344      ?6.943   ?6.945   ?6.271   ?2.812   ?3.020   ?5.613   ?0.148   normal
      189    ?7.760      ?6.622      ?7.106   ?7.089   ?6.533   ?2.656   ?2.456   ?4.548   ?0.133   faulty
      …
      233    ?7.792      ?6.597      ?7.078   ?7.049   ?6.503   ?2.670   ?2.544   ?4.726   ?0.113   faulty

      Table 4. Performance comparison in terms of F-value and G-mean

      Data set    Algorithm   RC                F-value           G-mean            Parameter value
                              Mean     Std      Mean     Std      Mean     Std      C       σ
      CTG         SMOTE       1        0        0.9714   0.0782   0.9976   0.0045   0.1     4.9849
      CTG         RU-SMOTE    1        0        0.9849   0.0389   0.9984   0.0034   1       4.9056
      CTG         BMS         0.9983   0.0118   0.9825   0.0342   0.9972   0.0068   1       5.0038
      CTG         RBNR        1        0        0.9870   0.0382   0.9988   0.0030   1       5.0123
      Diabetes    SMOTE       0.6966   0.0852   0.6515   0.0694   0.7318   0.0486   1       2.7590
      Diabetes    RU-SMOTE    0.5775   0.1121   0.6330   0.0830   0.7079   0.0670   1       3.3938
      Diabetes    BMS         0.6656   0.1102   0.6595   0.0801   0.7357   0.0652   0.1     3.0312
      Diabetes    RBNR        0.7871   0.0895   0.6832   0.0624   0.7554   0.0497   0.1     3.0156
      Glass       SMOTE       0.8985   0.1529   0.8902   0.1125   0.9319   0.0865   10      1.2357
      Glass       RU-SMOTE    0.8523   0.1934   0.8608   0.1266   0.8915   0.1558   10      1.2156
      Glass       BMS         0.8656   0.2157   0.8909   0.1371   0.9062   0.1670   10      3.3978
      Glass       RBNR        0.9086   0.1295   0.9062   0.0996   0.9416   0.0693   1       1.4562
      Wine        SMOTE       1        0        0.9818   0.0513   0.9949   0.0152   10      3.9758
      Wine        RU-SMOTE    1        0        0.9770   0.0507   0.9914   0.0181   10      3.6135
      Wine        BMS         0.9971   0.0202   0.9600   0.0827   0.9874   0.0230   100     4.0360
      Wine        RBNR        1        0        0.9789   0.0454   0.9919   0.0146   10      3.7833
      Regulator   SMOTE       0.9272   0.1303   0.8496   0.1067   0.9314   0.0715   1000    1.5781
      Regulator   RU-SMOTE    0.9320   0.2114   0.8304   0.1118   0.8999   0.1931   10      4.7342
      Regulator   BMS         0.8685   0.1930   0.8731   0.1007   0.9025   0.1526   0.01    3.6821
      Regulator   RBNR        0.9075   0.1248   0.8947   0.1043   0.9361   0.0699   10      4.6943
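Table 4 lists a C and a σ value for every run. This page does not define them; a plausible reading, given that the classifier is a kernel extreme learning machine, is that C is the regularization coefficient and σ the width of a Gaussian kernel. Under that assumption, the sketch below shows a minimal kernel ELM and a ten-fold cross-validation loop in which only the training fold is resampled, as the abstract describes. The kernel form exp(-||u - v||² / (2σ²)), the 0/1 label convention (1 = minority), the `resample` hook, and the names `KernelELM` and `cross_validate` are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Pairwise Gaussian kernel between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))


class KernelELM:
    def __init__(self, C=1.0, sigma=1.0):
        self.C, self.sigma = C, sigma

    def fit(self, X, y):
        self.X = np.asarray(X, float)
        targets = np.where(np.asarray(y) == 1, 1.0, -1.0)
        omega = gaussian_kernel(self.X, self.X, self.sigma)
        # Output weights: solve (I / C + Omega) alpha = T
        self.alpha = np.linalg.solve(np.eye(len(self.X)) / self.C + omega, targets)
        return self

    def predict(self, X):
        scores = gaussian_kernel(np.asarray(X, float), self.X, self.sigma) @ self.alpha
        return (scores > 0).astype(int)


def cross_validate(X, y, C, sigma, resample=lambda X, y: (X, y), n_folds=10, seed=0):
    """Mean and standard deviation of G-mean over the folds (unstratified split)."""
    X, y = np.asarray(X, float), np.asarray(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Resample the training fold only; the test fold stays untouched.
        X_tr, y_tr = resample(X[train_idx], y[train_idx])
        pred = KernelELM(C, sigma).fit(X_tr, y_tr).predict(X[test_idx])
        pos, neg = y[test_idx] == 1, y[test_idx] == 0
        if not (pos.any() and neg.any()):
            continue  # skip folds missing a class; G-mean is undefined there
        scores.append(np.sqrt(pred[pos].mean() * (1 - pred[neg]).mean()))
    return float(np.mean(scores)), float(np.std(scores))
```

Any resampling routine that takes and returns `(X, y)` arrays can be passed as `resample`; the default leaves the training fold unchanged. Keeping the test fold out of the resampling step avoids scoring the classifier on synthetic samples.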
    Publication history
    • Received: 2020-04-05
    • Published: 2021-06-25
