• Volume 41 Issue 9
    Sep.  2019
    ZHENG Heng-yi, LIAO Cheng-lin, LI Tian-zhu. A topic detection method for network long text[J]. Chinese Journal of Engineering, 2019, 41(9): 1208-1214. doi: 10.13374/j.issn2095-9389.2019.09.013

    A topic detection method for network long text

    doi: 10.13374/j.issn2095-9389.2019.09.013
    • Internet public opinion is an important source of people's views on social hotspots and national current affairs, and topic detection in long network texts supports the analysis of that opinion. With reliable topic detection results, policymakers can make timely, well-founded decisions. In general, topic detection can be divided into two steps: representation learning and topic discovery. However, common representation learning methods, such as the vector space model (VSM) and term frequency-inverse document frequency (TF-IDF), often suffer from high dimensionality, sparsity, and loss of latent semantics, whereas traditional topic discovery methods depend heavily on the order in which texts are input. To overcome these problems, a novel topic detection method is presented. First, a representation learning method based on Word2vec and latent Dirichlet allocation (LDA) is proposed to avoid high-dimensional sparsity and the neglect of latent semantics. A weighted fusion of the latent topics of feature words extracted by LDA and the feature word vectors mapped by Word2vec both reduces dimensionality and represents the text information more completely. Furthermore, combining Single-Pass with hierarchical agglomerative clustering for topic discovery makes the result more robust to input order. To evaluate the effectiveness and efficiency of the proposed method, extensive experiments were conducted on a real-world multi-source dataset collected from university social platforms. The experimental results show that the proposed method outperforms other methods, such as VSM and Single-Pass, improving the clustering accuracy by 10%-20%.
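
    The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fusion formula, so `fuse` assumes a weighted concatenation of unit-normalized vectors with a hypothetical weight `alpha`; the topic and word vectors are hand-made toy values standing in for LDA and Word2vec output; and the thresholds and all function names are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def fuse(topic_vec, word_vec, alpha=0.5):
    """ASSUMED fusion: weighted concatenation of the two unit vectors."""
    return ([alpha * x for x in normalize(topic_vec)]
            + [(1 - alpha) * x for x in normalize(word_vec)])

def centroid(vectors, members):
    """Mean vector of the documents whose indices are in `members`."""
    cols = zip(*(vectors[i] for i in members))
    return [sum(col) / len(members) for col in cols]

def single_pass(vectors, threshold):
    """Assign each document, in input order, to the most similar existing
    cluster if the similarity reaches `threshold`, else open a new cluster."""
    clusters = []
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(v, centroid(vectors, c))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def agglomerate(vectors, clusters, threshold):
    """Repeatedly merge the two clusters with the most similar centroids
    until no pair reaches `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        pair, best_sim = None, threshold
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(centroid(vectors, clusters[a]),
                             centroid(vectors, clusters[b]))
                if sim >= best_sim:
                    pair, best_sim = (a, b), sim
        if pair is None:
            break
        a, b = pair
        clusters[a].extend(clusters.pop(b))
    return clusters

def detect_topics(vectors, sp_threshold, merge_threshold):
    """Single-Pass first, then agglomerative merging of its clusters."""
    return agglomerate(vectors, single_pass(vectors, sp_threshold),
                       merge_threshold)

# Toy (topic distribution, averaged word vector) pairs, invented for the demo.
docs = [
    ([0.9, 0.1], [1.0, 0.0, 0.0]),
    ([0.1, 0.9], [0.0, 1.0, 0.0]),
    ([0.8, 0.2], [0.9, 0.1, 0.0]),
    ([0.2, 0.8], [0.1, 0.9, 0.0]),
]
fused = [fuse(t, w, alpha=0.5) for t, w in docs]
topics = detect_topics(fused, sp_threshold=0.999, merge_threshold=0.8)
print(topics)
```

    In this toy run the Single-Pass threshold is deliberately strict, so every document starts in its own cluster, and the agglomerative pass then groups documents 0 and 2 and documents 1 and 3. This mirrors the idea that a second, order-independent merging stage can repair fragmentation left by the order-dependent first stage.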

       


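      Corresponding author: CHEN Bin, bchen63@163.com
      • 1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142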


      Figures(7)  / Tables(1)
