
Research progress of deep reinforcement learning applied to text generation

XU Cong, LI Qing, ZHANG De-zheng, CHEN Peng, CUI Jia-rui

Citation: XU Cong, LI Qing, ZHANG De-zheng, CHEN Peng, CUI Jia-rui. Research progress of deep reinforcement learning applied to text generation[J]. Chinese Journal of Engineering, 2020, 42(4): 399-411. doi: 10.13374/j.issn2095-9389.2019.06.16.030


    doi: 10.13374/j.issn2095-9389.2019.06.16.030
Funding: Cloud Computing and Big Data Special Project of the National Key Research and Development Program of China (2017YFB1002304)
Corresponding author: E-mail: liqing@ies.ustb.edu.cn

• CLC number: TP183

• Abstract: The successes of Google's AlphaGo in the game of Go have drawn growing attention to deep reinforcement learning, which combines the perception ability of deep learning in complex environments with the decision-making ability of reinforcement learning in complex scenarios. Natural language processing must represent enormous numbers of words and sentences, and text generation tasks such as dialogue systems, machine translation, and image captioning involve many decision problems that are difficult to model. Deep reinforcement learning can therefore play an important role in these text generation tasks, improving existing model structures and training mechanisms, and it has already produced many notable results. This paper systematically reviews the main methods that apply deep reinforcement learning to different text generation tasks, traces their development, and analyzes the characteristics of each algorithm. Finally, it discusses the prospects and challenges of integrating deep reinforcement learning with natural language processing tasks.
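To make the survey's subject concrete, below is a minimal sketch (not taken from the paper) of how a policy-gradient method such as REINFORCE can train a text generator: the model samples a sequence token by token, a sequence-level reward (BLEU in many of the surveyed works; a toy stand-in here) scores the result, and the summed log-probabilities of the sampled tokens are weighted by that reward. The vocabulary, model sizes, and reward function are all hypothetical placeholders.

```python
# Illustrative sketch only (not code from the surveyed works): REINFORCE-style
# training of a toy GRU text generator. Vocabulary, sizes, and the reward are
# hypothetical placeholders.
import torch
import torch.nn as nn

VOCAB, HIDDEN, MAX_LEN = 100, 64, 10  # toy sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.gru = nn.GRUCell(HIDDEN, HIDDEN)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def sample(self):
        """Sample one sequence; return its tokens and their log-probabilities."""
        h = torch.zeros(1, HIDDEN)
        tok = torch.zeros(1, dtype=torch.long)  # <bos> assumed to be id 0
        log_probs, tokens = [], []
        for _ in range(MAX_LEN):
            h = self.gru(self.embed(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            tokens.append(tok.item())
        return tokens, torch.stack(log_probs)

def reward_fn(tokens):
    """Stand-in for a sequence-level reward such as BLEU; here, the fraction
    of even token ids, purely to keep the example self-contained."""
    return sum(t % 2 == 0 for t in tokens) / len(tokens)

gen = Generator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
baseline = 0.0  # running-average baseline reduces gradient variance
for step in range(200):
    tokens, log_probs = gen.sample()
    r = reward_fn(tokens)
    baseline = 0.9 * baseline + 0.1 * r
    loss = -(r - baseline) * log_probs.sum()  # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
```

Compared with pure maximum-likelihood training, this setup lets a non-differentiable, sequence-level metric drive learning, which is the central appeal of deep reinforcement learning for text generation discussed in this survey.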

       

• Figure 1. Framework of deep reinforcement learning

  Figure 2. Training process of the deep Q-network

  Figure 3. Training process of the actor–critic framework

  Figure 4. Structure and training process of the SeqGAN model
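As a rough companion to Figure 3, the sketch below illustrates the actor–critic update it depicts: a critic estimates state values, the temporal-difference error acts as an advantage signal, and the actor is updated along the gradient of its log-probability weighted by that signal. The toy environment and all names are assumptions for illustration, not any surveyed model's implementation.

```python
# Illustrative actor–critic update (cf. Figure 3); the toy environment and
# all names here are hypothetical, not a surveyed model's implementation.
import torch
import torch.nn as nn

STATE, ACTIONS, GAMMA = 8, 4, 0.99

actor = nn.Linear(STATE, ACTIONS)   # policy head: logits over actions
critic = nn.Linear(STATE, 1)        # value head: V(s)
opt = torch.optim.Adam([*actor.parameters(), *critic.parameters()], lr=1e-3)

s = torch.randn(STATE)              # toy initial state
for step in range(100):
    dist = torch.distributions.Categorical(logits=actor(s))
    a = dist.sample()
    # Toy transition and reward standing in for the real environment.
    s_next, r = torch.randn(STATE), torch.randn(())
    with torch.no_grad():
        td_target = r + GAMMA * critic(s_next)
    td_error = td_target - critic(s)               # advantage estimate
    actor_loss = -td_error.detach() * dist.log_prob(a)
    critic_loss = td_error.pow(2)                  # semi-gradient value loss
    loss = actor_loss + critic_loss
    opt.zero_grad(); loss.backward(); opt.step()
    s = s_next
```

In the text-generation setting this survey covers, the state would be the partially generated sequence and the actions the vocabulary tokens, with the critic scoring partial outputs instead of toy random states.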

Table 1. Summary of dialogue datasets

Dataset                                                         | Number of dialogues | Number of slots | Scenes | Multi-turn
Cambridge restaurants database                                  | 720                 | 6               | 1      | Yes
San Francisco restaurants database                              | 3577                | 12              | 1      | Yes
Dialog system technology challenge 2                            | 3000                | 8               | 1      | Yes
Dialog system technology challenge 3                            | 2265                | 9               | 1      | Yes
Stanford multi-turn multi-domain task-oriented dialogue dataset | 3031                | 79, 65, 140     | 3      | Yes
The Twitter dialogue corpus                                     | 1300000             | –               | –      | Yes
The Ubuntu dialogue corpus                                      | 932429              | –               | –      | No
Opensubtitle corpus                                             | 70000000            | –               | –      | No
Publication history
• Received: 2019-06-16
• Published: 2020-04-01
