
Research on Headline Generation Methods for Internet News (4)

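Figure 5 shows, for each word of the summary generated by the pointer-generator (PG) model, its attention distribution over the full source text together with its P_gen probability. When the model generates the first summary word, Photographer, it places most of its attention on four source tokens (photographer, russian, photographer, and nick) and emits the word with a P_gen of 0.189. Under the usual pointer-generator formulation, such a low P_gen means the word is obtained mostly by copying from the attended source positions rather than by generating from the vocabulary. This suggests that the PG model can pick up fine-grained details in the text and capture relations between parts of the article, which helps explain why it produced the best headlines in our experiments.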
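The role of P_gen in this example can be illustrated with a minimal sketch. It assumes the usual pointer-generator mixture, in which the final probability of a word is P_gen times its vocabulary probability plus (1 - P_gen) times the attention mass on the source positions holding that word. The function name `final_distribution`, the attention weights, and the vocabulary distribution are hypothetical and chosen only for illustration; they are not values read from Figure 5. Only P_gen = 0.189 is taken from the text.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, source_tokens, attention):
    """Mix a vocabulary distribution with a copy (attention) distribution.

    P(w) = p_gen * P_vocab(w)
           + (1 - p_gen) * sum of attention over source positions equal to w
    """
    if len(source_tokens) != len(attention):
        raise ValueError("one attention weight per source token is required")
    mixed = defaultdict(float)
    # Generation path: scale every vocabulary probability by p_gen.
    for word, prob in vocab_dist.items():
        mixed[word] += p_gen * prob
    # Copy path: add attention mass for each source position, so a token
    # that appears several times in the source accumulates its weights.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Hypothetical toy inputs; only p_gen = 0.189 comes from the text.
source_tokens = ["photographer", "russian", "photographer", "nick", "the"]
attention = [0.40, 0.20, 0.25, 0.10, 0.05]  # assumed weights, sum to 1
vocab_dist = {"photographer": 0.30, "the": 0.50, "a": 0.20}  # assumed, sums to 1

dist = final_distribution(0.189, vocab_dist, source_tokens, attention)
best_word = max(dist, key=dist.get)
print(best_word, round(dist[best_word], 5))
```

With these assumed inputs, about 81% of the probability mass flows through the copy path, so the most heavily attended source word dominates the output even though the vocabulary distribution alone would have preferred a different word.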

5. Conclusion

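This paper addresses several pain points in today's Internet news: headlines that do not match the article body, the difficulty of obtaining news quickly, and the difficulty of reviewing manuscripts. Taking the current state of natural language processing as its starting point and following the development of text summarization, it introduced the two basic families of summarization models, graph-based models and deep-learning-based models, along with several relatively novel improvements to the deep-learning-based ones. It then reported comparative headline generation experiments across multiple models on real Internet news crawled from news websites. The results show that the pointer-generator model produced the best headlines and can be applied effectively to headline generation for large volumes of Internet news. At the same time, existing text summarization techniques still have shortcomings: the models tend to generate long sentences, and repetition still occurs.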

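"Effective headlines" produced by the headline generation method can substantially reduce the time spent on manual manuscript review. However, to fully remove the tedious work of screening for sensationalist "shock-style" headlines, detection based on comparing the "effective headline" with the "shock-style headline" is indispensable. This paper focuses for now on generating effective headlines; future work will focus on detecting the difference between the two kinds of headlines.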

