Institute of Computing Technology, Chinese Academy IR
Grouping sentences as better language unit for extractive text summarization | |
Cao, Mengyun1,2; Zhuge, Hai1 | |
2020-08-01 | |
Journal | FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE |
ISSN | 0167-739X |
Volume | 109 |
Pages | 331-359 |
Abstract | Most existing methods for extractive text summarization extract important sentences with statistical or linguistic techniques and concatenate them into a summary. However, the extracted sentences are usually incoherent, and the problem worsens when the source text and the summary are long and based on logical reasoning. This paper addresses two related questions: What is the best language unit for constructing a coherent and understandable summary? And how should the extractive summarization process be built on that unit? Extracting larger language units, such as a group of sentences or a paragraph, is a natural way to improve the readability of a summary, since it is reasonable to assume that the original sentences within a larger unit are coherent. This paper proposes a framework for group-based text summarization that clusters semantically related sentences into groups based on the Semantic Link Network (SLN), ranks the groups, and concatenates the top-ranked ones into a summary. A two-layer SLN model is used to generate and rank groups with four types of semantic links: the is-part-of link, the sequential link, the similar-to link, and the cause-effect link. The experimental results show that summaries composed of groups or paragraphs tend to contain more key words or phrases than summaries composed of sentences, and that summaries composed of groups contain more key words or phrases than those composed of paragraphs, especially when the average length of the source texts is between 7,000 and 17,000 words, the usual length of scientific papers. Further, we compare seven clustering algorithms for generating groups and propose five strategies for generating groups with the four types of semantic links. (C) 2020 Elsevier B.V. All rights reserved. |
Keywords | Text summarization; Semantic Link Network; Clustering; Natural language processing |
DOI | 10.1016/j.future.2020.03.046 |
Indexed in | SCI |
Language | English |
Funding | National Natural Science Foundation of China [61640212]; National Natural Science Foundation of China [61876048] |
WOS Research Area | Computer Science |
WOS Category | Computer Science, Theory & Methods |
WOS Accession Number | WOS:000536950900027 |
Publisher | ELSEVIER |
Document Type | Journal article |
Item Identifier | http://119.78.100.204/handle/2XEOYT63/15268 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding Author | Zhuge, Hai |
Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing, Peoples R China; 2. Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China; 3. Guangzhou Univ, Sch Comp Sci & Network Engn, Guangzhou, Peoples R China; 4. Aston Univ, Sch Engn & Appl Sci, Birmingham, W Midlands, England |
Recommended Citation (GB/T 7714) | Cao, Mengyun, Zhuge, Hai. Grouping sentences as better language unit for extractive text summarization[J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 109: 331-359. |
APA | Cao, Mengyun, & Zhuge, Hai. (2020). Grouping sentences as better language unit for extractive text summarization. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 109, 331-359. |
MLA | Cao, Mengyun, et al. "Grouping sentences as better language unit for extractive text summarization". FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE 109 (2020): 331-359. |
Files in This Item | No files are associated with this item. |
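The abstract describes the pipeline only at a high level: cluster related sentences into groups, rank the groups, and concatenate the top-ranked ones into a summary. The Python sketch below illustrates that extract-by-group idea under stated assumptions. It does not reproduce the paper's two-layer SLN model or its four link types. Adjacent sentences are merged when their word-set Jaccard overlap reaches a threshold, which is a crude stand-in for the sequential and similar-to links. Each group is then scored by the mean document frequency of its words, and groups are picked greedily within a word budget. The function names, the `threshold` and `budget_words` parameters, and the scoring rule are all hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a simplistic tokenizer chosen for this sketch."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap between two sentences (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_sentences(sentences: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Merge each sentence into the previous group when it overlaps enough with
    its predecessor. ASSUMPTION: a stand-in for SLN sequential/similar-to links."""
    groups: list[list[int]] = []
    for i, sent in enumerate(sentences):
        if groups and jaccard(set(tokenize(sentences[i - 1])), set(tokenize(sent))) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def summarize(sentences: list[str], budget_words: int = 50, threshold: float = 0.2) -> list[str]:
    """Rank groups and greedily keep the best ones that fit the word budget."""
    doc_freq = Counter(w for s in sentences for w in tokenize(s))
    groups = group_sentences(sentences, threshold)

    def score(group: list[int]) -> float:
        # ASSUMPTION: mean document frequency of the group's words, not SLN ranking.
        words = [w for i in group for w in tokenize(sentences[i])]
        return sum(doc_freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(groups)), key=lambda g: score(groups[g]), reverse=True)
    chosen: list[int] = []
    used = 0
    for g in ranked:
        length = sum(len(tokenize(sentences[i])) for i in groups[g])
        if used + length <= budget_words:
            chosen.append(g)
            used += length
    chosen.sort()  # emit selected groups in their original source order
    return [sentences[i] for g in chosen for i in groups[g]]
```

Emitting the chosen groups in source order is a design choice of this sketch. It keeps each group's sentences contiguous, which is the property the abstract relies on for coherence.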
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.