Institute of Computing Technology, Chinese Academy of Sciences Institutional Repository (IR)
Title | A neural topic model with word vectors and entity vectors for short texts
Authors | Zhao, Xiaowei1; Wang, Deqing1; Zhao, Zhengyang1; Liu, Wei2; Lu, Chenwei1; Zhuang, Fuzhen3,4
Date Issued | 2021-03-01
Journal | INFORMATION PROCESSING & MANAGEMENT
ISSN | 0306-4573 |
Volume | 58
Issue | 2
Pages | 11
Abstract | Traditional topic models are widely used for semantic discovery from long texts. However, they usually fail to mine high-quality topics from short texts (e.g., tweets) due to the sparsity of features and the lack of word co-occurrence patterns. In this paper, we propose a Variational Auto-Encoder Topic Model (VAETM for short) that combines word vector representations and entity vector representations to address the above limitations. Specifically, we first learn embedding representations of each word and each entity by employing large-scale external corpora and a large, manually edited knowledge graph, respectively. Then we integrate the embedding representations into the variational auto-encoder framework and propose an unsupervised model named VAETM to infer the latent representation of topic distributions. To further boost VAETM, we propose an improved supervised VAETM (SVAETM for short) that uses label information in the training set to supervise both the inference of the latent representation of topic distributions and the generation of topics. Last, we propose KL-divergence-based inference algorithms to infer the approximate posterior distributions for our two models. Extensive experiments on three common short-text datasets demonstrate that our proposed VAETM and SVAETM outperform various state-of-the-art models in terms of perplexity, NPMI, and accuracy.
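The abstract describes a variational auto-encoder whose input fuses the bag-of-words representation of a short text with pre-trained word and entity vectors, whose latent code is read as the topic distribution, and whose training objective combines a reconstruction term with a KL divergence to the prior. The sketch below (Python/PyTorch) is only an illustration of that general recipe, not the authors' implementation: the class name VAETMSketch, the fusion-by-concatenation encoder, the standard-normal prior, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch of a VAE-style neural topic model that fuses pre-trained
# word and entity embeddings. Hypothetical architecture, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAETMSketch(nn.Module):
    def __init__(self, vocab_size, word_emb_dim, entity_emb_dim,
                 hidden_dim=256, num_topics=50):
        super().__init__()
        # Encoder input: bag-of-words counts plus averaged word/entity vectors.
        enc_in = vocab_size + word_emb_dim + entity_emb_dim
        self.encoder = nn.Sequential(nn.Linear(enc_in, hidden_dim), nn.Softplus())
        self.mu = nn.Linear(hidden_dim, num_topics)
        self.logvar = nn.Linear(hidden_dim, num_topics)
        # Decoder: topic proportions -> distribution over the vocabulary.
        self.decoder = nn.Linear(num_topics, vocab_size)

    def forward(self, bow, word_vec, entity_vec):
        h = self.encoder(torch.cat([bow, word_vec, entity_vec], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample the latent topic representation.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        theta = F.softmax(z, dim=-1)                      # topic proportions
        recon = F.log_softmax(self.decoder(theta), dim=-1)
        # Negative ELBO: reconstruction term + KL to a standard-normal prior.
        rec_loss = -(bow * recon).sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (rec_loss + kl).mean()

# Toy usage with random tensors standing in for real features.
model = VAETMSketch(vocab_size=2000, word_emb_dim=300, entity_emb_dim=100)
bow = torch.rand(8, 2000)      # bag-of-words counts per short text
wv = torch.randn(8, 300)       # averaged pre-trained word vectors
ev = torch.randn(8, 100)       # averaged pre-trained entity vectors
loss = model(bow, wv, ev)
loss.backward()
```

For the supervised variant (SVAETM) mentioned in the abstract, one would presumably attach a classifier to the inferred topic proportions and add its cross-entropy loss on the training labels; that extension is omitted from this sketch.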
Keywords | Topic model; Short text; Variational auto-encoder; Word embedding; Entity embedding
DOI | 10.1016/j.ipm.2020.102455 |
Indexed By | SCI
Language | English
Funding Project | National Key R&D Program of China [2019YFA0707204]; National Natural Science Foundation of China [U1836206]
WOS Research Area | Computer Science; Information Science & Library Science
WOS Category | Computer Science, Information Systems; Information Science & Library Science
WOS Accession Number | WOS:000612229800005
Publisher | ELSEVIER SCI LTD
Document Type | Journal article
Identifier | http://119.78.100.204/handle/2XEOYT63/16198
Collection | Institute of Computing Technology, CAS: Journal Articles (English)
Corresponding Author | Zhuang, Fuzhen
Affiliation | 1. Beihang Univ, Sch Comp Sci, Beijing 100191, Peoples R China; 2. Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing 100029, Peoples R China; 3. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, CAS, Beijing 100190, Peoples R China; 4. Chinese Acad Sci, Xiamen Data Intelligence Acad ICT, Beijing, Peoples R China
Recommended Citation (GB/T 7714) | Zhao, Xiaowei, Wang, Deqing, Zhao, Zhengyang, et al. A neural topic model with word vectors and entity vectors for short texts[J]. INFORMATION PROCESSING & MANAGEMENT, 2021, 58(2): 11.
APA | Zhao, Xiaowei, Wang, Deqing, Zhao, Zhengyang, Liu, Wei, Lu, Chenwei, & Zhuang, Fuzhen. (2021). A neural topic model with word vectors and entity vectors for short texts. INFORMATION PROCESSING & MANAGEMENT, 58(2), 11.
MLA | Zhao, Xiaowei, et al. "A neural topic model with word vectors and entity vectors for short texts." INFORMATION PROCESSING & MANAGEMENT 58.2 (2021): 11.
Files in This Item | No files associated with this item.
Unless otherwise stated, all content in this system is protected by copyright and all rights are reserved.