avtmNet: Adaptive Visual-Text Merging Network for Image Captioning

doi:10.1016/j.compeleceng.2020.106630

Authors	Song, Heng 1,2,3; Zhu, Junwu 1; Jiang, Yi 1,4
Publication date	2020-06-01
Journal	COMPUTERS & ELECTRICAL ENGINEERING
ISSN	0045-7906
Volume	84 Pages: 12
Abstract	Automatic generation of image descriptions has recently been studied extensively. Among the many image captioning techniques proposed, the attention-based encoder-decoder framework has achieved great success. Two types of attention model have been used to generate captions: visual-attention models, which are good at describing details, and text-attention models, which are good at comprehensive understanding. To integrate and make full use of both visual and text information, and so generate more accurate captions, this paper first introduces a visual attention model to produce the visual information and a text attention model to produce the text information, and then proposes an adaptive visual-text merging network (avtmNet). The merging network merges the visual and text information and automatically determines the proportion of each that is used to generate the next caption word. Extensive experiments on the COCO2014 and Flickr30K datasets show the effectiveness and superiority of the proposed approach. (C) 2020 Elsevier Ltd. All rights reserved.
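The adaptive merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulation, so everything here is an assumption made for illustration: a single scalar gate `beta`, computed by a sigmoid over a weighted sum of the two context vectors, blends the visual and text vectors. The names `adaptive_merge`, `gate_weights` and `gate_bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def adaptive_merge(visual, text, gate_weights, gate_bias=0.0):
    """Blend a visual and a text context vector with a scalar gate.

    Assumed form (not from the paper):
        beta   = sigmoid(w . [visual; text] + b)
        merged = beta * visual + (1 - beta) * text
    Returns the merged vector and the gate value beta.
    """
    if len(visual) != len(text):
        raise ValueError("visual and text vectors must have the same length")
    features = list(visual) + list(text)
    if len(gate_weights) != len(features):
        raise ValueError("gate_weights must match the concatenated length")
    # Scalar gate score from the concatenated visual and text features.
    score = sum(w * x for w, x in zip(gate_weights, features)) + gate_bias
    beta = sigmoid(score)
    # Convex combination: beta is the share given to the visual vector.
    merged = [beta * v + (1.0 - beta) * t for v, t in zip(visual, text)]
    return merged, beta


if __name__ == "__main__":
    visual_ctx = [1.0, 0.0]
    text_ctx = [0.0, 1.0]
    merged, beta = adaptive_merge(visual_ctx, text_ctx, [0.0, 0.0, 0.0, 0.0])
    print(beta, merged)
```

In a trained model the gate parameters would be learned, so the share of visual versus text information could change at every decoding step; here they are fixed inputs so that the blending arithmetic is easy to follow.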
Keywords	Image captioning ; Computer Vision ; Natural Language Processing ; Attention Mechanism ; Neural networks
DOI	10.1016/j.compeleceng.2020.106630
Indexed in	SCI
Language	English
WOS research areas	Computer Science ; Engineering
WOS categories	Computer Science, Hardware & Architecture ; Computer Science, Interdisciplinary Applications ; Engineering, Electrical & Electronic
WOS accession number	WOS:000579053300009
Publisher	PERGAMON-ELSEVIER SCIENCE LTD
Citation statistics	Times cited: 9 [WOS]
Document type	Journal article
Item identifier	http://119.78.100.204/handle/2XEOYT63/15736
Collection	Institute of Computing Technology, Chinese Academy of Sciences: Journal Papers (English)
Corresponding author	Jiang, Yi
Author affiliations	1. Yangzhou Univ, Inst Informat Engn, Yangzhou, Jiangsu, Peoples R China 2. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing, Peoples R China 3. Univ Chinese Acad Sci, Beijing, Peoples R China 4. Shanghai Jiao Tong Univ, State Key Lab Ocean Engn, Shanghai, Peoples R China
Recommended citation
GB/T 7714	Song, Heng, Zhu, Junwu, Jiang, Yi. avtmNet: Adaptive Visual-Text Merging Network for Image Captioning[J]. COMPUTERS & ELECTRICAL ENGINEERING, 2020, 84: 12.
APA	Song, Heng, Zhu, Junwu, & Jiang, Yi. (2020). avtmNet: Adaptive Visual-Text Merging Network for Image Captioning. COMPUTERS & ELECTRICAL ENGINEERING, 84, 12.
MLA	Song, Heng, et al. "avtmNet: Adaptive Visual-Text Merging Network for Image Captioning". COMPUTERS & ELECTRICAL ENGINEERING 84 (2020): 12.