Dual-Stream Recurrent Neural Network for Video Captioning
Xu, Ning1; Liu, An-An1; Wong, Yongkang2; Zhang, Yongdong3,4; Nie, Weizhi1; Su, Yuting1; Kankanhalli, Mohan5
2019-08-01
Journal: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
ISSN: 1051-8215
Volume: 29; Issue: 8; Pages: 2482-2493
Abstract: Recent progress in using recurrent neural networks (RNNs) for video description has attracted increasing interest, owing to their capability to encode a sequence of frames for caption generation. While existing methods have studied various features (e.g., CNN, 3D CNN, and semantic attributes) for visual encoding, the representation and fusion of heterogeneous information from multi-modal spaces have not been fully explored. Considering that different modalities are often asynchronous, frame-level multi-modal fusion (e.g., concatenation and linear fusion) will negatively influence each modality. In this paper, we propose a dual-stream RNN (DS-RNN) framework to jointly discover and integrate the hidden states of both visual and semantic streams for video caption generation. First, an encoding RNN is used for each stream to flexibly exploit the hidden states of the respective modality. Specifically, we propose an attentive multi-grained encoder module to enhance local feature learning with global semantic features. Then, a dual-stream decoder is deployed to integrate the asynchronous yet complementary sequential hidden states from both streams for caption generation. Extensive experiments on three benchmark datasets, namely MSVD, MSR-VTT, and MPII-MD, show that DS-RNN achieves competitive performance against the state of the art. Additional ablation studies were conducted on several variants of the proposed DS-RNN.
Keywords: Video captioning; hidden state fusion; dual stream; recurrent neural network; attention module
DOI: 10.1109/TCSVT.2018.2867286
Indexed by: SCI
Language: English
Funding: National Natural Science Foundation of China [61772359]; National Natural Science Foundation of China [61472275]; National Natural Science Foundation of China [61525206]; National Natural Science Foundation of China [61502337]; National Key Research and Development Program of China [2017YFC0820600]; National Defense Science and Technology Fund for Distinguished Young Scholars [2017-JCJQ-ZQ-022]; National Research Foundation, Prime Minister's Office, Singapore, under its International Research Centre in Singapore Funding Initiative
WOS Research Area: Engineering
WOS Category: Engineering, Electrical & Electronic
WOS Accession Number: WOS:000480310500022
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation statistics
Times cited: 76 [WOS]
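Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/4435
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding author: Liu, An-An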
Author affiliations:
1. Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
2. Natl Univ Singapore, Smart Syst Inst, Singapore 119077, Singapore
3. Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Anhui, Peoples R China
4. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
5. Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
Recommended citation
GB/T 7714
Xu, Ning,Liu, An-An,Wong, Yongkang,et al. Dual-Stream Recurrent Neural Network for Video Captioning[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY,2019,29(8):2482-2493.
APA Xu, Ning.,Liu, An-An.,Wong, Yongkang.,Zhang, Yongdong.,Nie, Weizhi.,...&Kankanhalli, Mohan.(2019).Dual-Stream Recurrent Neural Network for Video Captioning.IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY,29(8),2482-2493.
MLA Xu, Ning,et al."Dual-Stream Recurrent Neural Network for Video Captioning".IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 29.8(2019):2482-2493.