Multimodal graph neural network for video procedural captioning
Ji, Lei [1,2,3]; Tu, Rongcheng [4]; Lin, Kevin [5]; Wang, Lijuan [5]; Duan, Nan [3]
2022-06-01
Journal: NEUROCOMPUTING
ISSN: 0925-2312
Volume: 488; Pages: 88-96
Abstract: Video procedural captioning aims to generate detailed descriptive captions for all steps in a long instructional video. What distinguishes this problem is the procedural dependency between events, which must be modeled to generate consistent captions across the video. However, existing (dense) video captioning methods consider only intra-event or sequential inter-event context and struggle to model non-sequential context dependencies between events. In this paper, inspired by the recent success of graph neural networks in capturing relations in structured data, we propose a novel Multimodal Graph Neural Network (MGNN) for dense video procedural captioning that captures the procedural structure between events. Specifically, we construct a temporal sequential graph and a semantic non-sequential graph to form a multimodal heterogeneous graph. Moreover, we adopt the graph neural network to enhance the visual and text features, and fuse both features for caption generation. Extensive experiments demonstrate that the proposed MGNN is effective in generating coherent captions on both the YouCook2 and ActivityNet Captions benchmarks. (c) 2022 Elsevier B.V. All rights reserved.
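The pipeline outlined in the abstract (two graphs over events, graph-based feature enhancement, then fusion) can be illustrated with a toy sketch. This is not the authors' MGNN: the record gives no architectural details, so every name below (`temporal_adjacency`, `semantic_adjacency`, `propagate`, `fuse`), the cosine-similarity threshold, the parameter-free mean aggregation, the pairing of the temporal graph with visual features and the semantic graph with text features, and fusion by concatenation are assumptions chosen only to make the idea concrete.

```python
import math

def temporal_adjacency(n):
    """Sequential graph (assumed form): each event links to itself and its time neighbours."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_adjacency(text_feats, threshold=0.8):
    """Non-sequential graph (assumed form): link events with similar text features."""
    n = len(text_feats)
    return [[1.0 if i == j or cosine(text_feats[i], text_feats[j]) >= threshold else 0.0
             for j in range(n)] for i in range(n)]

def propagate(adj, feats):
    """One message-passing step without learned weights: mean over linked events."""
    dim = len(feats[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1.0 because every node has a self-loop
        out.append([sum(w * f[d] for w, f in zip(row, feats)) / degree for d in range(dim)])
    return out

def fuse(visual, text):
    """Fuse the two modalities by concatenating the per-event feature vectors."""
    return [v + t for v, t in zip(visual, text)]

# Made-up features for four events (two dimensions per modality).
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 1.0]]

enhanced_visual = propagate(temporal_adjacency(len(visual)), visual)
enhanced_text = propagate(semantic_adjacency(text), text)
fused = fuse(enhanced_visual, enhanced_text)
```

In this toy setup the semantic graph links events 0 and 2, and events 1 and 3, even though they are not adjacent in time, which is the kind of non-sequential dependency the abstract refers to. A real model would use learned weights and a caption decoder on top of the fused features.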
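Keywords: Multimodal video captioning; Graph neural network
DOI: 10.1016/j.neucom.2022.02.062
Indexed by: SCI
Language: English
WOS Research Area: Computer Science
WOS Subject Category: Computer Science, Artificial Intelligence
WOS Accession Number: WOS:000782470900008
Publisher: ELSEVIER
Citation statistics
Times cited: 5 [WOS]
Document Type: Journal article
Item Identifier: http://119.78.100.204/handle/2XEOYT63/18895
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding Author: Ji, Lei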
Affiliations:
1. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
2. Univ Chinese Acad Sci, Beijing, Peoples R China
3. Microsoft Res Asia, Beijing, Peoples R China
4. Beijing Inst Technol, Beijing, Peoples R China
5. Microsoft, Redmond, WA USA
Recommended Citation
GB/T 7714: Ji, Lei, Tu, Rongcheng, Lin, Kevin, et al. Multimodal graph neural network for video procedural captioning[J]. NEUROCOMPUTING, 2022, 488: 88-96.
APA: Ji, Lei, Tu, Rongcheng, Lin, Kevin, Wang, Lijuan, & Duan, Nan. (2022). Multimodal graph neural network for video procedural captioning. NEUROCOMPUTING, 488, 88-96.
MLA: Ji, Lei, et al. "Multimodal graph neural network for video procedural captioning." NEUROCOMPUTING 488 (2022): 88-96.
Files in This Item
There are no files associated with this item.
 
