MemBridge: Video-Language Pre-Training With Memory-Augmented Inter-Modality Bridge
Yang, Jiahao (1,2); Li, Xiangyang (1,2); Zheng, Mao (3); Wang, Zihan (1,2); Zhu, Yongqing (1,2); Guo, Xiaoqian (1,2); Yuan, Yuchen (3); Chai, Zifeng (3); Jiang, Shuqiang (1,2)
2023
Journal: IEEE TRANSACTIONS ON IMAGE PROCESSING
ISSN: 1057-7149
Volume: 32; Pages: 4073-4087
Abstract: Video-language pre-training has attracted considerable attention recently for its promising performance on various downstream tasks. Most existing methods use modality-specific or modality-joint representation architectures for cross-modality pre-training. In contrast, this paper presents a novel architecture named Memory-augmented Inter-Modality Bridge (MemBridge), which uses learnable intermediate modality representations as a bridge for the interaction between videos and language. Specifically, in the transformer-based cross-modality encoder, we introduce learnable bridge tokens as the means of interaction, so that the video and language tokens can perceive information only from the bridge tokens and themselves. Moreover, a memory bank is proposed to store abundant modality interaction information for adaptively generating bridge tokens according to different cases, enhancing the capacity and robustness of the inter-modality bridge. Through pre-training, MemBridge explicitly models the representations for more sufficient inter-modality interaction. Comprehensive experiments show that our approach achieves performance competitive with previous methods on various downstream tasks, including video-text retrieval, video captioning, and video question answering, on multiple datasets, demonstrating the effectiveness of the proposed method. The code is available at https://github.com/jahhaoyang/MemBridge.
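The restricted interaction described in the abstract can be pictured as an attention mask. The following is a minimal sketch, not the authors' implementation: the token order (video, bridge, text), the function name `bridge_attention_mask`, the reading of "themselves" as "tokens of the same modality", and the choice to let bridge tokens attend to every token are all assumptions made for illustration.

```python
def bridge_attention_mask(n_video, n_bridge, n_text):
    """Build a boolean attention mask for a bridged cross-modality encoder.

    mask[q][k] is True when query token q may attend to key token k.
    Assumed token order (hypothetical): [video | bridge | text].
    """
    kinds = ["video"] * n_video + ["bridge"] * n_bridge + ["text"] * n_text
    mask = []
    for q in kinds:
        row = []
        for k in kinds:
            if q == "bridge":
                # Assumption: bridge tokens gather information from all tokens.
                allowed = True
            else:
                # Video/text tokens see only their own modality and the bridge.
                allowed = (k == q) or (k == "bridge")
            row.append(allowed)
        mask.append(row)
    return mask


# Small example: 2 video tokens, 1 bridge token, 2 text tokens.
mask = bridge_attention_mask(n_video=2, n_bridge=1, n_text=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

In this sketch the video rows never mark a text column as visible, and vice versa, so any information passing between the two modalities has to go through the bridge column. How the memory bank generates the bridge tokens is not specified in the abstract and is not modeled here.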
Keywords: Video-language pre-training; inter-modality bridge; memory module
DOI: 10.1109/TIP.2023.3283916
Indexed by: SCI
Language: English
WOS Research Area: Computer Science; Engineering
WOS Category: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
WOS Accession Number: WOS:001033515600013
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/21342
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Papers (English)
Corresponding author: Jiang, Shuqiang
Author affiliations:
1. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
3. Tencent Co Ltd, Dept Machine Learning Platform, Beijing 100193, Peoples R China
Recommended citation
GB/T 7714: Yang, Jiahao, Li, Xiangyang, Zheng, Mao, et al. MemBridge: Video-Language Pre-Training With Memory-Augmented Inter-Modality Bridge[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32: 4073-4087.
APA: Yang, Jiahao., Li, Xiangyang., Zheng, Mao., Wang, Zihan., Zhu, Yongqing., ... & Jiang, Shuqiang. (2023). MemBridge: Video-Language Pre-Training With Memory-Augmented Inter-Modality Bridge. IEEE TRANSACTIONS ON IMAGE PROCESSING, 32, 4073-4087.
MLA: Yang, Jiahao, et al. "MemBridge: Video-Language Pre-Training With Memory-Augmented Inter-Modality Bridge". IEEE TRANSACTIONS ON IMAGE PROCESSING 32 (2023): 4073-4087.