CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning
Luo, Huaishao1; Ji, Lei2,3,4; Zhong, Ming5; Chen, Yang5; Lei, Wen5; Duan, Nan4; Li, Tianrui1
2022-10-07
Journal: NEUROCOMPUTING
ISSN: 0925-2312
Volume: 508, Pages: 293-304
Abstract: Video clip retrieval and captioning play an essential role in multimodal research and are fundamental problems for multimodal understanding and generation. The CLIP (Contrastive Language-Image Pre-training) model has demonstrated the power of learning visual concepts from web-collected image-text datasets. In this paper, we propose the CLIP4Clip model to transfer the knowledge of the image-text pretrained CLIP model to video-text tasks in an end-to-end manner. Furthermore, we conduct several empirical studies: 1) whether image features are sufficient for video-text retrieval and captioning; 2) how post-pretraining on a large-scale video-text dataset affects the performance of CLIP; 3) what practical mechanism should be used to model temporal dependency between video frames; and 4) the hyper-parameter sensitivity of the model. Extensive experimental results show that the CLIP4Clip model transferred from CLIP achieves SOTA results on various video-text datasets, including MSR-VTT, MSVD, LSMDC, and DiDeMo, for multimodal understanding and generation tasks. (c) 2022 Elsevier B.V. All rights reserved.
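As a reading aid for question 3) in the abstract, the PyTorch sketch below shows one minimal, parameter-free way to aggregate per-frame CLIP image embeddings into a video representation and score it against CLIP text embeddings with cosine similarity. The function name and tensor shapes are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F

def video_text_similarity(frame_embeds: torch.Tensor,
                          text_embeds: torch.Tensor) -> torch.Tensor:
    # frame_embeds: (num_videos, num_frames, dim) per-frame features from a
    # CLIP image encoder; text_embeds: (num_texts, dim) CLIP text features.
    # Shapes and names are assumptions for illustration only.
    video_embeds = frame_embeds.mean(dim=1)           # parameter-free mean pooling over frames
    video_embeds = F.normalize(video_embeds, dim=-1)  # L2-normalize so the dot product is a cosine similarity
    text_embeds = F.normalize(text_embeds, dim=-1)
    return text_embeds @ video_embeds.t()             # (num_texts, num_videos) similarity matrix

# Toy usage: 4 videos with 12 sampled frames each, 4 captions, feature dim 512.
frames = torch.randn(4, 12, 512)
captions = torch.randn(4, 512)
print(video_text_similarity(frames, captions).shape)  # torch.Size([4, 4])

Only the mean-pooling case is sketched here; the paper also compares learned temporal aggregators when studying how to model dependency between frames.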
Keywords: Video retrieval; Video captioning; CLIP
DOI: 10.1016/j.neucom.2022.07.028
Indexed By: SCI
Language: English
Funding Project: National Science Foundation of China [62176221]; National Science Foundation of China [61876158]; National Science Foundation of China [61806170]
WOS Research Area: Computer Science
WOS Subject: Computer Science, Artificial Intelligence
WOS ID: WOS:000848021200006
Publisher: ELSEVIER
Citation Statistics: cited 212 times (WOS)
Document Type: Journal article
Identifier: http://119.78.100.204/handle/2XEOYT63/19441
Collection: Journal Papers of the Institute of Computing Technology, Chinese Academy of Sciences (English)
Corresponding Authors: Luo, Huaishao; Ji, Lei
Affiliations:
1. Southwest Jiaotong Univ, Chengdu, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
3. Univ Chinese Acad Sci, Beijing, Peoples R China
4. Microsoft Res Asia, Beijing, Peoples R China
5. Microsoft STCA, Beijing, Peoples R China
Recommended Citation:
GB/T 7714: Luo, Huaishao, Ji, Lei, Zhong, Ming, et al. CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning[J]. NEUROCOMPUTING, 2022, 508: 293-304.
APA: Luo, Huaishao, Ji, Lei, Zhong, Ming, Chen, Yang, Lei, Wen, ... & Li, Tianrui. (2022). CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning. NEUROCOMPUTING, 508, 293-304.
MLA: Luo, Huaishao, et al. "CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning". NEUROCOMPUTING 508 (2022): 293-304.
Files in This Item:
There are no files associated with this item.