I(2)Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning
Tu, Yunbin1,2; Li, Liang3; Su, Li; Gao, Shengxiang1,2; Yan, Chenggang4; Zha, Zheng-Jun5; Yu, Zhengtao1,2; Huang, Qingming6,7,8
2022
Journal: IEEE TRANSACTIONS ON IMAGE PROCESSING
ISSN: 1057-7149
Volume: 31, Pages: 3565-3577
Abstract: TV show captioning aims to generate a linguistic sentence based on a video and its associated subtitle. Compared with purely video-based captioning, the subtitle can provide the captioning model with useful semantic clues, such as actors' sentiments and intentions. However, making effective use of the subtitle is challenging, because it consists of scrappy pieces of information and has a semantic gap with the visual modality. To organize this scrappy information and yield a powerful omni-representation over all modalities, an effective captioning model must understand the video contents, the subtitle semantics, and the relations between them. In this paper, we propose an Intra- and Inter-relation Embedding Transformer (I(2)Transformer), consisting of an Intra-relation Embedding Block (IAE) and an Inter-relation Embedding Block (IEE) under the framework of a Transformer. First, the IAE captures the intra-relation in each modality by constructing learnable graphs. Then, the IEE learns cross attention gates and selects useful information from each modality based on their inter-relations, so as to derive the omni-representation as the input to the Transformer. Experimental results on the public dataset show that the I(2)Transformer achieves state-of-the-art performance. We also evaluate the effectiveness of the IAE and IEE on two other relevant tasks of video with text inputs, i.e., TV show retrieval and video-guided machine translation. The encouraging performance further validates that the IAE and IEE blocks have good generalization ability. The code is available at https://github.com/tuyunbin/I2Transformer.
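The gated cross-modal fusion the abstract describes (IEE "selects useful information from each modality" via learned gates) can be sketched as follows. The paper's exact gating formulation is not given here, so the per-dimension sigmoid gate, the feature dimension `D`, and the function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random

random.seed(0)

D = 8  # shared feature dimension (illustrative)

# Stand-ins for intra-relation-encoded features of one video token and
# one subtitle token; in the paper these would come from the IAE block.
video = [random.gauss(0, 1) for _ in range(D)]
subtitle = [random.gauss(0, 1) for _ in range(D)]

# Hypothetical per-dimension gate weights; the paper learns its gates
# end-to-end, and its parameterization may differ.
w_v = [random.gauss(0, 1) for _ in range(D)]
w_s = [random.gauss(0, 1) for _ in range(D)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(v, s, wv, ws):
    """Fuse two modality features with a sigmoid gate (sketch).

    Each output dimension is a convex combination of the video and
    subtitle values, weighted by a gate computed from both inputs.
    """
    out = []
    for vi, si, wvi, wsi in zip(v, s, wv, ws):
        g = sigmoid(wvi * vi + wsi * si)  # gate value in (0, 1)
        out.append(g * vi + (1.0 - g) * si)
    return out

omni = gated_fusion(video, subtitle, w_v, w_s)
```

Because each gate stays in (0, 1), every fused dimension lies between the corresponding video and subtitle values — one simple way to realize modality selection before feeding an omni-representation to a Transformer.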
Keywords: Transformers; Semantics; Task analysis; Visualization; TV; Electronic mail; Graph neural networks; TV show captioning; video and subtitle; intra-relation embedding; inter-relation embedding; transformer
DOI: 10.1109/TIP.2022.3159472
Indexed by: SCI
Language: English
Funding projects: National Key Research and Development Plan[2018AAA0102000] ; National Key Research and Development Plan[2019QY1801] ; National Key Research and Development Plan[2019QY1802] ; National Key Research and Development Plan[2019QY1800] ; National Natural Science Foundation of China[61972186] ; National Natural Science Foundation of China[61732005] ; National Natural Science Foundation of China[U21B2027] ; Yunnan High-Tech Industry Development Project[201606] ; Yunnan Provincial Major Science and Technology Special Plan Projects[202103AA080015] ; Yunnan Provincial Major Science and Technology Special Plan Projects[202002AD080001-5] ; Yunnan Basic Research Project[202001AS070014] ; Reserve Talents for Academic and Technological Leaders in Yunnan Province[202105AC160018] ; CAAI-Huawei MindSpore Open Fund, Youth Innovation Promotion Association of Chinese Academy of Sciences[2020108] ; CCF-Baidu Open Fund[2021PP15002000]
WOS research areas: Computer Science ; Engineering
WOS categories: Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS accession number: WOS:000803395500001
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation statistics
Cited times: 9 [WOS]
Document type: Journal article
Identifier: http://119.78.100.204/handle/2XEOYT63/19565
Collection: Institute of Computing Technology, Chinese Academy of Sciences - Journal Papers (English)
Corresponding authors: Li, Liang; Gao, Shengxiang
Affiliations:
1.Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China
2.Kunming Univ Sci & Technol, Yunnan Prov Key Lab Artificial Intelligence, Kunming 650500, Yunnan, Peoples R China
3.Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
4.Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
5.Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230052, Peoples R China
6.Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China
7.Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
8.Peng Cheng Lab, Shenzhen 518057, Peoples R China
Recommended citation:
GB/T 7714: Tu, Yunbin, Li, Liang, Su, Li, et al. I(2)Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31: 3565-3577.
APA: Tu, Yunbin, Li, Liang, Su, Li, Gao, Shengxiang, Yan, Chenggang, ... & Huang, Qingming. (2022). I(2)Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning. IEEE TRANSACTIONS ON IMAGE PROCESSING, 31, 3565-3577.
MLA: Tu, Yunbin, et al. "I(2)Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning". IEEE TRANSACTIONS ON IMAGE PROCESSING 31 (2022): 3565-3577.
Files in this item: no related files.

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.