Hybrid-Attention Enhanced Two-Stream Fusion Network for Video Venue Prediction
Zhang, Yanchao (1,2); Min, Weiqing (2,3); Nie, Liqiang (1); Jiang, Shuqiang (2,3)
2021
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
ISSN: 1520-9210
Volume: 23; Pages: 2917-2929
Abstract: Video venue category prediction has been drawing more attention in the multimedia community for various applications such as personalized location recommendation and video verification. Most existing works resort to information from either multiple modalities or other platforms to strengthen video representations. However, noisy acoustic information, sparse textual descriptions and incompatible cross-platform data could limit the performance gain and reduce the universality of the model. Therefore, we focus on discriminative visual feature extraction from videos by introducing a hybrid-attention structure. In particular, we propose a novel Global-Local Attention Module (GLAM), which can be inserted into neural networks to generate enhanced visual features from video content. In GLAM, the Global Attention (GA) is used to capture contextual scene-oriented information by assigning different weights to channels, while the Local Attention (LA) is employed to learn salient object-oriented features by allocating different weights to spatial regions. Moreover, GLAM can be extended to variants with multiple GAs and LAs for further visual enhancement. The two types of features captured by the GAs and LAs, respectively, are integrated via convolution layers and then delivered into a convolutional Long Short-Term Memory (convLSTM) to generate spatial-temporal representations, constituting the content stream. In addition, video motions are explored to learn long-term movement variations, which also contribute to video venue prediction. The content and motion streams constitute our proposed Hybrid-Attention Enhanced Two-Stream Fusion Network (HA-TSFN). HA-TSFN finally merges the features from the two streams for comprehensive representations. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the large-scale Vine dataset. The visualization also shows that the proposed GLAM can capture complementary scene-oriented and object-oriented visual features from videos. Our code is available at: https://github.com/zhangyanchao1014/HA-TSFN.
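The global/local split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): it assumes a single frame's feature map stored as nested lists of shape channels x height x width, uses a parameter-free sigmoid gate on channel means as a stand-in for the Global Attention, a softmax over the channel-averaged map as a stand-in for the Local Attention, and a fixed weighted sum in place of the learned convolutional fusion. The names `channel_weights`, `spatial_mask`, `glam`, `w_ga` and `w_la` are hypothetical.

```python
import math


def channel_weights(feat):
    """Global-attention stand-in: one gate per channel.

    Each channel is average-pooled over all spatial positions and the
    mean is squashed with a sigmoid (assumption: no learned layers).
    """
    weights = []
    for channel in feat:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        weights.append(1.0 / (1.0 + math.exp(-total / count)))
    return weights


def spatial_mask(feat):
    """Local-attention stand-in: one weight per spatial location.

    The map is averaged across channels, then normalized with a
    softmax over all height x width positions.
    """
    n_ch = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    mean_map = [[sum(feat[c][i][j] for c in range(n_ch)) / n_ch
                 for j in range(width)] for i in range(height)]
    peak = max(max(row) for row in mean_map)  # subtract max for stability
    exp_map = [[math.exp(v - peak) for v in row] for row in mean_map]
    norm = sum(sum(row) for row in exp_map)
    return [[v / norm for v in row] for row in exp_map]


def glam(feat, w_ga=0.5, w_la=0.5):
    """Combine the channel-gated and spatially-masked branches.

    The paper integrates the two branches with convolution layers; the
    fixed weighted sum here is only an illustrative placeholder.
    """
    gates = channel_weights(feat)
    mask = spatial_mask(feat)
    fused = []
    for channel, gate in zip(feat, gates):
        fused.append([[v * (w_ga * gate + w_la * m)
                       for v, m in zip(row, mask_row)]
                      for row, mask_row in zip(channel, mask)])
    return fused


# Toy input: 2 channels, 2 x 2 spatial grid.
feat = [[[1.0, 1.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.0, 4.0]]]
enhanced = glam(feat)
```

In the full model described above, features enhanced in this way for each frame would then be passed to the convLSTM to form the content stream; that temporal step is not shown here.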
Keywords: Visualization; Feature extraction; Convolution; Streaming media; Object oriented modeling; Three-dimensional displays; Neural networks; knowledge representation; supervised learning; video signal processing
DOI: 10.1109/TMM.2020.3019714
Indexed by: SCI
Language: English
Funding: Shandong Provincial Key Research and Development Program [2019JZZY010118]; National Natural Science Foundation of China [61972378]; National Natural Science Foundation of China [61532018]; National Natural Science Foundation of China [U1936203]; National Natural Science Foundation of China [U19B2040]; Shandong Provincial Natural Science Foundation [ZR2019JQ23]; Innovation Teams in Colleges and Universities in Jinan [2018GXRC014]
WOS research areas: Computer Science; Telecommunications
WOS categories: Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications
WOS accession number: WOS:000688215600030
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
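Citation statistics
Times cited: 2 [WOS]
Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/17101
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Papers (English)
Corresponding author: Min, Weiqing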
Author affiliations:
1. Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
3. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Recommended citation
GB/T 7714
Zhang, Yanchao,Min, Weiqing,Nie, Liqiang,et al. Hybrid-Attention Enhanced Two-Stream Fusion Network for Video Venue Prediction[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2021,23:2917-2929.
APA Zhang, Yanchao,Min, Weiqing,Nie, Liqiang,&Jiang, Shuqiang.(2021).Hybrid-Attention Enhanced Two-Stream Fusion Network for Video Venue Prediction.IEEE TRANSACTIONS ON MULTIMEDIA,23,2917-2929.
MLA Zhang, Yanchao,et al."Hybrid-Attention Enhanced Two-Stream Fusion Network for Video Venue Prediction".IEEE TRANSACTIONS ON MULTIMEDIA 23(2021):2917-2929.