Institute of Computing Technology, Chinese Academy IR
Context-Aware Proposal-Boundary Network With Structural Consistency for Audiovisual Event Localization | |
Wang, Hao1; Zha, Zheng-Jun1; Li, Liang2; Chen, Xuejin1; Luo, Jiebo3 | |
2023-07-19 | |
Journal | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS |
ISSN | 2162-237X |
Pages | 11 |
Abstract | Audiovisual event localization aims to localize the event that is both visible and audible in a video. Previous works focus on segment-level audio and visual feature sequence encoding and neglect the event proposals and boundaries, which are crucial for this task. The event proposal features provide event internal consistency between several consecutive segments constructing one proposal, while the event boundary features offer event boundary consistency to make segments located at boundaries be aware of the event occurrence. In this article, we explore the proposal-level feature encoding and propose a novel context-aware proposal-boundary (CAPB) network to address audiovisual event localization. In particular, we design a local-global context encoder (LGCE) to aggregate local-global temporal context information for visual sequence, audio sequence, event proposals, and event boundaries, respectively. The local context from temporally adjacent segments or proposals contributes to event discrimination, while the global context from the entire video provides semantic guidance of temporal relationship. Furthermore, we enhance the structural consistency between segments by exploiting the above-encoded proposal and boundary representations. CAPB leverages the context information and structural consistency to obtain context-aware event-consistent cross-modal representation for accurate event localization. Extensive experiments conducted on the audiovisual event (AVE) dataset show that our approach outperforms the state-of-the-art methods by clear margins in both supervised event localization and cross-modality localization. |
Keywords | Audiovisual learning ; context learning ; event localization |
DOI | 10.1109/TNNLS.2023.3290083 |
Indexed by | SCI |
Language | English |
Funding project | National Key Research and Development Program of China[2020AAA0105702] ; National Natural Science Foundation of China (NSFC)[62225207] ; National Natural Science Foundation of China (NSFC)[U19B2038] ; National Natural Science Foundation of China (NSFC)[62121002] ; Youth Innovation Promotion Association of CAS[2020108] |
WOS research area | Computer Science ; Engineering |
WOS subject | Computer Science, Artificial Intelligence ; Computer Science, Hardware & Architecture ; Computer Science, Theory & Methods ; Engineering, Electrical & Electronic |
WOS ID | WOS:001035833700001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Document type | Journal article |
Identifier | http://119.78.100.204/handle/2XEOYT63/21292 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Zha, Zheng-Jun |
Affiliation | 1. Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Peoples R China ; 2. Chinese Acad Sci, Inst Comp Technol, Beijing 100089, Peoples R China ; 3. Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA |
Recommended citation (GB/T 7714) | Wang, Hao, Zha, Zheng-Jun, Li, Liang, et al. Context-Aware Proposal-Boundary Network With Structural Consistency for Audiovisual Event Localization[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023: 11. |
APA | Wang, Hao, Zha, Zheng-Jun, Li, Liang, Chen, Xuejin, & Luo, Jiebo. (2023). Context-Aware Proposal-Boundary Network With Structural Consistency for Audiovisual Event Localization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 11. |
MLA | Wang, Hao, et al. "Context-Aware Proposal-Boundary Network With Structural Consistency for Audiovisual Event Localization". IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023): 11. |
Files in this item | There are no files associated with this item. |