Institute of Computing Technology, Chinese Academy of Sciences IR
Title | Context Disentangling and Prototype Inheriting for Robust Visual Grounding |
Authors | Tang, Wei (1); Li, Liang (2); Liu, Xuejing (3); Jin, Lu (1); Tang, Jinhui (1); Li, Zechao (1) |
Publication date | 2024-05-01 |
Journal | IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE |
ISSN | 0162-8828 |
Volume | 46; Issue: 5; Pages: 3213-3229 |
Abstract | Visual grounding (VG) aims to locate a specific target in an image based on a given language query. Discriminative information from the context is important for distinguishing the target from other objects, particularly when the target belongs to the same category as other objects in the image. However, most previous methods underestimate such information. Moreover, they are usually designed for the standard scene (without any novel objects), which limits their generalization to the open-vocabulary scene. In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding that handles both scenes. Specifically, context disentangling separates the referent features from the context features, which achieves better discrimination between them. Prototype inheriting inherits, through a prototype bank, the prototypes discovered from the disentangled visual features, so as to fully utilize the seen data, especially in the open-vocabulary scene. The fused features are obtained by applying the Hadamard product to the disentangled linguistic features and the visual features of the prototypes, which avoids sharply adjusting the relative importance of the two types of features; a special token is then attached to the fused features, and the result is fed to a vision Transformer encoder for bounding box regression. Extensive experiments are conducted on both standard and open-vocabulary scenes, and the performance comparisons indicate that our method outperforms the state-of-the-art methods in both scenarios. |
Keywords | Visualization; Grounding; Prototypes; Transformers; Task analysis; Linguistics; Feature extraction; Context disentangling; open-vocabulary scene; prototype discovering; robust grounding; visual grounding (VG) |
DOI | 10.1109/TPAMI.2023.3339628 |
Indexed in | SCI |
Language | English |
Funding | National Key Research and Development Program of China |
WOS research areas | Computer Science; Engineering |
WOS categories | Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic |
WOS accession number | WOS:001196751500059 |
Publisher | IEEE COMPUTER SOC |
Document type | Journal article |
Item identifier | http://119.78.100.204/handle/2XEOYT63/38719 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences journal articles (English) |
Corresponding author | Li, Zechao |
Affiliations | 1. Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; 3. SenseTime Res, Beijing 100084, Peoples R China |
Recommended citation (GB/T 7714) | Tang, Wei, Li, Liang, Liu, Xuejing, et al. Context Disentangling and Prototype Inheriting for Robust Visual Grounding[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46(5): 3213-3229. |
APA | Tang, Wei, Li, Liang, Liu, Xuejing, Jin, Lu, Tang, Jinhui, & Li, Zechao. (2024). Context Disentangling and Prototype Inheriting for Robust Visual Grounding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 46(5), 3213-3229. |
MLA | Tang, Wei, et al. "Context Disentangling and Prototype Inheriting for Robust Visual Grounding." IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 46.5 (2024): 3213-3229. |
Files in this item | No files are associated with this item. |
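The abstract describes two steps concretely enough to sketch: a visual feature inherits a prototype from a prototype bank, and the linguistic features are fused with the prototype's visual features by a Hadamard (element-wise) product before a special token is attached for the encoder. The following is a minimal, hypothetical pure-Python sketch of those two steps only, not the authors' implementation. The names `PrototypeBank`, `hadamard_fuse`, `cosine`, and `build_encoder_input`, the cosine-similarity nearest-prototype lookup, the moving-average update controlled by `momentum`, and prepending the special token are all illustrative assumptions. Context disentangling and the vision Transformer encoder are omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def hadamard_fuse(linguistic: Sequence[float], visual: Sequence[float]) -> Vector:
    """Element-wise (Hadamard) product of two equal-length feature vectors."""
    if len(linguistic) != len(visual):
        raise ValueError("feature dimensions must match")
    return [a * b for a, b in zip(linguistic, visual)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PrototypeBank:
    """Hypothetical bank of prototype vectors (illustrative assumption).

    A feature "inherits" the most cosine-similar prototype; that prototype is
    then moved toward the feature with an exponential moving average.
    """

    def __init__(self, prototypes: List[Vector], momentum: float = 0.9):
        self.prototypes = [list(p) for p in prototypes]
        self.momentum = momentum

    def nearest(self, feature: Sequence[float]) -> int:
        # Index of the prototype with the highest cosine similarity.
        scores = [cosine(feature, p) for p in self.prototypes]
        return max(range(len(scores)), key=scores.__getitem__)

    def inherit(self, feature: Sequence[float], update: bool = True) -> Vector:
        idx = self.nearest(feature)
        if update:
            # Assumed EMA update: new = m * old + (1 - m) * feature.
            m = self.momentum
            self.prototypes[idx] = [
                m * p + (1 - m) * f for p, f in zip(self.prototypes[idx], feature)
            ]
        return list(self.prototypes[idx])


def build_encoder_input(special_token: Vector, fused: List[Vector]) -> List[Vector]:
    """Prepend a special token (e.g. one read out for box regression) to the fused tokens."""
    return [list(special_token)] + [list(t) for t in fused]


if __name__ == "__main__":
    bank = PrototypeBank([[1.0, 0.0], [0.0, 1.0]], momentum=0.5)
    proto = bank.inherit([0.5, 0.25])  # nearest is prototype 0
    print(proto)  # [0.75, 0.125]
    fused = hadamard_fuse([2.0, 4.0], proto)
    print(fused)  # [1.5, 0.5]
    print(build_encoder_input([0.0, 0.0], [fused]))  # [[0.0, 0.0], [1.5, 0.5]]
```

In a real model these vectors would be batched tensors and the special token a learned parameter; plain lists are used here only to keep the sketch self-contained.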
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.