Context Disentangling and Prototype Inheriting for Robust Visual Grounding

doi:10.1109/TPAMI.2023.3339628

	Context Disentangling and Prototype Inheriting for Robust Visual Grounding
	Tang, Wei 1; Li, Liang 2; Liu, Xuejing 3; Jin, Lu 1; Tang, Jinhui 1; Li, Zechao 1
	2024-05-01
Source Publication	IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
ISSN	0162-8828
Volume	46 Issue: 5 Pages: 3213-3229
Abstract	Visual grounding (VG) aims to locate a specific target in an image based on a given language query. The discriminative information from context is important for distinguishing the target from other objects, particularly for targets that share a category with other objects. However, most previous methods underestimate such information. Moreover, they are usually designed for the standard scene (without any novel object), which limits their generalization to the open-vocabulary scene. In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding to handle both scenes. Specifically, the context disentangling separates the referent and context features, which achieves better discrimination between them. The prototype inheriting inherits the prototypes discovered from the disentangled visual features by a prototype bank to fully utilize the seen data, especially for the open-vocabulary scene. The fused features, obtained by applying the Hadamard product to the disentangled linguistic features and the visual features of prototypes to avoid sharply adjusting the importance between the two types of features, are then combined with a special token and fed to a vision Transformer encoder for bounding box regression. Extensive experiments are conducted on both standard and open-vocabulary scenes. The performance comparisons indicate that our method outperforms the state-of-the-art methods in both scenarios.
Keywords	Visualization; Grounding; Prototypes; Transformers; Task analysis; Linguistics; Feature extraction; Context disentangling; open-vocabulary scene; prototype discovering; robust grounding; visual grounding (VG)
DOI	10.1109/TPAMI.2023.3339628
Indexed By	SCI
Language	English
WOS Research Area	Computer Science ; Engineering
WOS Subject	Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS ID	WOS:001196751500059
Publisher	IEEE COMPUTER SOC
Citation Statistics	Times cited: 24 [WOS]
Document Type	Journal article
Identifier	http://119.78.100.204/handle/2XEOYT63/38719
Collection	Institute of Computing Technology, Chinese Academy of Sciences: Journal Papers (English)
Corresponding Author	Li, Zechao
Affiliation	1. Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; 3. SenseTime Res, Beijing 100084, Peoples R China
Recommended Citation (GB/T 7714)	Tang, Wei, Li, Liang, Liu, Xuejing, et al. Context Disentangling and Prototype Inheriting for Robust Visual Grounding[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46(5): 3213-3229.
APA	Tang, Wei, Li, Liang, Liu, Xuejing, Jin, Lu, Tang, Jinhui, & Li, Zechao. (2024). Context Disentangling and Prototype Inheriting for Robust Visual Grounding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 46(5), 3213-3229.
MLA	Tang, Wei, et al. "Context Disentangling and Prototype Inheriting for Robust Visual Grounding." IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 46.5 (2024): 3213-3229.