CSpace
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Tang, Wei (1); Li, Liang (2); Liu, Xuejing (3); Jin, Lu (1); Tang, Jinhui (1); Li, Zechao (1)
2024-05-01
Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
ISSN: 0162-8828
Volume: 46; Issue: 5; Pages: 3213-3229
Abstract: Visual grounding (VG) aims to locate a specific target in an image based on a given language query. Discriminative information from the context is important for distinguishing the target from other objects, particularly when the target shares a category with other objects. However, most previous methods underestimate such information. Moreover, they are usually designed for the standard scene (without any novel object), which limits their generalization to the open-vocabulary scene. In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding to handle both scenes. Specifically, context disentangling separates the referent and context features, which achieves better discrimination between them. Prototype inheriting uses a prototype bank to inherit the prototypes discovered from the disentangled visual features, so as to fully utilize the seen data, especially for the open-vocabulary scene. The fused features are obtained by applying the Hadamard product to the disentangled linguistic features and the visual features of the prototypes, which avoids sharply adjusting the relative importance of the two types of features. They are then attached with a special token and fed to a vision Transformer encoder for bounding box regression. Extensive experiments are conducted on both standard and open-vocabulary scenes. The performance comparisons indicate that our method outperforms the state-of-the-art methods in both scenarios.
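The fusion step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows an element-wise (Hadamard) product of two equally shaped feature sequences, followed by prepending a special token. The function names, the plain-list representation, and the assumption that the visual and linguistic features are already aligned to the same length and dimension are all hypothetical and not taken from the record.

```python
from typing import List

Vector = List[float]


def hadamard(a: Vector, b: Vector) -> Vector:
    """Element-wise (Hadamard) product of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return [x * y for x, y in zip(a, b)]


def fuse_and_prepend_token(
    visual: List[Vector],
    linguistic: List[Vector],
    special_token: Vector,
) -> List[Vector]:
    """Fuse aligned visual and linguistic features, then prepend a token.

    Assumption (hypothetical): both sequences have the same number of
    positions and the same feature dimension, so they can be multiplied
    position by position.
    """
    if len(visual) != len(linguistic):
        raise ValueError("feature sequences must have the same length")
    fused = [hadamard(v, l) for v, l in zip(visual, linguistic)]
    # The special token goes first; in the abstract's pipeline the
    # resulting sequence would be passed on to a Transformer encoder.
    return [special_token] + fused


if __name__ == "__main__":
    visual = [[1.0, 2.0], [3.0, 4.0]]
    linguistic = [[0.5, 0.5], [2.0, 0.0]]
    print(fuse_and_prepend_token(visual, linguistic, [0.0, 0.0]))
```

Because the product is multiplicative, neither feature type is scaled by a separately learned weight, which is one way to read the abstract's remark about avoiding sharp adjustment of their relative importance.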
Keywords: Visualization; Grounding; Prototypes; Transformers; Task analysis; Linguistics; Feature extraction; Context disentangling; open-vocabulary scene; prototype discovering; robust grounding; visual grounding (VG)
DOI: 10.1109/TPAMI.2023.3339628
Indexed by: SCI
Language: English
Funding project: National Key Research and Development Program of China
WOS research area: Computer Science; Engineering
WOS category: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
WOS accession number: WOS:001196751500059
Publisher: IEEE COMPUTER SOC
Citation statistics
Times cited: 2 [WOS]
Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/38719
Collection: Institute of Computing Technology, Chinese Academy of Sciences
Corresponding author: Li, Zechao
Affiliations:
1. Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
3. SenseTime Res, Beijing 100084, Peoples R China
Recommended citation
GB/T 7714: Tang, Wei, Li, Liang, Liu, Xuejing, et al. Context Disentangling and Prototype Inheriting for Robust Visual Grounding[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46(5): 3213-3229.
APA: Tang, Wei, Li, Liang, Liu, Xuejing, Jin, Lu, Tang, Jinhui, & Li, Zechao. (2024). Context Disentangling and Prototype Inheriting for Robust Visual Grounding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 46(5), 3213-3229.
MLA: Tang, Wei, et al. "Context Disentangling and Prototype Inheriting for Robust Visual Grounding". IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 46.5 (2024): 3213-3229.
 
