Institute of Computing Technology, Chinese Academy of Sciences Institutional Repository (IR)
Title | Modeling spatial and semantic cues for large-scale near-duplicated image retrieval
Authors | Zhang, Shiliang2; Tian, Qi1; Hua, Gang3; Zhou, Wengang4; Huang, Qingming5; Li, Houqiang4; Gao, Wen2
Date Issued | 2011-03-01
Journal | COMPUTER VISION AND IMAGE UNDERSTANDING
ISSN | 1077-3142 |
Volume | 115
Issue | 3
Pages | 403-414
Abstract | Bag-of-Visual-Words (BOW) image representation has been shown to be one of the most promising solutions for large-scale near-duplicated image retrieval. However, the traditional visual vocabulary is created in an unsupervised way by clustering a large number of image local features. This is not ideal because it largely ignores the semantic and spatial contexts between local features. In this paper, we propose the geometric visual vocabulary, which captures spatial context by quantizing local features in bi-space, i.e., in descriptor space and orientation space. Then, we propose to capture semantic context by learning a semantic-aware distance metric between local features, which can reasonably measure the semantic similarities between the image patches from which the local features are extracted. The learned distance is then utilized to cluster the local features for semantic visual vocabulary generation. Finally, we combine the spatial and semantic contexts in a unified framework by extracting local feature groups, computing the spatial configurations between the local features inside each group, and learning a semantic-aware distance between groups. The learned group distance is then utilized to cluster the extracted local feature groups to generate a novel visual vocabulary, i.e., the contextual visual vocabulary. The proposed visual vocabularies, i.e., the geometric visual vocabulary, semantic visual vocabulary, and contextual visual vocabulary, are tested in large-scale near-duplicated image retrieval applications. The geometric visual vocabulary and semantic visual vocabulary achieve better performance than the traditional visual vocabulary. Moreover, the contextual visual vocabulary, which combines both spatial and semantic cues, outperforms the state-of-the-art bundled feature in both retrieval precision and efficiency. (C) 2010 Elsevier Inc. All rights reserved. |
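The geometric visual vocabulary described in the abstract rests on bi-space quantization: each local feature receives a joint word id from its descriptor-space cluster and from a quantization of its dominant orientation. The Python sketch below illustrates that idea only; it is not the authors' implementation, and the feature layout (128-D SIFT-like descriptors plus an orientation in radians), the scikit-learn k-means vocabulary, and all function names are illustrative assumptions.

    # Illustrative sketch of bi-space (descriptor + orientation) quantization;
    # NOT the paper's implementation. Assumes SIFT-like local features with a
    # 128-D descriptor and a dominant orientation in radians.
    import numpy as np
    from sklearn.cluster import KMeans

    def build_descriptor_vocabulary(descriptors, n_words=1000, seed=0):
        # Cluster N x 128 descriptors into visual words in descriptor space.
        kmeans = KMeans(n_clusters=n_words, random_state=seed, n_init=10)
        kmeans.fit(descriptors)
        return kmeans

    def quantize_bi_space(descriptors, orientations, kmeans, n_orient_bins=8):
        # Joint word id combines the descriptor-space word with an
        # orientation-space bin, giving a "geometric" visual word.
        desc_words = kmeans.predict(descriptors)
        bins = np.floor((orientations % (2 * np.pi)) / (2 * np.pi) * n_orient_bins)
        bins = np.clip(bins.astype(int), 0, n_orient_bins - 1)
        return desc_words * n_orient_bins + bins

    # Usage with synthetic data standing in for real local features:
    rng = np.random.default_rng(0)
    descs = rng.random((5000, 128)).astype(np.float32)
    oris = rng.uniform(0.0, 2 * np.pi, 5000)
    vocab = build_descriptor_vocabulary(descs, n_words=100)
    word_ids = quantize_bi_space(descs, oris, vocab)

In a retrieval system these joint word ids would populate an inverted index exactly as ordinary BOW word ids do, so the orientation cue is added without changing the index structure.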
Keywords | Visual vocabulary; Near-duplicated image retrieval; Local feature; Distance metric learning
DOI | 10.1016/j.cviu.2010.11.003 |
Indexed By | SCI
Language | English
Funding Project | National Natural Science Foundation of China[61025011] ; National Natural Science Foundation of China[60833006] ; National Basic Research Program of China (973 Program)[2009CB320906] ; Beijing Natural Science Foundation[4092042] ; NSF IIS[1052851] ; Akiira Media Systems, Inc.
WOS Research Area | Computer Science ; Engineering
WOS Category | Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS ID | WOS:000287772400011
Publisher | ACADEMIC PRESS INC ELSEVIER SCIENCE
Document Type | Journal article
Identifier | http://119.78.100.204/handle/2XEOYT63/12757
Collection | Institute of Computing Technology, CAS: Journal Papers (English)
Corresponding Author | Tian, Qi
Affiliation | 1. Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA; 2. CAS, Inst Comput Tech, Key Lab Intell Info Proc, Beijing 100190, Peoples R China; 3. IBM Watson Res Ctr, Elmsford, NY 10523 USA; 4. Univ Sci & Technol China, Dept EEIS, Hefei 230026, Peoples R China; 5. Chinese Acad Sci, Grad Univ, Beijing 100049, Peoples R China
Recommended Citation (GB/T 7714) | Zhang, Shiliang, Tian, Qi, Hua, Gang, et al. Modeling spatial and semantic cues for large-scale near-duplicated image retrieval[J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2011, 115(3): 403-414.
APA | Zhang, Shiliang, Tian, Qi, Hua, Gang, Zhou, Wengang, Huang, Qingming, ... & Gao, Wen. (2011). Modeling spatial and semantic cues for large-scale near-duplicated image retrieval. COMPUTER VISION AND IMAGE UNDERSTANDING, 115(3), 403-414.
MLA | Zhang, Shiliang, et al. "Modeling spatial and semantic cues for large-scale near-duplicated image retrieval." COMPUTER VISION AND IMAGE UNDERSTANDING 115.3 (2011): 403-414.
Files in This Item | There are no files associated with this item.