Institute of Computing Technology, Chinese Academy IR
| Dual-Alignment CLIP: Task-Specific Alignment of Text and Visual Features for Few-Shot Remote Sensing Scene Classification | |
| Deng, Dongmei; Yao, Ping | |
| 2025 | |
| Journal | IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING |
| ISSN | 1939-1404 |
| Volume | 18, Pages: 19260-19272 |
| Abstract | Convolutional neural networks (CNNs) are widely adopted for remote sensing image scene classification. However, labeling large remote sensing datasets is costly and time-consuming, which limits the applicability of CNNs in real-world settings. Inspired by human ability, few-shot image classification offers a promising solution by utilizing limited labeled data. Recently, contrastive vision-language pretraining (CLIP) has shown impressive few-shot image classification performance in downstream remote sensing tasks. However, existing CLIP-based methods still have two essential issues: 1) bias in text features; 2) unreliable similarity in image features. To address these issues, we design a multilevel image-text feature alignment (MITA) component that aligns the multimodal embeddings with visual-guided text features at the instance, class, and random levels, and an image-image feature alignment (IIA) component that reliably measures the similarity between images by remapping their visual features from the image-text alignment embedding space to an image-image alignment feature space. In addition, we build an adaptive knowledge fusion component that automatically fuses prior knowledge from the pretrained model with task-specific new knowledge from the MITA and IIA modules. These components comprise the proposed dual-alignment CLIP (DA-CLIP) method, and extensive experiments on 12 remote sensing datasets validate its effectiveness. |
| Keywords | Remote sensing ; Scene classification ; Visualization ; Training ; Manuals ; Few shot learning ; Feature extraction ; Adaptation models ; Training data ; Streaming media ; Contrastive vision-language pretraining (CLIP) ; few-shot learning (FSL) ; image classification ; remote sensing |
| DOI | 10.1109/JSTARS.2025.3590590 |
| Indexed By | SCI |
| Language | English |
| Funding Project | Strategic Priority Research Program of the Chinese Academy of Sciences[XDA19020400] ; National Key Research and Development Program[2022YFF0902403] |
| WOS Research Area | Engineering ; Physical Geography ; Remote Sensing ; Imaging Science & Photographic Technology |
| WOS Subject | Engineering, Electrical & Electronic ; Geography, Physical ; Remote Sensing ; Imaging Science & Photographic Technology |
| WOS ID | WOS:001547298600002 |
| Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
| Document Type | Journal article |
| Identifier | http://119.78.100.204/handle/2XEOYT63/41773 |
| Collection | Institute of Computing Technology, Chinese Academy of Sciences, Journal Papers (English) |
| Corresponding Author | Yao, Ping |
| Affiliation | Chinese Acad Sci, Univ Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China |
| Recommended Citation (GB/T 7714) | Deng, Dongmei,Yao, Ping. Dual-Alignment CLIP: Task-Specific Alignment of Text and Visual Features for Few-Shot Remote Sensing Scene Classification[J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING,2025,18:19260-19272. |
| APA | Deng, Dongmei,&Yao, Ping.(2025).Dual-Alignment CLIP: Task-Specific Alignment of Text and Visual Features for Few-Shot Remote Sensing Scene Classification.IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING,18,19260-19272. |
| MLA | Deng, Dongmei,et al."Dual-Alignment CLIP: Task-Specific Alignment of Text and Visual Features for Few-Shot Remote Sensing Scene Classification".IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 18(2025):19260-19272. |
| Files in This Item | There are no files associated with this item. |
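The abstract describes combining image-text similarity with image-image similarity and fusing the two sources of knowledge. The sketch below illustrates that general idea only; it is not the authors' DA-CLIP implementation. It assumes precomputed feature vectors (small hand-written arrays stand in for CLIP embeddings), averages support-image similarities per class, and uses a fixed scalar `alpha` where the paper describes an adaptive fusion component. The function names `l2_normalize` and `fused_logits` and the parameters `alpha` and `scale` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fused_logits(query, text_feats, support_feats, support_labels,
                 num_classes, alpha=0.5, scale=100.0):
    """Blend image-text and image-image similarity scores.

    query:          (Q, D) image features to classify
    text_feats:     (C, D) one text feature per class
    support_feats:  (N, D) few-shot labeled image features
    support_labels: (N,)   integer class ids in [0, num_classes)
    alpha:          fixed fusion weight (assumption; the paper's fusion is adaptive)
    """
    q = l2_normalize(np.asarray(query, dtype=float))
    t = l2_normalize(np.asarray(text_feats, dtype=float))
    s = l2_normalize(np.asarray(support_feats, dtype=float))

    # Image-text branch: cosine similarity to each class text feature.
    logits_text = scale * (q @ t.T)                      # (Q, C)

    # Image-image branch: mean cosine similarity to each class's support images.
    one_hot = np.eye(num_classes)[np.asarray(support_labels)]   # (N, C)
    shots_per_class = np.maximum(one_hot.sum(axis=0), 1.0)      # avoid divide-by-zero
    logits_img = scale * ((q @ s.T) @ one_hot) / shots_per_class  # (Q, C)

    return (1.0 - alpha) * logits_text + alpha * logits_img


# Toy usage: two classes, one support image per class, one query image.
text = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
support = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]])
labels = np.array([0, 1])
query = np.array([[1.0, 0.2, 0.0]])

pred = fused_logits(query, text, support, labels, num_classes=2).argmax(axis=1)
```

Setting `alpha=0.0` reduces the score to the image-text branch alone, and `alpha=1.0` to the image-image branch alone, which makes it easy to compare the two sources separately.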