Long-Tailed Visual Recognition via Improved Cross-Window Self-Attention and TrivialAugment
Song, Ying (1,2,3); Li, Mengxing (1,2); Wang, Bo (4)
2023
Journal: IEEE ACCESS
ISSN: 2169-3536
Volume: 11; Pages: 49601-49610
Abstract: In the real world, large-scale image datasets usually follow a long-tailed distribution. When traditional visual recognition methods are applied to long-tailed image datasets, problems such as model failure and a sudden decline in recognition accuracy occur, and deep learning models likewise tend to perform poorly on such data. To mitigate these problems, we propose CWTA (Long-tailed Visual Recognition via improved Cross-Window Self-Attention and TrivialAugment). CWTA uses a CNN to better capture the local features of the image, uses the Cross-Window Self-Attention mechanism to dynamically adjust the receptive field to better deal with image noise, and uses TrivialAugment to increase the diversity of samples in minority (tail) classes, thus improving recognition accuracy on long-tailed image data. The experimental results show that the proposed CWTA achieves the best classification accuracy across different categories on different long-tailed datasets. We also compared CWTA with other long-tailed recognition algorithms (such as OLTR, LWS, ResLT, PaCo, and BALLAD), and CWTA performs best when ResNet-50 is used as the backbone. On the CIFAR100-LT, ImageNet-LT, and Places-LT datasets, the all-category accuracy of CWTA is 12.9%, 0.4%, and 1.3% higher than that of BALLAD, respectively. For the F1-score on the same three datasets, CWTA is 6.6%, 2.2%, and 1.5% higher than BALLAD, respectively.
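The abstract names TrivialAugment as the augmentation used to diversify tail-class samples. As a rough illustration of the general TrivialAugment idea (one augmentation and one strength, both sampled uniformly at random per image), the following is a minimal Python sketch on a toy grayscale image. The operation set, the strength ranges, and all function names are illustrative assumptions; this record does not give the configuration actually used in the paper.

```python
import random

NUM_BINS = 31  # discrete strengths 0..30, as in the TrivialAugment paper


def _frac(strength):
    """Map a discrete strength bin to a fraction in [0, 1]."""
    return strength / (NUM_BINS - 1)


def identity(img, strength):
    return [row[:] for row in img]


def brightness(img, strength):
    # Illustrative range: scale pixels by a factor between 1.0 and 1.9.
    factor = 1.0 + 0.9 * _frac(strength)
    return [[min(255, round(p * factor)) for p in row] for row in img]


def solarize(img, strength):
    # Invert every pixel at or above a threshold that falls from 256 to 0.
    threshold = 256 * (1.0 - _frac(strength))
    return [[255 - p if p >= threshold else p for p in row] for row in img]


def posterize(img, strength):
    # Keep between 8 bits (strength 0) and 4 bits (strength 30) per pixel.
    bits = 8 - round(4 * _frac(strength))
    mask = 0xFF & ~((1 << (8 - bits)) - 1)
    return [[p & mask for p in row] for row in img]


def translate_x(img, strength):
    # Shift right by up to half the width, padding with black pixels.
    width = len(img[0])
    shift = round(_frac(strength) * (width // 2))
    return [[0] * shift + row[: width - shift] for row in img]


# Toy subset of operations (an assumption, not the paper's augmentation space).
OPS = [identity, brightness, solarize, posterize, translate_x]


def trivial_augment(img, rng=random):
    """Apply one uniformly chosen op at one uniformly chosen strength."""
    op = rng.choice(OPS)
    strength = rng.randrange(NUM_BINS)
    return op(img, strength)


if __name__ == "__main__":
    toy = [[0, 64, 128, 255], [10, 20, 30, 40]]
    print(trivial_augment(toy, random.Random(0)))
```

Because the method has no tunable search phase, each training image simply receives one random operation. On real images, a library implementation such as torchvision's `TrivialAugmentWide` transform would normally be used instead of hand-written operations like these.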
Keywords: Convolutional neural networks; Continuous wavelet transforms; Data models; Training; Computer vision; Transformers; Transfer learning; Long-tailed recognition; self-attention; vision transformer; CNN; TrivialAugment
DOI: 10.1109/ACCESS.2023.3277204
Indexed by: SCI
Language: English
Funding: National Natural Science Foundation of China [61872043]; State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS) [CARCHA202103]
WOS research areas: Computer Science; Engineering; Telecommunications
WOS categories: Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications
WOS accession number: WOS:001005708800001
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/21194
Collection: Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English)
Corresponding author: Song, Ying
Author affiliations:
1. Beijing Informat Sci & Technol Univ, Beijing Key Lab Internet Culture & Digital Dissemi, Beijing 100101, Peoples R China
2. Beijing Informat Sci & Technol Univ, Beijing Adv Innovat Ctr Mat Genome Engn, Beijing 100101, Peoples R China
3. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100086, Peoples R China
4. Zhengzhou Univ Light Ind, Software Engn Coll, Zhengzhou 450002, Peoples R China
Recommended citation:
GB/T 7714: Song, Ying, Li, Mengxing, Wang, Bo. Long-Tailed Visual Recognition via Improved Cross-Window Self-Attention and TrivialAugment[J]. IEEE ACCESS, 2023, 11: 49601-49610.
APA: Song, Ying, Li, Mengxing, & Wang, Bo. (2023). Long-Tailed Visual Recognition via Improved Cross-Window Self-Attention and TrivialAugment. IEEE ACCESS, 11, 49601-49610.
MLA: Song, Ying, et al. "Long-Tailed Visual Recognition via Improved Cross-Window Self-Attention and TrivialAugment". IEEE ACCESS 11 (2023): 49601-49610.