Institute of Computing Technology, Chinese Academy IR
HEAT: Efficient Vision Transformer Accelerator With Hybrid-Precision Quantization
Zhao, Pan1; Xue, Donghui1; Wu, Licheng1; Chang, Liang1; Tan, Haining2; Han, Yinhe2; Zhou, Jun1
2025-05-01
Journal | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS
ISSN | 1549-7747 |
Volume | 72, Issue: 5, Pages: 758-762 |
Abstract | Quantization is an important technique for accelerating transformer-based neural networks. Prior works mainly consider quantization at the algorithm level, and their hardware implementations are inefficient. In this brief, we propose an efficient vision transformer accelerator with retraining-free and finetuning-free hybrid-precision quantization. At the algorithm level, the features and weights are divided into two parts: normal values and outlier values. These two parts are quantized with different bit widths and scaling factors. We use matrix transformation and a group-wise quantization policy to improve hardware utilization. At the hardware level, we propose a two-stage FIFO group structure and a hierarchical interleaving data flow to further improve the utilization of the PE array. As a result, the input and weight matrices are quantized to 5.71 bits on average with 0.526% accuracy loss on Swin-T. The accelerator achieves a frame rate of 118.9 FPS and an energy efficiency of 43.58 GOPS/W on the ZCU102 FPGA board, better than state-of-the-art works. |
Keywords | Vision transformer accelerator; hybrid-precision quantization; FPGA |
DOI | 10.1109/TCSII.2025.3547340 |
Indexed by | SCI |
Language | English |
Funding projects | Open Project Program of Anhui Province Key Laboratory of Spintronic Chip Research and Manufacturing[WNKFKT-25-01] ; National Science Foundation of China[62104025] ; State Key Laboratory of Computer Architecture (ICT, CAS)[CARCHB202117] |
WOS research area | Engineering |
WOS subject category | Engineering, Electrical & Electronic |
WOS accession number | WOS:001480532800020 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Document type | Journal article |
Item identifier | http://119.78.100.204/handle/2XEOYT63/40647 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English) |
Corresponding authors | Chang, Liang; Tan, Haining |
Author affiliations | 1. Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China |
Recommended citation (GB/T 7714) | Zhao, Pan,Xue, Donghui,Wu, Licheng,et al. HEAT: Efficient Vision Transformer Accelerator With Hybrid-Precision Quantization[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS,2025,72(5):758-762. |
APA | Zhao, Pan.,Xue, Donghui.,Wu, Licheng.,Chang, Liang.,Tan, Haining.,...&Zhou, Jun.(2025).HEAT: Efficient Vision Transformer Accelerator With Hybrid-Precision Quantization.IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS,72(5),758-762. |
MLA | Zhao, Pan,et al."HEAT: Efficient Vision Transformer Accelerator With Hybrid-Precision Quantization".IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS 72.5(2025):758-762. |
Files in this item | There are no files associated with this item. |
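
The abstract describes splitting features and weights into normal and outlier values that are quantized with different bit widths and scaling factors. The following is a minimal sketch of that general idea only, not the authors' implementation. The function names, the 4-bit and 8-bit widths, the 5% outlier fraction, and the magnitude-quantile split rule are all assumptions chosen for illustration, and the matrix transformation and group-wise policy mentioned in the abstract are not modelled.

```python
import numpy as np


def quantize_symmetric(x, bits):
    """Uniform symmetric quantization: return integer codes and the scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x))) if x.size else 0.0
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return codes, scale


def hybrid_quantize(x, normal_bits=4, outlier_bits=8, outlier_fraction=0.05):
    """Quantize normal and outlier values separately (hypothetical parameters).

    Values whose magnitude exceeds the (1 - outlier_fraction) quantile are
    treated as outliers; this split rule is an assumption for illustration.
    Returns the dequantized tensor, the outlier mask, and the average bit width.
    """
    x = np.asarray(x, dtype=np.float64)
    threshold = np.quantile(np.abs(x), 1.0 - outlier_fraction)
    outlier_mask = np.abs(x) > threshold

    dequant = np.empty_like(x)
    # Each part gets its own bit width and its own scaling factor.
    for mask, bits in ((~outlier_mask, normal_bits), (outlier_mask, outlier_bits)):
        if mask.any():
            codes, scale = quantize_symmetric(x[mask], bits)
            dequant[mask] = codes * scale

    # Average bit width is the mix of the two widths weighted by element count.
    avg_bits = normal_bits + (outlier_bits - normal_bits) * outlier_mask.mean()
    return dequant, outlier_mask, avg_bits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    x[:10] *= 50.0  # inject a few large-magnitude values
    dequant, mask, avg_bits = hybrid_quantize(x)
    print("average bits:", avg_bits)
    print("hybrid MSE:", np.mean((x - dequant) ** 2))
```

In this sketch the average bit width is the element-count-weighted mix of the two widths, which is how a fractional average such as the 5.71 bits reported in the abstract can arise from integer bit widths.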