Institute of Computing Technology, Chinese Academy of Sciences - Institutional Repository (IR)
A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators
Qu, Songyun1,2; Li, Bing3; Zhao, Shixin1,2; Zhang, Lei4; Wang, Ying1,2,5
2023-07-01
Journal | IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS |
ISSN | 0278-0070 |
Volume | 42 |
Issue | 7 |
Pages | 2364-2376 |
Abstract | Network sparsity, or pruning, is a pivotal technology for edge intelligence. Resistive random access memory (RRAM)-based accelerators, featuring dense storage and processing-in-memory capability, have demonstrated superior computing performance and energy efficiency over traditional CMOS-based accelerators for neural network applications. Unfortunately, RRAM-based accelerators suffer performance or energy degradation when deploying pruned models, impairing their competitiveness in edge intelligence scenarios. We observed that the essential reason is that the pruning technology and the mapping strategy in prior RRAM-based accelerators are optimized individually. As a result, the random zeros in the pruned deep neural network are irregularly distributed across the crossbars, degrading the computation parallelism of the crossbars without reducing the crossbar demand. In this work, we propose a coordinated model pruning and mapping framework to jointly optimize model accuracy and the efficiency of RRAM-based accelerators. For the mapping, we first decouple weight matrices in a bit-wise manner and map the bit matrices to different crossbars, where signed weights are represented in two's complement so as to save half of the required crossbars. For the pruning, we prune weight bits at crossbar granularity so as to free the crossbars holding the pruned bits. Furthermore, we employ a reinforcement learning (RL) approach to automatically select the optimal crossbar-aware bit-pruning strategy for any given neural network without laborious human effort. We conducted experiments on a set of representative neural networks and compared our framework with state-of-the-art (SOTA) bit-sparsity works. The results show that automatic structured bit-pruning achieves up to 89.64% energy reduction and 84.12% area savings compared to the existing PRIME-like architecture. Besides, our framework outperforms the SOTA bit-sparsity design by 1.5x in terms of energy reduction on the RRAM-based accelerator. |
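The following is a minimal illustrative sketch (not the paper's implementation) of the two ideas highlighted in the abstract: decomposing signed weights into two's-complement bit planes, with each plane mapped to its own crossbar, and pruning bits at crossbar granularity by dropping whole planes. The function names, the 8-bit width, and the fixed keep mask are assumptions for illustration; in the actual framework the bit-pruning strategy would be selected by the RL agent.

```python
import numpy as np

def to_bit_planes(weights, n_bits=8):
    """Decompose signed integer weights into two's-complement bit planes.

    Each plane is a 0/1 matrix of the same shape as `weights` and would be
    mapped to its own crossbar; the most significant plane carries the sign.
    """
    w = np.asarray(weights, dtype=np.int64)
    # Two's-complement encoding: negative values wrap around 2^n_bits.
    encoded = np.where(w < 0, w + (1 << n_bits), w).astype(np.uint64)
    # Plane k holds bit k of every weight (LSB first).
    return [((encoded >> k) & 1).astype(np.uint8) for k in range(n_bits)]

def prune_bit_planes(planes, keep_mask):
    """Crossbar-granularity bit pruning: drop whole bit planes (crossbars)
    whose entry in `keep_mask` is False, freeing those crossbars entirely."""
    return [p for p, keep in zip(planes, keep_mask) if keep]

# Toy usage: 8-bit weights, prune the two least significant bit planes.
w = np.array([[23, -41], [7, -120]], dtype=np.int64)
planes = to_bit_planes(w, n_bits=8)
kept = prune_bit_planes(planes, keep_mask=[False, False] + [True] * 6)
print(len(planes), "->", len(kept), "crossbars")
```

Dropping a plane removes an entire crossbar's worth of storage and computation at once, which is what makes the sparsity structured from the accelerator's point of view, in contrast to randomly scattered zeros inside a crossbar.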
Keywords | AutoML; bit-pruning; deep neural networks (DNNs); resistive random access memory (RRAM) |
DOI | 10.1109/TCAD.2022.3221906 |
Indexed By | SCI |
Language | English |
Funding Project | National Key Research and Development Program[2018AAA0102505] ; National Natural Science Foundation of China[62090024] ; National Natural Science Foundation of China[62222411] ; National Natural Science Foundation of China[61874124] ; National Natural Science Foundation of China[62204164] ; Zhejiang Lab[2021PC0AC01] |
WOS Research Area | Computer Science ; Engineering |
WOS Subject | Computer Science, Hardware & Architecture ; Computer Science, Interdisciplinary Applications ; Engineering, Electrical & Electronic |
WOS Accession Number | WOS:001017411600023 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Document Type | Journal Article |
Identifier | http://119.78.100.204/handle/2XEOYT63/21276 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences - Journal Articles (English) |
Corresponding Author | Wang, Ying |
Affiliation | 1.Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China 2.Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 100190, Peoples R China 3.Capital Normal Univ, Acad Multidisciplinary Studies, Beijing 100037, Peoples R China 4.Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China 5.Chinese Acad Sci, State Key Lab Comp Architecture, Beijing 100190, Peoples R China |
Recommended Citation (GB/T 7714) | Qu, Songyun, Li, Bing, Zhao, Shixin, et al. A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators[J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42(7): 2364-2376. |
APA | Qu, Songyun, Li, Bing, Zhao, Shixin, Zhang, Lei, & Wang, Ying. (2023). A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 42(7), 2364-2376. |
MLA | Qu, Songyun, et al. "A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators". IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 42.7 (2023): 2364-2376. |
Files in This Item | No files associated with this item. |