Institute of Computing Technology, Chinese Academy of Sciences - Institutional Repository (IR)
A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators
Qu, Songyun1,2; Li, Bing3; Zhao, Shixin1,2; Zhang, Lei4; Wang, Ying1,2,5
2023-07-01
Journal | IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS |
ISSN | 0278-0070 |
Volume | 42 |
Issue | 7 |
Pages | 2364-2376 |
Abstract | Network sparsity, or pruning, is a pivotal technology for edge intelligence. Resistive random access memory (RRAM)-based accelerators, featuring dense storage and processing-in-memory capability, have demonstrated superior computing performance and energy efficiency over traditional CMOS-based accelerators for neural network applications. Unfortunately, RRAM-based accelerators suffer performance or energy degradation when deploying pruned models, impairing their competitiveness in edge intelligence scenarios. We observed that the essential reason is that the pruning technology and the mapping strategy in prior RRAM-based accelerators are optimized individually. As a result, the random zeros in the pruned deep neural network are irregularly distributed across the crossbars, degrading the computation parallelism of the crossbars without reducing the crossbar demand. In this work, we propose a coordinated model pruning and mapping framework to jointly optimize model accuracy and the efficiency of RRAM-based accelerators. For the mapping, we first decouple weight matrices in a bit-wise manner and map the bit matrices to different crossbars, where signed weights are represented in two's complement so as to save half of the required crossbars. For the pruning, we prune weight bits at crossbar granularity so as to free the crossbars holding the pruned bits. Furthermore, we employ a reinforcement learning (RL) approach to automatically select the optimal crossbar-aware bit-pruning strategy for any given neural network without laborious human effort. We conducted experiments on a set of representative neural networks and compared our framework with state-of-the-art (SOTA) bit-sparsity works. The results show that automatic structured bit-pruning achieves up to 89.64% energy reduction and 84.12% area savings compared to the existing PRIME-like architecture. Besides, our framework outperforms the SOTA bit-sparsity design by 1.5x in terms of energy reduction on the RRAM-based accelerator. |
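The following is a minimal illustrative sketch (not the paper's implementation) of the two ideas highlighted in the abstract: decomposing signed weights into two's-complement bit planes, with each plane mapped to its own crossbar, and pruning bits at crossbar granularity by dropping whole planes. The function names, the 8-bit width, and the fixed keep mask are assumptions for illustration; in the actual framework the bit-pruning strategy would be selected by the RL agent.

```python
import numpy as np

def to_bit_planes(weights, n_bits=8):
    """Decompose signed integer weights into two's-complement bit planes.

    Each plane is a 0/1 matrix of the same shape as `weights` and would be
    mapped to its own crossbar; the most significant plane carries the sign.
    """
    w = np.asarray(weights, dtype=np.int64)
    # Two's-complement encoding: negative values wrap around 2^n_bits.
    encoded = np.where(w < 0, w + (1 << n_bits), w).astype(np.uint64)
    # Plane k holds bit k of every weight (LSB first).
    return [((encoded >> k) & 1).astype(np.uint8) for k in range(n_bits)]

def prune_bit_planes(planes, keep_mask):
    """Crossbar-granularity bit pruning: drop whole bit planes (crossbars)
    whose entry in `keep_mask` is False, freeing those crossbars entirely."""
    return [p for p, keep in zip(planes, keep_mask) if keep]

# Toy usage: 8-bit weights, prune the two least significant bit planes.
w = np.array([[23, -41], [7, -120]], dtype=np.int64)
planes = to_bit_planes(w, n_bits=8)
kept = prune_bit_planes(planes, keep_mask=[False, False] + [True] * 6)
print(len(planes), "->", len(kept), "crossbars")
```

Dropping a plane removes an entire crossbar's worth of storage and computation at once, which is what makes the sparsity structured from the accelerator's point of view, in contrast to randomly scattered zeros inside a crossbar.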
Keywords | AutoML; bit-pruning; deep neural networks (DNNs); resistive random access memory (RRAM) |
DOI | 10.1109/TCAD.2022.3221906 |
Indexed By | SCI |
Language | English |
Funding Project | National Key Research and Development Program[2018AAA0102505] ; National Natural Science Foundation of China[62090024] ; National Natural Science Foundation of China[62222411] ; National Natural Science Foundation of China[61874124] ; National Natural Science Foundation of China[62204164] ; Zhejiang Lab[2021PC0AC01] |
WOS Research Area | Computer Science ; Engineering |
WOS Subject | Computer Science, Hardware & Architecture ; Computer Science, Interdisciplinary Applications ; Engineering, Electrical & Electronic |
WOS Accession Number | WOS:001017411600023 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Document Type | Journal Article |
Identifier | http://119.78.100.204/handle/2XEOYT63/21276 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences - Journal Articles (English) |
Corresponding Author | Wang, Ying |
Affiliation | 1.Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China 2.Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 100190, Peoples R China 3.Capital Normal Univ, Acad Multidisciplinary Studies, Beijing 100037, Peoples R China 4.Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China 5.Chinese Acad Sci, State Key Lab Comp Architecture, Beijing 100190, Peoples R China |
Recommended Citation (GB/T 7714) | Qu, Songyun, Li, Bing, Zhao, Shixin, et al. A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators[J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42(7): 2364-2376. |
APA | Qu, Songyun, Li, Bing, Zhao, Shixin, Zhang, Lei, & Wang, Ying. (2023). A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 42(7), 2364-2376. |
MLA | Qu, Songyun, et al. "A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators". IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 42.7 (2023): 2364-2376. |
Files in This Item | No files associated with this item. |