An Automated Quantization Framework for High-Utilization RRAM-Based PIM

doi:10.1109/TCAD.2021.3061521

	Li, Bing 1; Qu, Songyun 2; Wang, Ying 3
	2022-03-01
Journal	IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
ISSN	0278-0070
Volume	41, Issue: 3, Pages: 583-596
Abstract	With the advancement of deep neural networks (DNNs), DNN-driven applications have spread from the cloud to the edge. However, the intensive computation and data movement in CNNs impede the adoption of DNNs on resource-constrained edge devices. Quantization, a common model compression method, has attracted considerable attention because it enables efficient inference by lowering the bit-width of CNN parameters. Thanks to its massive storage capacity and computing-in-memory arrays, resistive memory (RRAM) enables energy-efficient, small-area processing-in-memory (PIM) for accelerating DNNs at the edge. However, when a network is deployed onto resistive-memory-based PIM (RRAM-based PIM), a tremendous number of cells are left unused because of the mismatch between the structure of the neural network layers and the memory arrays, resulting in resource under-utilization and low computation efficiency. In this work, we observe that prior quantization approaches fail to improve hardware resource utilization because they ignore the hardware structure information of RRAM. Thus, combining neural network model information with hardware information is essential for a high-utilization RRAM-based PIM design. Considering the vast number of model parameters and the heterogeneous RRAM crossbar structure, we develop a novel quantization framework, RaQu, that leverages the AutoML technique to automatically generate a fine-grained quantization strategy for any model that fully utilizes the resources of RRAM-based PIM. Experimental results show that RaQu achieves improvements of up to 29.2%-37.4% in resource utilization and 1.8%-3.3% in model accuracy compared to prior coarse-grained quantization methods.
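To make the utilization problem in the abstract concrete, the following is a minimal sketch of how crossbar utilization could be estimated for one convolutional layer. It is not RaQu itself and not taken from the paper. The mapping scheme (one weight-matrix row per input of the unrolled kernel, each weight spread over several multi-bit cells in adjacent columns), the function name `crossbar_utilization`, and the default crossbar size and cell precision are all assumptions chosen for illustration.

```python
import math


def crossbar_utilization(in_channels, out_channels, kernel_size, weight_bits,
                         xbar_rows=128, xbar_cols=128, cell_bits=2):
    """Estimate RRAM cell utilization for one conv layer (illustrative only).

    Assumed mapping (hypothetical, not from the paper):
      - the kernel is unrolled so the layer needs
        in_channels * kernel_size**2 rows;
      - each weight occupies ceil(weight_bits / cell_bits) cells,
        placed in adjacent columns.
    Returns (utilization, number_of_crossbars).
    """
    rows = in_channels * kernel_size * kernel_size
    cells_per_weight = math.ceil(weight_bits / cell_bits)
    cols = out_channels * cells_per_weight

    # Whole crossbars are allocated, so partially filled tiles waste cells.
    n_xbars = math.ceil(rows / xbar_rows) * math.ceil(cols / xbar_cols)

    used_cells = rows * cols
    total_cells = n_xbars * xbar_rows * xbar_cols
    return used_cells / total_cells, n_xbars


if __name__ == "__main__":
    # Same layer shape, different weight bit-widths.
    for bits in (8, 6, 4):
        util, n = crossbar_utilization(16, 64, 3, bits)
        print(f"{bits}-bit weights: {n} crossbars, utilization {util:.4f}")
```

Under these assumptions, the same layer shape yields different utilization at different bit-widths, and a lower bit-width does not always mean fewer wasted cells. This is the kind of layer-to-array mismatch that a hardware-aware, fine-grained quantization strategy would have to take into account.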
Keywords	Quantization (signal); Neural networks; Computational modeling; Data models; Hardware; Resource management; Arrays; AutoML; neural network; processing-in-memory (PIM); quantization; resistive memory (RRAM)
DOI	10.1109/TCAD.2021.3061521
Indexed by	SCI
Language	English
Funding	National Natural Science Foundation of China [61874124]; Youth Innovation Promotion Association, CAS [2018138]; State Key Laboratory of Computer Architecture [CARCH201913]
WOS research area	Computer Science; Engineering
WOS category	Computer Science, Hardware & Architecture; Computer Science, Interdisciplinary Applications; Engineering, Electrical & Electronic
WOS accession number	WOS:000757852800018
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation statistics	Times cited: 8 [WOS]
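Document type	Journal article
Item identifier	http://119.78.100.204/handle/2XEOYT63/18969
Collection	Institute of Computing Technology, Chinese Academy of Sciences: Journal Papers (English)
Corresponding author	Wang, Ying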
Author affiliations	1. Capital Normal Univ, Acad Multidisciplinary Studies, Beijing 100037, Peoples R China; 2. Univ Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; 3. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Recommended citation (GB/T 7714)	Li, Bing, Qu, Songyun, Wang, Ying. An Automated Quantization Framework for High-Utilization RRAM-Based PIM[J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41(3): 583-596.
APA	Li, Bing, Qu, Songyun, & Wang, Ying. (2022). An Automated Quantization Framework for High-Utilization RRAM-Based PIM. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 41(3), 583-596.
MLA	Li, Bing, et al. "An Automated Quantization Framework for High-Utilization RRAM-Based PIM." IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 41.3 (2022): 583-596.