Institutional Repository (IR) of the Institute of Computing Technology, Chinese Academy of Sciences
| Title | General Purpose Deep Learning Accelerator Based on Bit Interleaving |
| Authors | Chang, Liang (1); Lu, Hang (2,3,4); Li, Chenglong (1); Zhao, Xin (1); Hu, Zhicheng (1); Zhou, Jun (1); Li, Xiaowei (2,3,4) |
| Publication Date | 2024-05-01 |
| Journal | IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS |
| ISSN | 0278-0070 |
| Volume / Issue / Pages | 43 / 5 / 1470-1483 |
| Abstract | Along with the rapid evolution of deep neural networks, ever-increasing complexity imposes formidable computation intensity on the hardware accelerator. In this article, we propose a novel computing philosophy called "bit interleaving" and the associated accelerator pair, "Bitlet" and "Bitlet-X," to maximally exploit bit-level sparsity. Unlike existing bit-serial/bit-parallel accelerators, Bitlet leverages the abundant "sparsity parallelism" in the parameters to accelerate inference. Bitlet is versatile, supporting diverse precisions on a single platform, including 32-bit floating point (fp32) and fixed point from 1 b to 24 b. This versatility makes Bitlet feasible for both efficient inference and training. In addition, by updating the key compute engine in the accelerator, Bitlet-X further improves peak power consumption and efficiency for the inference-only scenario, with competitive accuracy. Empirical studies on 12 domain-specific deep learning applications highlight the following results: 1) up to 81×/21× energy efficiency improvement for training/inference over recent high-performance GPUs; 2) up to 15×/8× higher speedup/efficiency over state-of-the-art fixed-point accelerators; 3) 1.5 mm² area and scalable power consumption from 570 mW (fp32) to 432 mW (16 b) and 365 mW (8 b) in a 28-nm TSMC process; 4) 1.3× improvement in peak power efficiency for Bitlet-X over Bitlet; and 5) high configurability, justified by ablation and sensitivity studies. (An illustrative sketch of the bit-level sparsity idea follows this record.) |
| Keywords | Synchronization; Parallel processing; Computational modeling; Training; Pragmatics; Power demand; Hardware acceleration; Accelerator; Bit-level sparsity; Deep neural network (DNN) |
| DOI | 10.1109/TCAD.2023.3342728 |
| Indexed By | SCI |
| Language | English |
| Funding Project | National Natural Science Foundation of China |
| WoS Research Areas | Computer Science; Engineering |
| WoS Categories | Computer Science, Hardware & Architecture; Computer Science, Interdisciplinary Applications; Engineering, Electrical & Electronic |
| WoS Accession Number | WOS:001225897600012 |
| Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
| Document Type | Journal article |
| Identifier | http://119.78.100.204/handle/2XEOYT63/40063 |
| Collection | Institute of Computing Technology, CAS: Journal Papers (English) |
| Corresponding Author | Lu, Hang |
| Affiliations | 1. Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100190, Peoples R China; 3. Zhongguancun Lab, Beijing 100081, Peoples R China; 4. Shanghai Innovat Ctr Processor Technol, Shanghai 200120, Peoples R China |
| Recommended Citation (GB/T 7714) | Chang, Liang, Lu, Hang, Li, Chenglong, et al. General Purpose Deep Learning Accelerator Based on Bit Interleaving[J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2024, 43(5): 1470-1483. |
| APA | Chang, Liang, Lu, Hang, Li, Chenglong, Zhao, Xin, Hu, Zhicheng, ... & Li, Xiaowei. (2024). General Purpose Deep Learning Accelerator Based on Bit Interleaving. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 43(5), 1470-1483. |
| MLA | Chang, Liang, et al. "General Purpose Deep Learning Accelerator Based on Bit Interleaving." IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 43.5 (2024): 1470-1483. |
| Files in This Item | No files are associated with this item. |
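
The abstract's central idea, exploiting bit-level sparsity so that compute scales with the number of nonzero weight bits rather than the full word width, can be illustrated with a minimal sketch. This is not the Bitlet microarchitecture or its bit-interleaving scheme; the function `bit_sparse_mac` and all parameters below are hypothetical, written only to show why skipping zero bits saves work.

```python
import numpy as np

# Minimal sketch of bit-level sparsity (assumed illustration, not the
# published Bitlet design): a fixed-point multiply decomposed into
# shift-adds over only the *set* bits of each weight, so work scales
# with the number of essential bits rather than the word width.

def bit_sparse_mac(activations, weights, bits=8):
    """Multiply-accumulate that visits only the nonzero weight bits."""
    acc = 0
    for x, w in zip(activations, weights):
        sign, mag = (-1, -int(w)) if w < 0 else (1, int(w))
        for pos in range(bits):        # scan the weight's bit positions
            if (mag >> pos) & 1:       # zero bits cost nothing: the sparsity win
                acc += sign * (int(x) << pos)
    return acc

rng = np.random.default_rng(0)
x = rng.integers(0, 128, size=16)      # unsigned 7-bit activations
w = rng.integers(-127, 128, size=16)   # signed 8-bit weights
assert bit_sparse_mac(x, w) == int(x.astype(np.int64) @ w.astype(np.int64))

dense_bit_ops = w.size * 8             # bit-serial baseline touches every bit
sparse_bit_ops = int(sum(bin(abs(int(v))).count("1") for v in w))
print(f"essential-bit ops: {sparse_bit_ops} vs dense {dense_bit_ops}")
```

In trained networks many weight bits are zero, which is the "sparsity parallelism" the abstract refers to; how Bitlet interleaves those essential bits across parameters in hardware is the paper's contribution and lies beyond this sketch.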