Institute of Computing Technology, Chinese Academy of Sciences IR
CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures
Zou, Kaiwei1,2; Wang, Ying1,2; Cheng, Long3; Qu, Songyun2,4; Li, Huawei1,2,5; Li, Xiaowei1,2
2022-07-01
Published In | IEEE TRANSACTIONS ON COMPUTERS |
ISSN | 0018-9340 |
Volume | 71 |
Issue | 7 |
Pages | 1626-1639 |
Abstract | Real-time inference of deep learning models on embedded, energy-efficient devices is increasingly desirable with the rapid growth of artificial intelligence at the edge. Specifically, to achieve high energy efficiency and scalability, efficient parallelization of single-pass deep neural network (DNN) inference on chip multiprocessor (CMP) architectures is urgently required by many time-sensitive applications. However, as the number of processing cores scales up and the performance of individual cores keeps growing, on-chip inter-core data movement is prone to become a performance bottleneck for computation. To remedy this problem and further improve the performance of network inference, in this work we introduce CAP, a communication-aware DNN parallelization technique that exploits the elasticity and noise-tolerance of deep learning algorithms on CMPs. Moreover, in the hope that these studies can provide new design insights for real-time neural network inference on embedded chips, we have also evaluated the proposed approach on both multi-core Neural Network Accelerator (NNA) chips and general-purpose chip multiprocessors. Our experimental results show that, compared to baseline approaches, CAP achieves 1.12x-1.65x system speedups and 1.14x-2.70x energy-efficiency improvements for different neural networks while maintaining inference accuracy. |
Keywords | Kernel ; Computer architecture ; Multicore processing ; Deep learning ; System-on-chip ; Parallel processing ; Real-time systems ; Neural networks ; parallel processing ; real-time and embedded systems ; single-chip multiprocessors ; reinforcement learning ; structured sparsity |
DOI | 10.1109/TC.2021.3099688 |
Indexed By | SCI |
Language | English |
Funding Project | National Key Research and Development Program of China[2020YFB1600201] ; National Natural Science Foundation of China (NSFC)[62090024] ; National Natural Science Foundation of China (NSFC)[61874124] ; National Natural Science Foundation of China (NSFC)[61876173] ; Fundamental Research Funds for the Central Universities[2021MS017] |
WOS Research Area | Computer Science ; Engineering |
WOS Subject | Computer Science, Hardware & Architecture ; Engineering, Electrical & Electronic |
WOS ID | WOS:000808068000011 |
Publisher | IEEE COMPUTER SOC |
Document Type | Journal article |
Identifier | http://119.78.100.204/handle/2XEOYT63/19591 |
Collection | Journal Papers of the Institute of Computing Technology, CAS (English) |
Corresponding Author | Wang, Ying |
Affiliation | 1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China ; 2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China ; 3. North China Elect Power Univ, Sch Control & Comp Engn, Beijing 102206, Peoples R China ; 4. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China ; 5. Peng Cheng Lab, Shenzhen 518066, Peoples R China |
Recommended Citation (GB/T 7714) | Zou, Kaiwei, Wang, Ying, Cheng, Long, et al. CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures[J]. IEEE TRANSACTIONS ON COMPUTERS, 2022, 71(7): 1626-1639. |
APA | Zou, Kaiwei, Wang, Ying, Cheng, Long, Qu, Songyun, Li, Huawei, & Li, Xiaowei. (2022). CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures. IEEE TRANSACTIONS ON COMPUTERS, 71(7), 1626-1639. |
MLA | Zou, Kaiwei, et al. "CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures". IEEE TRANSACTIONS ON COMPUTERS 71.7 (2022): 1626-1639. |
Files in This Item | There are no files associated with this item. |
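The abstract describes a communication-aware strategy for splitting single-pass DNN inference across CMP cores so that inter-core data movement does not become the bottleneck. The sketch below is only a hand-written illustration of that underlying idea, under the assumption that each layer can be partitioned along its height, width, or channel axis and that the axis with the lowest estimated inter-core traffic is preferred; the function names and cost formulas are hypothetical and are not the paper's actual CAP algorithm, which is automated (the keywords mention reinforcement learning and structured sparsity).

```python
# Illustrative sketch (not the paper's CAP algorithm): estimate the
# inter-core communication volume of splitting one conv layer's output
# feature map across `cores` cores along each candidate axis, then
# pick the cheapest axis.

def comm_volume(height, width, channels, kernel, cores, axis):
    """Rough inter-core traffic estimate (in elements) for one layer.

    A spatial split forces neighbouring cores to exchange a halo of
    (kernel - 1) rows or columns; a channel split forces each core to
    gather the other cores' share of the output pixels.
    """
    halo = kernel - 1
    if axis == "height":   # horizontal stripes: exchange row halos
        return (cores - 1) * halo * width * channels
    if axis == "width":    # vertical stripes: exchange column halos
        return (cores - 1) * halo * height * channels
    if axis == "channel":  # every core needs all pixels, most held remotely
        return height * width * channels * (cores - 1) // cores
    raise ValueError(f"unknown axis: {axis}")

def best_partition(height, width, channels, kernel, cores):
    """Choose the split axis with the lowest estimated traffic."""
    return min(("height", "width", "channel"),
               key=lambda a: comm_volume(height, width, channels,
                                         kernel, cores, a))

# A large, shallow feature map favours a spatial split ...
print(best_partition(56, 56, 64, 3, 4))   # → height
# ... while a small, deep one favours a channel split.
print(best_partition(7, 7, 512, 3, 4))    # → channel
```

The two calls show why a per-layer, communication-aware choice matters: no single split axis is best for every layer of a network, so early wide layers and late deep layers end up partitioned differently.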
Unless otherwise stated, all content in this repository is protected by copyright, with all rights reserved.