Institute of Computing Technology, Chinese Academy of Sciences IR
CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures
Zou, Kaiwei1,2; Wang, Ying1,2; Cheng, Long3; Qu, Songyun2,4; Li, Huawei1,2,5; Li, Xiaowei1,2
2022-07-01
Published In | IEEE TRANSACTIONS ON COMPUTERS |
ISSN | 0018-9340 |
Volume | 71 |
Issue | 7 |
Pages | 1626-1639 |
Abstract | Real-time inference of deep learning models on embedded, energy-efficient devices is increasingly desirable with the rapid growth of artificial intelligence at the edge. Specifically, to achieve high energy efficiency and scalability, efficient parallelization of single-pass deep neural network (DNN) inference on chip multiprocessor (CMP) architectures is urgently required by many time-sensitive applications. However, as the number of processing cores scales up and the performance of individual cores keeps growing, on-chip inter-core data movement is prone to become a performance bottleneck for computation. To remedy this problem and further improve the performance of network inference, in this work we introduce CAP, a communication-aware DNN parallelization technique that exploits the elasticity and noise-tolerance of deep learning algorithms on CMPs. Moreover, in the hope that these studies can provide new design insights for real-time neural network inference on embedded chips, we have also evaluated the proposed approach on both multi-core Neural Network Accelerator (NNA) chips and general-purpose chip multiprocessors. Our experimental results show that, compared to baseline approaches, CAP achieves 1.12x-1.65x system speedups and 1.14x-2.70x energy-efficiency improvements for different neural networks while maintaining inference accuracy. |
Keywords | Kernel ; Computer architecture ; Multicore processing ; Deep learning ; System-on-chip ; Parallel processing ; Real-time systems ; Neural networks ; parallel processing ; real-time and embedded systems ; single-chip multiprocessors ; reinforcement learning ; structured sparsity |
DOI | 10.1109/TC.2021.3099688 |
Indexed By | SCI |
Language | English |
Funding Project | National Key Research and Development Program of China[2020YFB1600201] ; National Natural Science Foundation of China (NSFC)[62090024] ; National Natural Science Foundation of China (NSFC)[61874124] ; National Natural Science Foundation of China (NSFC)[61876173] ; Fundamental Research Funds for the Central Universities[2021MS017] |
WOS Research Area | Computer Science ; Engineering |
WOS Subject | Computer Science, Hardware & Architecture ; Engineering, Electrical & Electronic |
WOS ID | WOS:000808068000011 |
Publisher | IEEE COMPUTER SOC |
Document Type | Journal article |
Identifier | http://119.78.100.204/handle/2XEOYT63/19591 |
Collection | Journal Papers of the Institute of Computing Technology, CAS (English) |
Corresponding Author | Wang, Ying |
Affiliation | 1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China ; 2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China ; 3. North China Elect Power Univ, Sch Control & Comp Engn, Beijing 102206, Peoples R China ; 4. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China ; 5. Peng Cheng Lab, Shenzhen 518066, Peoples R China |
Recommended Citation (GB/T 7714) | Zou, Kaiwei, Wang, Ying, Cheng, Long, et al. CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures[J]. IEEE TRANSACTIONS ON COMPUTERS, 2022, 71(7): 1626-1639. |
APA | Zou, Kaiwei, Wang, Ying, Cheng, Long, Qu, Songyun, Li, Huawei, & Li, Xiaowei. (2022). CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures. IEEE TRANSACTIONS ON COMPUTERS, 71(7), 1626-1639. |
MLA | Zou, Kaiwei, et al. "CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures". IEEE TRANSACTIONS ON COMPUTERS 71.7 (2022): 1626-1639. |
Files in This Item | There are no files associated with this item. |
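The abstract describes a communication-aware strategy for splitting single-pass DNN inference across CMP cores so that inter-core data movement does not become the bottleneck. The sketch below is only a hand-written illustration of that underlying idea, under the assumption that each layer can be partitioned along its height, width, or channel axis and that the axis with the lowest estimated inter-core traffic is preferred; the function names and cost formulas are hypothetical and are not the paper's actual CAP algorithm, which is automated (the keywords mention reinforcement learning and structured sparsity).

```python
# Illustrative sketch (not the paper's CAP algorithm): estimate the
# inter-core communication volume of splitting one conv layer's output
# feature map across `cores` cores along each candidate axis, then
# pick the cheapest axis.

def comm_volume(height, width, channels, kernel, cores, axis):
    """Rough inter-core traffic estimate (in elements) for one layer.

    A spatial split forces neighbouring cores to exchange a halo of
    (kernel - 1) rows or columns; a channel split forces each core to
    gather the other cores' share of the output pixels.
    """
    halo = kernel - 1
    if axis == "height":   # horizontal stripes: exchange row halos
        return (cores - 1) * halo * width * channels
    if axis == "width":    # vertical stripes: exchange column halos
        return (cores - 1) * halo * height * channels
    if axis == "channel":  # every core needs all pixels, most held remotely
        return height * width * channels * (cores - 1) // cores
    raise ValueError(f"unknown axis: {axis}")

def best_partition(height, width, channels, kernel, cores):
    """Choose the split axis with the lowest estimated traffic."""
    return min(("height", "width", "channel"),
               key=lambda a: comm_volume(height, width, channels,
                                         kernel, cores, a))

# A large, shallow feature map favours a spatial split ...
print(best_partition(56, 56, 64, 3, 4))   # → height
# ... while a small, deep one favours a channel split.
print(best_partition(7, 7, 512, 3, 4))    # → channel
```

The two calls show why a per-layer, communication-aware choice matters: no single split axis is best for every layer of a network, so early wide layers and late deep layers end up partitioned differently.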
Unless otherwise stated, all content in this repository is protected by copyright, with all rights reserved.