Institute of Computing Technology, Chinese Academy of Sciences Institutional Repository (IR)
ParaML: A Polyvalent Multicore Accelerator for Machine Learning
Zhou, Shengyuan1,2; Guo, Qi1,3; Du, Zidong1,3; Liu, Daofu1,3; Chen, Tianshi1,3,4; Li, Ling5; Liu, Shaoli1,3; Zhou, Jinhong1,3; Temam, Olivier6; Feng, Xiaobing7; Zhou, Xuehai8; Chen, Yunji1,2,4
2020-09-01
Journal | IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
ISSN | 0278-0070
Volume | 39
Issue | 9
Pages | 1764-1777
Abstract | In recent years, machine learning (ML) techniques have proven to be powerful tools in a variety of emerging applications. Traditionally, ML techniques are processed on general-purpose CPUs and GPUs, but the energy efficiency of these platforms is limited by the cost of their general-purpose flexibility. Hardware accelerators are an efficient alternative, but most accommodate only a single ML technique (or family of techniques). Different problems may require different ML techniques, however, so such accelerators may achieve poor learning accuracy or even be ineffective. In this paper, we present a polyvalent accelerator architecture integrated with multiple processing cores, called ParaML, which accommodates ten representative ML techniques: k-means, k-nearest neighbors (k-NN), naive Bayes (NB), support vector machine (SVM), linear regression (LR), classification tree (CT), deep neural network (DNN), learning vector quantization (LVQ), parzen window (PW), and principal component analysis (PCA). Benefiting from our thorough analysis of the computational primitives and locality properties of different ML techniques, the single-core ParaML can perform up to 1056 GOP/s (e.g., additions and multiplications) in an area of 3.51 mm² while consuming only 596 mW (area and power estimated on the post-synthesis netlist by ICC and PrimeTime PX, respectively). Compared with the NVIDIA K20M GPU (28-nm process), the single-core ParaML (65-nm process) is 1.21x faster and reduces energy by 137.93x. We also compare the single-core ParaML with other accelerators. Compared with PRINS, the single-core ParaML achieves 72.09x and 2.57x energy benefit for k-NN and k-means, respectively, and speeds up each k-NN query by 44.76x. Compared with EIE, the single-core ParaML achieves 5.02x speedup and 4.97x energy benefit with 11.62x less area when evaluated on a dense DNN. Compared with the TPU, the single-core ParaML achieves 2.45x better power efficiency (5647 GOP/W versus 2300 GOP/W) with 321.36x less area. Compared with the single-core version, the 8-core ParaML further improves performance by up to 3.98x, with an area of 13.44 mm² and a power of 2036 mW.
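The abstract's central design premise is that many ML techniques decompose into a small set of shared computational primitives (chiefly multiply-accumulate and distance computations), which is what allows a single accelerator to serve ten techniques. The sketch below illustrates that decomposition for three of the listed techniques. It is a minimal NumPy illustration under our own assumptions, not ParaML's actual datapath; every function name in it (`dot`, `sq_dist`, `knn_predict`, `kmeans_assign`, `linear_svm_predict`) is hypothetical.

```python
import numpy as np

# Hypothetical shared primitives; on a polyvalent accelerator these
# would map onto the same multiply-add hardware. Plain NumPy is used
# here purely for illustration.
def dot(x, w):
    """Inner product: the multiply-accumulate (MAC) primitive."""
    return float(np.sum(x * w))

def sq_dist(x, c):
    """Squared Euclidean distance: also reduces to MAC operations."""
    return float(np.sum((x - c) ** 2))

# Three of the ten techniques named in the abstract, all expressed
# with the same two primitives.
def knn_predict(x, train_X, train_y, k=3):
    """k-NN classification: k smallest distances, then majority vote."""
    d = np.array([sq_dist(x, xi) for xi in train_X])
    nearest = np.argsort(d)[:k]
    return int(np.bincount(train_y[nearest]).argmax())

def kmeans_assign(x, centroids):
    """k-means assignment step: index of the nearest centroid."""
    return int(np.argmin([sq_dist(x, c) for c in centroids]))

def linear_svm_predict(x, w, b):
    """Linear SVM inference: sign of one dot product plus a bias."""
    return 1 if dot(x, w) + b >= 0.0 else -1
```

As a side note, the TPU comparison in the abstract is a direct ratio of the quoted power efficiencies: 5647 / 2300 ≈ 2.455, rounded to 2.45x.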
Keywords | Neural networks; Machine learning; Testing; Support vector machines; Linear regression; Computers; Computer architecture; Accelerator; machine learning (ML) techniques; multicore accelerator
DOI | 10.1109/TCAD.2019.2927523 |
Indexed By | SCI
Language | English
Funding Project | National Key Research and Development Program of China[2017YFA0700902] ; NSF of China[61432016] ; NSF of China[61532016] ; NSF of China[61672491] ; NSF of China[61602441] ; NSF of China[61602446] ; NSF of China[61732002] ; NSF of China[61702478] ; NSF of China[61732007] ; NSF of China[61732020] ; Beijing Natural Science Foundation[JQ18013] ; 973 Program of China[2015CB358800] ; National Science and Technology Major Project[2018ZX01031102] ; Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences[KFJ-HGZX-013] ; Key Research Projects in Frontier Science of Chinese Academy of Sciences[QYZDB-SSW-JSC001] ; Strategic Priority Research Program of Chinese Academy of Science[XDB32050200] ; Strategic Priority Research Program of Chinese Academy of Science[XDC01020000]
WOS Research Area | Computer Science ; Engineering
WOS Category | Computer Science, Hardware & Architecture ; Computer Science, Interdisciplinary Applications ; Engineering, Electrical & Electronic
WOS Accession Number | WOS:000562034400002
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document Type | Journal Article
Item Identifier | http://119.78.100.204/handle/2XEOYT63/15794
Collection | Institute of Computing Technology, CAS: Journal Papers (English)
Corresponding Author | Guo, Qi
Author Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, Intelligent Processor Res Ctr, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
3. Cambricon Technol Corp Ltd, Beijing 100191, Peoples R China
4. CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing 100190, Peoples R China
5. Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
6. Inria Saclay, F-91120 Palaiseau, France
7. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China
8. Univ Sci & Technol China, Hefei 230026, Peoples R China
Recommended Citation (GB/T 7714) | Zhou, Shengyuan, Guo, Qi, Du, Zidong, et al. ParaML: A Polyvalent Multicore Accelerator for Machine Learning[J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2020, 39(9): 1764-1777.
APA | Zhou, Shengyuan, Guo, Qi, Du, Zidong, Liu, Daofu, Chen, Tianshi, ... & Chen, Yunji. (2020). ParaML: A Polyvalent Multicore Accelerator for Machine Learning. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 39(9), 1764-1777.
MLA | Zhou, Shengyuan, et al. "ParaML: A Polyvalent Multicore Accelerator for Machine Learning." IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 39.9 (2020): 1764-1777.
Files in This Item | No files associated with this item.