Institute of Computing Technology, Chinese Academy of Sciences IR
An Instruction Set Architecture for Machine Learning
Chen, Yunji1,2,3,4; Lan, Huiying1; Du, Zidong1; Liu, Shaoli1; Tao, Jinhua1; Han, Dong1; Luo, Tao1; Guo, Qi1; Li, Ling2,5; Xie, Yuan6; Chen, Tianshi1
2019-08-01
Journal | ACM TRANSACTIONS ON COMPUTER SYSTEMS |
ISSN | 0734-2071 |
Volume | 36 |
Issue | 3 |
Pages | 35 |
Abstract | Machine Learning (ML) techniques are a family of models that learn from data to improve performance on a given task. ML techniques, especially recently revived neural networks (deep neural networks), have proven efficient for a broad range of applications. ML techniques are conventionally executed on general-purpose processors (such as CPUs and GPGPUs), which are usually not energy efficient because they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators have recently been proposed to improve energy efficiency. However, such accelerators were designed for a small set of ML techniques sharing similar computational patterns, and they adopt complex, informative instructions (control signals) that directly correspond to high-level functional blocks of an ML technique (such as layers in neural networks) or even to an ML technique as a whole. Although straightforward and easy to implement for a limited set of similar ML techniques, this lack of agility in the instruction set prevents such accelerator designs from supporting a variety of different ML techniques with sufficient flexibility and efficiency. In this article, we first propose a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions, based on a comprehensive analysis of existing NN techniques. We then extend the application scope of Cambricon from NN to ML techniques. We also propose an assembly language, an assembler, and a runtime to support programming with Cambricon, especially targeting large-scale ML problems. Our evaluation over a total of 16 representative yet distinct ML techniques demonstrates that Cambricon exhibits strong descriptive capacity over a broad range of ML techniques and provides higher code density than general-purpose ISAs such as x86, MIPS, and GPGPU.
Compared to the latest state-of-the-art NN accelerator design DaDianNao [7] (which can accommodate only three types of NN techniques), our Cambricon-based accelerator prototype, implemented in TSMC 65nm technology, incurs only negligible latency/power/area overheads while versatilely covering 10 different NN benchmarks and 7 other ML benchmarks. Compared to the recent prevalent ML accelerator PuDianNao, our Cambricon-based accelerator is able to support all the ML techniques as well as the 10 NNs, with only an approximately 5.1% performance loss. |
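To make the abstract's description concrete, the following is a minimal, purely illustrative sketch of what a load-store accelerator ISA integrating scalar, vector, matrix, and data-transfer instructions might look like when interpreted in software. The mnemonics (VLOAD, VADD, MMV, etc.), operand formats, and semantics here are hypothetical placeholders, not the actual Cambricon instruction encodings described in the article.

```python
# Toy interpreter for a hypothetical load-store ISA with scalar, vector,
# matrix, and data-transfer instructions (illustrative only; NOT Cambricon).

def run(program, memory):
    """Execute a list of (opcode, *operands) tuples against register files."""
    scalar = {}   # scalar registers, e.g. "s0"
    vector = {}   # vector registers, e.g. "v0" (held as Python lists)
    for inst in program:
        op, *args = inst
        if op == "VLOAD":          # data transfer: memory -> vector register
            dst, addr, length = args
            vector[dst] = memory[addr:addr + length]
        elif op == "VSTORE":       # data transfer: vector register -> memory
            src, addr = args
            memory[addr:addr + len(vector[src])] = vector[src]
        elif op == "VADD":         # vector: element-wise addition
            dst, a, b = args
            vector[dst] = [x + y for x, y in zip(vector[a], vector[b])]
        elif op == "MMV":          # matrix: matrix-vector multiplication
            dst, mat, vec = args
            vector[dst] = [sum(w * x for w, x in zip(row, vector[vec]))
                           for row in mat]
        elif op == "SMOVE":        # scalar: move an immediate into a register
            dst, imm = args
            scalar[dst] = imm
        else:
            raise ValueError(f"unknown opcode: {op}")
    return scalar, vector

# Usage: compute y = W @ x + b for a tiny 2x2 "layer".
mem = [1.0, 2.0,   # x at address 0
       0.5, 0.5,   # b at address 2
       0.0, 0.0]   # output slot at address 4
W = [[1.0, 0.0],
     [0.0, 1.0]]
prog = [
    ("VLOAD",  "v0", 0, 2),        # v0 <- x
    ("VLOAD",  "v1", 2, 2),        # v1 <- b
    ("MMV",    "v2", W, "v0"),     # v2 <- W @ x
    ("VADD",   "v3", "v2", "v1"),  # v3 <- v2 + b
    ("VSTORE", "v3", 4),           # memory[4:6] <- v3
]
run(prog, mem)
print(mem[4:6])                    # -> [1.5, 2.5]
```

The point of a fine-grained instruction set like this, as the abstract argues, is that many different NN/ML kernels can be expressed by composing a small number of primitive instructions, instead of hard-wiring one instruction per high-level layer type.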
DOI | 10.1145/3331469 |
Indexed by | SCI |
Language | English |
Funding Project | National Key Research and Development Program of China[2017YFA0700900] ; National Key Research and Development Program of China[2017YFA0700902] ; National Key Research and Development Program of China[2017YFA0700901] ; National Key Research and Development Program of China[2017YFB1003101] ; NSF of China[61432016] ; NSF of China[61532016] ; NSF of China[61672491] ; NSF of China[61602441] ; NSF of China[61602446] ; NSF of China[61732002] ; NSF of China[61702478] ; NSF of China[61732007] ; NSF of China[61732020] ; Beijing Natural Science Foundation[JQ18013] ; 973 Program of China[2015CB358800] ; National Science and Technology Major Project[2018ZX01031102] ; Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences[KFJ-HGZX-013] ; Key Research Projects in Frontier Science of Chinese Academy of Sciences[QYZDB-SSW-JSC001] ; Strategic Priority Research Program of Chinese Academy of Science[XDB32050200] ; Strategic Priority Research Program of Chinese Academy of Science[XDC01020000] ; CAS Center for Excellence in Brain Science and Intelligence Technology (CEBSIT) |
WOS Research Area | Computer Science |
WOS Category | Computer Science, Theory & Methods |
WOS Accession No. | WOS:000496739500003 |
Publisher | ASSOC COMPUTING MACHINERY |
Citation Statistics | |
Document Type | Journal Article |
Identifier | http://119.78.100.204/handle/2XEOYT63/14878 |
Collection | Journal Papers of the Institute of Computing Technology, CAS (English) |
Corresponding Author | Du, Zidong |
Affiliations | 1.Chinese Acad Sci, Inst Comp Technol, SKL Comp Architecture, Beijing, Peoples R China 2.Univ Chinese Acad Sci, Beijing, Peoples R China 3.BIT, ZJLab, Inst Brain Intelligence Technol, Zhanjiang Lab, Beijing, Peoples R China 4.Shanghai Res Ctr Brain Sci & Brain Inspired Intel, Shanghai, Peoples R China 5.Chinese Acad Sci, Inst Software, Beijing, Peoples R China 6.UCSB, Dept Elect & Comp Engn, Santa Barbara, CA USA |
Recommended Citation (GB/T 7714) | Chen, Yunji, Lan, Huiying, Du, Zidong, et al. An Instruction Set Architecture for Machine Learning[J]. ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2019, 36(3): 35. |
APA | Chen, Yunji., Lan, Huiying., Du, Zidong., Liu, Shaoli., Tao, Jinhua., ... & Chen, Tianshi. (2019). An Instruction Set Architecture for Machine Learning. ACM TRANSACTIONS ON COMPUTER SYSTEMS, 36(3), 35. |
MLA | Chen, Yunji, et al. "An Instruction Set Architecture for Machine Learning". ACM TRANSACTIONS ON COMPUTER SYSTEMS 36.3 (2019): 35. |
Files in This Item | No files associated with this item. |