Institute of Computing Technology, Chinese Academy of Sciences IR
An Instruction Set Architecture for Machine Learning
Chen, Yunji1,2,3,4; Lan, Huiying1; Du, Zidong1; Liu, Shaoli1; Tao, Jinhua1; Han, Dong1; Luo, Tao1; Guo, Qi1; Li, Ling2,5; Xie, Yuan6; Chen, Tianshi1
2019-08-01
Journal | ACM TRANSACTIONS ON COMPUTER SYSTEMS |
ISSN | 0734-2071 |
Volume | 36 |
Issue | 3 |
Pages | 35 |
Abstract | Machine Learning (ML) techniques are a family of models that learn from data to improve performance on a given task. ML techniques, especially recently revived neural networks (deep neural networks), have proven efficient for a broad range of applications. ML techniques are conventionally executed on general-purpose processors (such as CPUs and GPGPUs), which are usually not energy efficient because they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators have recently been proposed to improve energy efficiency. However, such accelerators were designed for a small set of ML techniques sharing similar computational patterns, and they adopt complex, informative instructions (control signals) that directly correspond to high-level functional blocks of an ML technique (such as layers in neural networks) or even to an ML technique as a whole. Although straightforward and easy to implement for a limited set of similar ML techniques, this lack of agility in the instruction set prevents such accelerator designs from supporting a variety of different ML techniques with sufficient flexibility and efficiency. In this article, we first propose a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions, based on a comprehensive analysis of existing NN techniques. We then extend the application scope of Cambricon from NN to ML techniques. We also propose an assembly language, an assembler, and a runtime to support programming with Cambricon, especially targeting large-scale ML problems. Our evaluation over a total of 16 representative yet distinct ML techniques demonstrates that Cambricon exhibits strong descriptive capacity over a broad range of ML techniques and provides higher code density than general-purpose ISAs such as x86, MIPS, and GPGPU.
Compared to the latest state-of-the-art NN accelerator design DaDianNao [7] (which can accommodate only three types of NN techniques), our Cambricon-based accelerator prototype, implemented in TSMC 65nm technology, incurs only negligible latency/power/area overheads while versatilely covering 10 different NN benchmarks and 7 other ML benchmarks. Compared to the recent prevalent ML accelerator PuDianNao, our Cambricon-based accelerator is able to support all the ML techniques as well as the 10 NNs, with only an approximately 5.1% performance loss. |
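To make the abstract's description concrete, the following is a minimal, purely illustrative sketch of what a load-store accelerator ISA integrating scalar, vector, matrix, and data-transfer instructions might look like when interpreted in software. The mnemonics (VLOAD, VADD, MMV, etc.), operand formats, and semantics here are hypothetical placeholders, not the actual Cambricon instruction encodings described in the article.

```python
# Toy interpreter for a hypothetical load-store ISA with scalar, vector,
# matrix, and data-transfer instructions (illustrative only; NOT Cambricon).

def run(program, memory):
    """Execute a list of (opcode, *operands) tuples against register files."""
    scalar = {}   # scalar registers, e.g. "s0"
    vector = {}   # vector registers, e.g. "v0" (held as Python lists)
    for inst in program:
        op, *args = inst
        if op == "VLOAD":          # data transfer: memory -> vector register
            dst, addr, length = args
            vector[dst] = memory[addr:addr + length]
        elif op == "VSTORE":       # data transfer: vector register -> memory
            src, addr = args
            memory[addr:addr + len(vector[src])] = vector[src]
        elif op == "VADD":         # vector: element-wise addition
            dst, a, b = args
            vector[dst] = [x + y for x, y in zip(vector[a], vector[b])]
        elif op == "MMV":          # matrix: matrix-vector multiplication
            dst, mat, vec = args
            vector[dst] = [sum(w * x for w, x in zip(row, vector[vec]))
                           for row in mat]
        elif op == "SMOVE":        # scalar: move an immediate into a register
            dst, imm = args
            scalar[dst] = imm
        else:
            raise ValueError(f"unknown opcode: {op}")
    return scalar, vector

# Usage: compute y = W @ x + b for a tiny 2x2 "layer".
mem = [1.0, 2.0,   # x at address 0
       0.5, 0.5,   # b at address 2
       0.0, 0.0]   # output slot at address 4
W = [[1.0, 0.0],
     [0.0, 1.0]]
prog = [
    ("VLOAD",  "v0", 0, 2),        # v0 <- x
    ("VLOAD",  "v1", 2, 2),        # v1 <- b
    ("MMV",    "v2", W, "v0"),     # v2 <- W @ x
    ("VADD",   "v3", "v2", "v1"),  # v3 <- v2 + b
    ("VSTORE", "v3", 4),           # memory[4:6] <- v3
]
run(prog, mem)
print(mem[4:6])                    # -> [1.5, 2.5]
```

The point of a fine-grained instruction set like this, as the abstract argues, is that many different NN/ML kernels can be expressed by composing a small number of primitive instructions, instead of hard-wiring one instruction per high-level layer type.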
DOI | 10.1145/3331469 |
Indexed by | SCI |
Language | English |
Funding Project | National Key Research and Development Program of China[2017YFA0700900] ; National Key Research and Development Program of China[2017YFA0700902] ; National Key Research and Development Program of China[2017YFA0700901] ; National Key Research and Development Program of China[2017YFB1003101] ; NSF of China[61432016] ; NSF of China[61532016] ; NSF of China[61672491] ; NSF of China[61602441] ; NSF of China[61602446] ; NSF of China[61732002] ; NSF of China[61702478] ; NSF of China[61732007] ; NSF of China[61732020] ; Beijing Natural Science Foundation[JQ18013] ; 973 Program of China[2015CB358800] ; National Science and Technology Major Project[2018ZX01031102] ; Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences[KFJ-HGZX-013] ; Key Research Projects in Frontier Science of Chinese Academy of Sciences[QYZDB-SSW-JSC001] ; Strategic Priority Research Program of Chinese Academy of Science[XDB32050200] ; Strategic Priority Research Program of Chinese Academy of Science[XDC01020000] ; CAS Center for Excellence in Brain Science and Intelligence Technology (CEBSIT) |
WOS Research Area | Computer Science |
WOS Category | Computer Science, Theory & Methods |
WOS Accession No. | WOS:000496739500003 |
Publisher | ASSOC COMPUTING MACHINERY |
Citation Statistics | |
Document Type | Journal Article |
Identifier | http://119.78.100.204/handle/2XEOYT63/14878 |
Collection | Journal Papers of the Institute of Computing Technology, CAS (English) |
Corresponding Author | Du, Zidong |
Affiliations | 1.Chinese Acad Sci, Inst Comp Technol, SKL Comp Architecture, Beijing, Peoples R China 2.Univ Chinese Acad Sci, Beijing, Peoples R China 3.BIT, ZJLab, Inst Brain Intelligence Technol, Zhanjiang Lab, Beijing, Peoples R China 4.Shanghai Res Ctr Brain Sci & Brain Inspired Intel, Shanghai, Peoples R China 5.Chinese Acad Sci, Inst Software, Beijing, Peoples R China 6.UCSB, Dept Elect & Comp Engn, Santa Barbara, CA USA |
Recommended Citation (GB/T 7714) | Chen, Yunji, Lan, Huiying, Du, Zidong, et al. An Instruction Set Architecture for Machine Learning[J]. ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2019, 36(3): 35. |
APA | Chen, Yunji., Lan, Huiying., Du, Zidong., Liu, Shaoli., Tao, Jinhua., ... & Chen, Tianshi. (2019). An Instruction Set Architecture for Machine Learning. ACM TRANSACTIONS ON COMPUTER SYSTEMS, 36(3), 35. |
MLA | Chen, Yunji, et al. "An Instruction Set Architecture for Machine Learning". ACM TRANSACTIONS ON COMPUTER SYSTEMS 36.3 (2019): 35. |
Files in This Item | No files associated with this item. |