Institute of Computing Technology, Chinese Academy IR
Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design
Zhang, Zhiyuan; Fan, Zhihua; Li, Wenming; Qiu, Yuhang; Wang, Zhen; Ye, Xiaochun; Fan, Dongrui; An, Xuejun
2025-02-01
Journal | JOURNAL OF SYSTEMS ARCHITECTURE
ISSN | 1383-7621 |
Volume | 159 Pages: 16 |
Abstract | Tensor multiplication holds a pivotal position in numerous applications. The existing accelerators predominantly rely on inner or outer products for their computational strategies, yet these methodologies encounter obstacles such as excessive storage overhead, underutilization of parallelism, and merging costs. To tackle the challenges, we propose an acceleration technique that integrates a hybrid product approach with a tailored hardware. Our design can accommodate tensor multiplications of various scales, boasting exceptional scalability. First, we employ a hybrid product approach for tensor multiplications, strategically leveraging various methods - including inner, outer, and Hadamard products - to optimize different stages of submatrices computations. Second, we devise a dedicated architecture that seamlessly aligns with hybrid product, leveraging dataflow paradigm to map tensor multiplication efficiently onto the hardware. Third, we design a sliding-window partial reuse FIFO (SWFIFO), alongside a data reorder and scheduling unit to accelerate data retrieval. For general matrix multiplication (GEMM), our design demonstrates an average speedup of 17.62x and 9.47% energy consumption over Nvidia's V100 GPU. Furthermore, it surpasses Google's TPU (size of 256 x 256) by an average of 3.76x, TPUv2 (size of 128 x 128) by 3.19x and Eyeriss by 3.8x. When evaluated on eight neural network models, our design yields a performance boost of 2.89x over TPU and 2.19x over Eyeriss. |
Keywords | Tensor multiplication; Hybrid product; Dataflow; Accelerator |
DOI | 10.1016/j.sysarc.2025.103333 |
Indexed By | SCI |
Language | English |
Funding | National Key R and D Program of China ; Beijing Nova Program, China[20230484420] ; CAS Project for Young Scientists in Basic Research, China ; CAS Project for Youth Innovation Promotion Association, China ; Beijing Natural Science Foundation, China[L234078] ; [2023YFB4503500] ; [20220484054] ; [YSBR-029] |
WOS Research Area | Computer Science |
WOS Category | Computer Science, Hardware & Architecture ; Computer Science, Software Engineering |
WOS Accession Number | WOS:001402860200001 |
Publisher | ELSEVIER |
Document Type | Journal article |
Item Identifier | http://119.78.100.204/handle/2XEOYT63/40763 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Papers (English) |
Corresponding Author | Fan, Zhihua |
Author Affiliations | 1.Chinese Acad Sci, Inst Comp Technol, SKLP, Beijing 100190, Peoples R China 2.Univ Chinese Acad Sci UCAS, Beijing, Peoples R China |
Recommended Citation GB/T 7714 | Zhang, Zhiyuan,Fan, Zhihua,Li, Wenming,et al. Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design[J]. JOURNAL OF SYSTEMS ARCHITECTURE,2025,159:16. |
APA | Zhang, Zhiyuan.,Fan, Zhihua.,Li, Wenming.,Qiu, Yuhang.,Wang, Zhen.,...&An, Xuejun.(2025).Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design.JOURNAL OF SYSTEMS ARCHITECTURE,159,16. |
MLA | Zhang, Zhiyuan,et al."Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design".JOURNAL OF SYSTEMS ARCHITECTURE 159(2025):16. |
Files in This Item | There are no files associated with this item. |
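
The abstract names three products (inner, outer, and Hadamard) without defining them. The sketch below is only an illustration of those three operations on small matrices, written in plain Python. The function names are hypothetical, and the code does not reproduce the authors' hybrid scheme, their hardware mapping, or the SWFIFO. It shows that the inner-product and outer-product formulations compute the same GEMM result while organizing the work differently: the inner-product form finishes one output element at a time, whereas the outer-product form accumulates one full-size partial matrix per step, which is the source of the merging cost the abstract mentions.

```python
def matmul_inner(a, b):
    """Inner-product GEMM: C[i][j] is the dot product of row i of A and column j of B."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_outer(a, b):
    """Outer-product GEMM: C is the sum over p of outer(column p of A, row p of B)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for p in range(k):
        # Each iteration of p contributes a full n x m partial matrix
        # that is merged (accumulated) into C.
        for i in range(n):
            for j in range(m):
                c[i][j] += a[i][p] * b[p][j]
    return c


def hadamard(x, y):
    """Hadamard product: element-wise multiplication of two equally shaped matrices."""
    return [[xv * yv for xv, yv in zip(xr, yr)] for xr, yr in zip(x, y)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(matmul_inner(A, B))  # [[19, 22], [43, 50]]
print(matmul_outer(A, B))  # [[19, 22], [43, 50]]
print(hadamard(A, B))      # [[5, 12], [21, 32]]
```

How the paper assigns each product to a particular stage of the submatrix computation is described in the article itself and is not inferred here.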