Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design
Zhang, Zhiyuan; Fan, Zhihua; Li, Wenming; Qiu, Yuhang; Wang, Zhen; Ye, Xiaochun; Fan, Dongrui; An, Xuejun
2025-02-01
Journal: JOURNAL OF SYSTEMS ARCHITECTURE
ISSN: 1383-7621
Volume: 159; Pages: 16
Abstract: Tensor multiplication holds a pivotal position in numerous applications. Existing accelerators predominantly rely on inner or outer products as their computational strategy, yet these approaches suffer from excessive storage overhead, underutilized parallelism, and merging costs. To tackle these challenges, we propose an acceleration technique that integrates a hybrid product approach with tailored hardware. Our design accommodates tensor multiplications of various scales and scales well. First, we employ a hybrid product approach for tensor multiplication, strategically applying different methods (inner, outer, and Hadamard products) to optimize different stages of submatrix computation. Second, we devise a dedicated architecture that aligns with the hybrid product, leveraging the dataflow paradigm to map tensor multiplication efficiently onto the hardware. Third, we design a sliding-window partial reuse FIFO (SWFIFO), alongside a data reorder and scheduling unit, to accelerate data retrieval. For general matrix multiplication (GEMM), our design achieves an average speedup of 17.62x over Nvidia's V100 GPU at 9.47% of its energy consumption. Furthermore, it outperforms Google's TPU (size of 256 x 256) by an average of 3.76x, TPUv2 (size of 128 x 128) by 3.19x, and Eyeriss by 3.8x. When evaluated on eight neural network models, our design yields a performance improvement of 2.89x over TPU and 2.19x over Eyeriss.
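For readers unfamiliar with the three product types named in the abstract, the following is a minimal textbook sketch in plain Python. It is not the paper's hybrid scheme, dataflow mapping, or hardware design; the function names and the 2 x 2 example matrices are assumptions made purely for illustration.

```python
# Illustrative only: textbook definitions of the three products named in the
# abstract. Function names and sample matrices are hypothetical, not from the paper.

def matmul_inner(a, b):
    """Inner-product form: C[i][j] is the dot product of row i of A and column j of B."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def matmul_outer(a, b):
    """Outer-product form: C is the sum over p of (column p of A) x (row p of B)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for p in range(k):
        # Each p yields a full n x m rank-1 partial product that must be
        # merged (accumulated) into C.
        for i in range(n):
            for j in range(m):
                c[i][j] += a[i][p] * b[p][j]
    return c

def hadamard(a, b):
    """Hadamard product: element-wise multiplication of same-shaped matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(matmul_inner(A, B))  # [[19, 22], [43, 50]]
print(matmul_outer(A, B))  # same result, accumulated from rank-1 updates
print(hadamard(A, B))      # [[5, 12], [21, 32]]
```

Both matrix-multiplication forms produce the same result and differ only in loop order: the inner-product form finishes one output element at a time, while the outer-product form builds every output element incrementally from partial products that have to be merged.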
Keywords: Tensor multiplication; Hybrid product; Dataflow; Accelerator
DOI: 10.1016/j.sysarc.2025.103333
Indexed by: SCI
Language: English
Funding: National Key R and D Program of China; Beijing Nova Program, China [20230484420]; CAS Project for Young Scientists in Basic Research, China; CAS Project for Youth Innovation Promotion Association, China; Beijing Natural Science Foundation, China [L234078]; [2023YFB4503500]; [20220484054]; [YSBR-029]
WOS Research Area: Computer Science
WOS Categories: Computer Science, Hardware & Architecture; Computer Science, Software Engineering
WOS Accession Number: WOS:001402860200001
Publisher: ELSEVIER
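Document Type: Journal article
Item Identifier: http://119.78.100.204/handle/2XEOYT63/40763
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Papers (English)
Corresponding Author: Fan, Zhihua
Author Affiliations: 1. Chinese Acad Sci, Inst Comp Technol, SKLP, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci UCAS, Beijing, Peoples R China
Recommended Citation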
GB/T 7714
Zhang, Zhiyuan,Fan, Zhihua,Li, Wenming,et al. Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design[J]. JOURNAL OF SYSTEMS ARCHITECTURE,2025,159:16.
APA Zhang, Zhiyuan.,Fan, Zhihua.,Li, Wenming.,Qiu, Yuhang.,Wang, Zhen.,...&An, Xuejun.(2025).Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design.JOURNAL OF SYSTEMS ARCHITECTURE,159,16.
MLA Zhang, Zhiyuan,et al."Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design".JOURNAL OF SYSTEMS ARCHITECTURE 159(2025):16.
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.