Institute of Computing Technology, Chinese Academy IR
Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product | |
Lu, Xiaobo1; Fang, Jianbin1; Peng, Lin1; Huang, Chun1; Du, Zidong2; Zhao, Yongwei3; Wang, Zheng4 | |
2024-11-01 | |
Journal | ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
ISSN | 1544-3566 |
Volume | 21, Issue: 4, Pages: 25 |
Abstract | Sparse-dense matrix multiplication (SpMM) is the performance bottleneck of many high-performance and deep-learning applications, making it attractive to design specialized SpMM hardware accelerators. Unfortunately, existing hardware solutions either do not take full advantage of data reuse opportunities in the input and output matrices or suffer from irregular memory access patterns. These shortcomings increase off-chip memory traffic and bandwidth pressure, leaving much room for improvement. We present MENTOR, a new approach to designing SpMM accelerators. Our key insight is that column-wise dataflow, while rarely exploited in prior work, can address these issues in SpMM computations. MENTOR is a software-hardware co-design approach that leverages column-wise dataflow to improve data reuse and regularize memory accesses in SpMM. On the software level, MENTOR incorporates a novel streaming construction scheme that preprocesses the input matrix to enable a streaming access pattern. On the hardware level, it employs a fully pipelined design to further unlock the potential of column-wise dataflow. The design of MENTOR is underpinned by a carefully designed analytical model to find the tradeoff between performance and hardware resources. We have implemented an FPGA prototype of MENTOR. Experimental results show that, compared with state-of-the-art hardware solutions, MENTOR achieves a geometric-mean speedup of 2.05x (up to 3.98x), reduces memory traffic by a geometric mean of 2.92x (up to 4.93x), and improves bandwidth utilization by a geometric mean of 1.38x (up to 2.89x). |
Keywords | Hardware; Hardware accelerators; Computer systems organization; Architectures |
DOI | 10.1145/3688612 |
Indexed By | SCI |
Language | English |
Funding Project | National Key Research and Development Program of China[2023YFB3001503] |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Hardware & Architecture ; Computer Science, Theory & Methods |
WOS ID | WOS:001386358100002 |
Publisher | ASSOC COMPUTING MACHINERY |
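Document Type | Journal article |
Identifier | http://119.78.100.204/handle/2XEOYT63/40805 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding Author | Lu, Xiaobo |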
Affiliation | 1. Natl Univ Def Technol, Sch Comp Sci & Technol, Changsha, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China; 3. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China; 4. Northwest Univ, Xian, Shaanxi, Peoples R China |
Recommended Citation (GB/T 7714) | Lu, Xiaobo,Fang, Jianbin,Peng, Lin,et al. Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,2024,21(4):25. |
APA | Lu, Xiaobo.,Fang, Jianbin.,Peng, Lin.,Huang, Chun.,Du, Zidong.,...&Wang, Zheng.(2024).Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product.ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,21(4),25. |
MLA | Lu, Xiaobo,et al."Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product".ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 21.4(2024):25. |
Files in This Item | There are no files associated with this item. |
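
The column-wise product named in the title and abstract can be illustrated with a minimal software sketch. This is only the generic textbook formulation, in which each output column C[:, j] is built as a sum of columns of the sparse matrix A scaled by entries of the dense matrix B; it is not MENTOR's hardware dataflow and does not model the streaming construction scheme or the pipeline. The function name `spmm_column_wise`, its parameters, and the choice of CSC storage for A are assumptions made for illustration.

```python
def spmm_column_wise(n_rows, col_ptr, row_idx, vals, dense):
    """Compute C = A @ B, with A sparse (CSC) and B dense, column by column.

    A is n_rows x K, stored as CSC arrays (col_ptr, row_idx, vals).
    dense is B, given as a list of K rows with N entries each.
    Returns C as a list of n_rows rows with N entries each.
    All names here are hypothetical and chosen for this sketch.
    """
    n_inner = len(col_ptr) - 1
    if len(dense) != n_inner:
        raise ValueError("inner dimensions of A and B do not match")
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]

    for j in range(n_cols):              # build one output column C[:, j]
        for k in range(n_inner):         # walk the columns of A in order
            scale = dense[k][j]          # scalar B[k, j]
            if scale == 0.0:
                continue
            # C[:, j] += A[:, k] * B[k, j], touching only the nonzeros of A[:, k]
            for p in range(col_ptr[k], col_ptr[k + 1]):
                out[row_idx[p]][j] += vals[p] * scale
    return out


# A = [[1, 0, 2],
#      [0, 3, 0]]   stored in CSC form
a_col_ptr = [0, 1, 2, 3]
a_row_idx = [0, 1, 0]
a_vals = [1.0, 3.0, 2.0]
b_dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

c = spmm_column_wise(2, a_col_ptr, a_row_idx, a_vals, b_dense)
print(c)
```

In this loop order the nonzeros of A are read column by column in storage order, and all partial sums for one output column are accumulated before moving to the next, which is the access regularity the abstract attributes to column-wise dataflow.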