Institute of Computing Technology, Chinese Academy IR
A Cross-Platform SpMV Framework on Many-Core Architectures | |
Zhang, Yunquan (1); Li, Shigang (1); Yan, Shengen (2); Zhou, Huiyang (3) | |
2016-12-01 | |
Journal | ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION |
ISSN | 1544-3566 |
Volume | 13 Issue: 4 Pages: 25 |
Abstract | Sparse Matrix-Vector multiplication (SpMV) is a key operation in engineering and scientific computing. Although previous work has shown impressive progress in optimizing SpMV on many-core architectures, load imbalance and high memory bandwidth demand remain the critical performance bottlenecks. We present novel solutions to these problems for both GPUs and Intel MIC many-core architectures. First, we devise a new SpMV format, called Blocked Compressed Common Coordinate (BCCOO). BCCOO extends the blocked Common Coordinate (COO) format by using bit flags to store the row indices, which alleviates the bandwidth problem. We further improve this format by partitioning the matrix into vertical slices for better data locality. Then, to address the load imbalance problem, we propose a highly efficient matrix-based segmented sum/scan algorithm for SpMV, which eliminates global synchronization. Finally, we introduce an autotuning framework to choose optimization parameters. Experimental results show that our proposed framework has a significant advantage over existing SpMV libraries. In single precision, our proposed scheme outperforms the clSpMV COCKTAIL format by 255% on average on AMD FirePro W8000, and outperforms CUSPARSE V7.0 by 73.7% on average and CSR5 by 53.6% on average on GeForce Titan X. In double precision, our proposed scheme outperforms CUSPARSE V7.0 by 34.0% on average and CSR5 by 16.2% on average on Tesla K20, and performs equivalently to CSR5 on Intel MIC. |
Keywords | SpMV ; segmented scan ; BCCOO ; OpenCL ; CUDA ; GPU ; Intel MIC ; parallel algorithms |
DOI | 10.1145/2994148 |
Indexed By | SCI |
Language | English |
Funding | National Natural Science Foundation of China[61502450] ; National Natural Science Foundation of China[61432018] ; National Natural Science Foundation of China[61521092] ; National Natural Science Foundation of China[61272136] ; National Key Research and Development Program of China[2016YFB0200803] ; NSF project[1216569] ; AMD Inc. |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Hardware & Architecture ; Computer Science, Theory & Methods |
WOS ID | WOS:000392416400002 |
Publisher | ASSOC COMPUTING MACHINERY |
Document Type | Journal article |
Identifier | http://119.78.100.204/handle/2XEOYT63/7661 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding Author | Li, Shigang; Yan, Shengen |
Affiliation | 1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China ; 2. Chinese Univ Hong Kong, Dept Informat Engn, SenseTime Grp Ltd, Hong Kong, Hong Kong, Peoples R China ; 3. North Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695 USA |
Recommended Citation (GB/T 7714) | Zhang, Yunquan,Li, Shigang,Yan, Shengen,et al. A Cross-Platform SpMV Framework on Many-Core Architectures[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,2016,13(4):25. |
APA | Zhang, Yunquan,Li, Shigang,Yan, Shengen,&Zhou, Huiyang.(2016).A Cross-Platform SpMV Framework on Many-Core Architectures.ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,13(4),25. |
MLA | Zhang, Yunquan,et al."A Cross-Platform SpMV Framework on Many-Core Architectures".ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 13.4(2016):25. |
Files in This Item | There are no files associated with this item. |
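
The abstract's two central ideas, replacing stored row indices with bit flags and computing SpMV as a segmented sum, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 1x1 blocks (no blocking or vertical slicing), runs sequentially rather than on a GPU or MIC, and assumes the convention that a flag of 0 marks the last nonzero of a row, which the abstract does not specify. The function names `rows_to_bit_flags` and `spmv_bit_flags`, and the `nonempty_rows` helper list, are hypothetical.

```python
def rows_to_bit_flags(rows):
    """Compress sorted COO row indices into one flag per nonzero.

    Assumed convention: 0 = last nonzero of its row, 1 = otherwise.
    """
    n = len(rows)
    return [0 if i == n - 1 or rows[i + 1] != rows[i] else 1 for i in range(n)]


def spmv_bit_flags(flags, cols, vals, x, nonempty_rows, n_rows):
    """Compute y = A @ x as a sequential segmented sum over the flags.

    nonempty_rows lists, in order, the rows that hold at least one nonzero,
    so that each closed segment can be written to the right output row.
    """
    y = [0.0] * n_rows
    acc = 0.0   # running sum of the current segment (row)
    seg = 0     # index of the current segment
    for flag, c, v in zip(flags, cols, vals):
        acc += v * x[c]
        if flag == 0:  # row stop: flush the segment sum
            y[nonempty_rows[seg]] = acc
            acc = 0.0
            seg += 1
    return y


# A 4x4 matrix with six nonzeros; row 2 is empty.
rows = [0, 0, 1, 3, 3, 3]
cols = [0, 2, 1, 0, 2, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0, 1.0]

flags = rows_to_bit_flags(rows)
nonempty_rows = sorted(set(rows))
y = spmv_bit_flags(flags, cols, vals, x, nonempty_rows, 4)
print(flags)  # [1, 0, 0, 1, 1, 0]
print(y)      # [3.0, 3.0, 0.0, 15.0]
```

In this sketch each nonzero carries a single bit instead of a full integer row index, which is the source of the bandwidth saving the abstract describes. A parallel version would replace the loop with a segmented scan over the same flags.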