Institute of Computing Technology, Chinese Academy IR
CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs | |
Xie, Xiaolong (1); Liang, Yun (1); Li, Xiuhong (1); Wu, Yudong (1); Sun, Guangyu (1); Wang, Tao (1); Fan, Dongrui (2) | |
2018-06-01 | |
Journal | IEEE TRANSACTIONS ON COMPUTERS |
ISSN | 0018-9340 |
Volume | 67, Issue: 6, Pages: 890-897 |
Abstract | The key to high performance on GPUs lies in massive threading, which enables thread switching and hides long latencies. GPUs are equipped with a large register file to enable fast context switching. However, thread throttling techniques that are designed to mitigate cache contention lead to under-utilization of registers. Register allocation is a significant factor for performance, as it not only determines single-thread performance but also indirectly affects the TLP. In this paper, we propose Coordinated Register Allocation and Thread-level parallelism (CRAT) to explore the optimization space of register allocation and TLP management on GPUs. CRAT employs both compile-time (CRAT-static) and run-time (CRAT-dyn) techniques to exhaust the design space. CRAT-static works statically to explore the trade-off between TLP and register allocation, and CRAT-dyn exploits dynamic register allocation for further improvement. Experiments indicate that CRAT-static achieves an average 1.25X speedup over an existing TLP management technique. On four register-limited applications, CRAT-dyn further improves the performance speedup of CRAT-static from 1.51X to 1.70X. |
Keywords | GPGPU; memory hierarchy; compilers |
DOI | 10.1109/TC.2017.2776272 |
Indexed by | SCI |
Language | English |
Funding project | Natural Science Foundation of China[61672048] |
WOS research area | Computer Science ; Engineering |
WOS category | Computer Science, Hardware & Architecture ; Engineering, Electrical & Electronic |
WOS accession number | WOS:000431902600010 |
Publisher | IEEE COMPUTER SOC |
Document type | Journal article |
Item identifier | http://119.78.100.204/handle/2XEOYT63/5355 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Xie, Xiaolong |
Affiliations | 1. Peking Univ, Sch EECS, Ctr Energy Efficient Comp & Applicat, Beijing 100080, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, Beijing 100864, Peoples R China |
Recommended citation (GB/T 7714) | Xie, Xiaolong,Liang, Yun,Li, Xiuhong,et al. CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs[J]. IEEE TRANSACTIONS ON COMPUTERS,2018,67(6):890-897. |
APA | Xie, Xiaolong.,Liang, Yun.,Li, Xiuhong.,Wu, Yudong.,Sun, Guangyu.,...&Fan, Dongrui.(2018).CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs.IEEE TRANSACTIONS ON COMPUTERS,67(6),890-897. |
MLA | Xie, Xiaolong,et al."CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs".IEEE TRANSACTIONS ON COMPUTERS 67.6(2018):890-897. |