Institute of Computing Technology, Chinese Academy IR
Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs
Li, Zhihao [1,2,3]; Jia, Haipeng [1]; Zhang, Yunquan [1]; Chen, Tun [1]; Yuan, Liang [1]; Vuduc, Richard [4]
2020-08-01
Journal | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS |
ISSN | 1045-9219 |
Volume | 31, Issue: 8, Pages: 1925-1941 |
Abstract | This article presents AutoFFT, a template-based code generation framework that can automatically generate high-performance FFT kernels for all natural-number radices. AutoFFT is based on the Cooley-Tukey FFT algorithm, which exploits the symmetric and periodic properties of the DFT matrix, as the outer parallelization framework. Because butterflies are the core operations of the Cooley-Tukey algorithm, we explore additional symmetric and periodic properties of the DFT matrix and formulate multiple optimized calculation templates to further reduce the number of floating-point operations for butterflies of arbitrary natural numbers. To fully exploit hardware resources, we encapsulate a series of optimizations in an assembly template optimizer. Given any DFT problem, AutoFFT automatically generates C FFT kernels using these calculation templates and converts them into efficient assembly kernels using the template optimizer. Through a series of experiments on Arm, Intel, and AMD processors, we show that AutoFFT-generated kernels can outperform those in Fastest Fourier Transform in the West (FFTW), the Arm Performance Libraries (ARMPL), and the Intel Math Kernel Library (MKL). |
Keywords | AutoFFT; FFT; code generation; template; DFT |
DOI | 10.1109/TPDS.2020.2977629 |
Indexed by | SCI |
Language | English |
Funding | National Key Research and Development Program of China[2107YFB0202105] ; National Key Research and Development Program of China[2016YFB0200803] ; National Key Research and Development Program of China[2017YFB0202302] ; National Natural Science Foundation of China[61602443] ; National Natural Science Foundation of China[61432018] ; National Natural Science Foundation of China[61521092] ; National Natural Science Foundation of China[61502450] |
WOS Research Area | Computer Science ; Engineering |
WOS Subject Category | Computer Science, Theory & Methods ; Engineering, Electrical & Electronic |
WOS Accession Number | WOS:000561084300003 |
Publisher | IEEE COMPUTER SOC |
Document Type | Journal article |
Item Identifier | http://119.78.100.204/handle/2XEOYT63/15791 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding Author | Jia, Haipeng |
Author Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, SKL Comp Architecture, Beijing 100864, Peoples R China; 2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China; 3. Georgia Inst Technol, Atlanta, GA 30332 USA; 4. Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA |
Recommended Citation, GB/T 7714 | Li, Zhihao, Jia, Haipeng, Zhang, Yunquan, et al. Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31(8): 1925-1941. |
APA | Li, Zhihao, Jia, Haipeng, Zhang, Yunquan, Chen, Tun, Yuan, Liang, & Vuduc, Richard. (2020). Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 31(8), 1925-1941. |
MLA | Li, Zhihao, et al. "Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs". IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 31.8 (2020): 1925-1941. |
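The abstract says that symmetric and periodic properties of the DFT matrix can reduce the floating-point work inside a butterfly. The sketch below illustrates one such well-known property for an odd radix r: because the twiddle factor for input r-j is the complex conjugate of the one for input j, inputs can be paired into sums and differences, and outputs k and r-k then share the same cosine and sine accumulations. This is only an illustration of the general idea, not AutoFFT's actual calculation templates; the function names `naive_dft` and `odd_radix_butterfly` are hypothetical and do not come from the article.

```python
import cmath
import math


def naive_dft(x):
    """Reference DFT by direct summation (O(r^2) complex multiplies)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def odd_radix_butterfly(x):
    """DFT of odd length r that shares work between outputs k and r-k.

    Illustrative only: relies on exp(-2*pi*i*(r-j)*k/r) being the
    conjugate of exp(-2*pi*i*j*k/r).
    """
    r = len(x)
    if r % 2 == 0:
        raise ValueError("this sketch handles odd radices only")
    half = (r - 1) // 2

    # Pair input j with input r-j once, up front.
    sums = [x[j] + x[r - j] for j in range(1, half + 1)]
    diffs = [x[j] - x[r - j] for j in range(1, half + 1)]

    out = [0j] * r
    out[0] = x[0] + sum(sums)
    for k in range(1, half + 1):
        cos_acc = x[0]
        sin_acc = 0j
        for j in range(1, half + 1):
            angle = 2.0 * math.pi * j * k / r
            cos_acc += math.cos(angle) * sums[j - 1]   # real scalar times complex
            sin_acc += math.sin(angle) * diffs[j - 1]  # real scalar times complex
        # Two outputs come from one pair of accumulations.
        out[k] = cos_acc - 1j * sin_acc
        out[r - k] = cos_acc + 1j * sin_acc
    return out


if __name__ == "__main__":
    data = [complex(i + 1, -i) for i in range(5)]
    fast = odd_radix_butterfly(data)
    ref = naive_dft(data)
    print(max(abs(f - g) for f, g in zip(fast, ref)))
```

In this form every multiplication is a real coefficient times a complex value, and each accumulation pair yields two outputs, which is the kind of saving the abstract alludes to. A generated kernel would presumably unroll these loops for a fixed radix with constant coefficients, but that detail is an assumption here.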