Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs
Li, Zhihao [1,2,3]; Jia, Haipeng [1]; Zhang, Yunquan [1]; Chen, Tun [1]; Yuan, Liang [1]; Vuduc, Richard [4]
2020-08-01
Journal: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
ISSN: 1045-9219
Volume: 31; Issue: 8; Pages: 1925-1941
Abstract: This article presents AutoFFT, a template-based code generation framework that can automatically generate high-performance FFT kernels for all natural-number radices. AutoFFT is based on the Cooley-Tukey FFT algorithm, which exploits the symmetric and periodic properties of the DFT matrix, as the outer parallelization framework. Because butterflies are the core operations of the Cooley-Tukey algorithm, we explore additional symmetric and periodic properties of the DFT matrix and formulate multiple optimized calculation templates to further reduce the number of floating-point operations for butterflies of arbitrary natural numbers. To fully exploit hardware resources, we encapsulate a series of optimizations in an assembly template optimizer. Given any DFT problem, AutoFFT automatically generates C FFT kernels using these calculation templates and converts them into efficient assembly kernels using the template optimizer. Through a series of experiments on Arm, Intel, and AMD processors, we show that AutoFFT-generated kernels can outperform those in the Fastest Fourier Transform in the West (FFTW), the Arm Performance Libraries (ARMPL), and the Intel Math Kernel Library (MKL).
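As an illustration of the symmetry and periodicity that the abstract refers to, the following is a minimal textbook sketch of a radix-2 Cooley-Tukey FFT in Python. It is not AutoFFT's generated code or one of its calculation templates; the function names `fft_radix2` and `dft_naive` are hypothetical and introduced here only for illustration. The sketch assumes the input length is a power of two and uses the identity W_N^(k+N/2) = -W_N^k, so that one twiddle multiplication serves two outputs in each butterfly.

```python
import cmath


def dft_naive(x):
    """Direct O(N^2) evaluation of the DFT definition, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT (illustrative; power-of-two lengths only)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]

    # Split into even- and odd-indexed halves and transform each recursively.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])

    half = n // 2
    out = [0j] * n
    for k in range(half):
        # Twiddle factor W_N^k; by symmetry W_N^(k + N/2) = -W_N^k,
        # so the same product feeds both butterfly outputs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out
```

Calling `fft_radix2` on a length-8 list should agree with `dft_naive` up to floating-point rounding. A production kernel generator such as the one described above would additionally handle arbitrary radices and emit vectorized code, which this sketch does not attempt.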
Keywords: AutoFFT; FFT; code generation; template; DFT
DOI: 10.1109/TPDS.2020.2977629
Indexed by: SCI
Language: English
Funding: National Key Research and Development Program of China [2107YFB0202105]; National Key Research and Development Program of China [2016YFB0200803]; National Key Research and Development Program of China [2017YFB0202302]; National Natural Science Foundation of China [61602443]; National Natural Science Foundation of China [61432018]; National Natural Science Foundation of China [61521092]; National Natural Science Foundation of China [61502450]
WOS Research Areas: Computer Science; Engineering
WOS Categories: Computer Science, Theory & Methods; Engineering, Electrical & Electronic
WOS Accession Number: WOS:000561084300003
Publisher: IEEE COMPUTER SOC
Citation statistics
Times cited: 8 [WOS]
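Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/15791
Collection: Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English)
Corresponding author: Jia, Haipeng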
Author affiliations:
1. Chinese Acad Sci, Inst Comp Technol, SKL Comp Architecture, Beijing 100864, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
3. Georgia Inst Technol, Atlanta, GA 30332 USA
4. Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Recommended citation
GB/T 7714: Li, Zhihao, Jia, Haipeng, Zhang, Yunquan, et al. Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31(8): 1925-1941.
APA: Li, Zhihao, Jia, Haipeng, Zhang, Yunquan, Chen, Tun, Yuan, Liang, & Vuduc, Richard. (2020). Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 31(8), 1925-1941.
MLA: Li, Zhihao, et al. "Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs". IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 31.8 (2020): 1925-1941.