Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization

doi:10.1145/3744906

	Di, Zhanyuan 1,2; Wang, Leping 1; Ma, Zhaojia 1,2; Shao, En 2; Zhao, Jie 3; Ren, Ziyi 1; Feng, Siyuan 4; Tao, Dingwen 1; Tan, Guangming 1; Sun, Ninghui 1
	2025-09-01
Journal	ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
ISSN	1544-3566
Volume	22 Issue:3 Pages:26
Abstract	Parallel structures have become a key pattern in deep neural networks (DNNs), offering improved efficiency and scalability. However, existing machine learning compilers (MLCs) face challenges in optimizing these structures due to limited parallel fusion scope and insufficient analysis of intra-operator characteristics. This article introduces Magneto, a framework designed to accelerate DNN inference by co-optimizing parallel operators. Magneto broadens the fusion scope and incorporates a specialized co-tuning algorithm to optimize operators jointly. Our approach addresses the unique challenges inherent in optimizing parallel structures, enabling significant performance improvements across various hardware platforms. Experimental results show that Magneto outperforms state-of-the-art NVIDIA TensorRT and AMD MIGraphX, achieving geometric mean speedups of 2.27x and 2.88x, respectively.
Keywords	Deep learning; tensor compiler; inference optimization; code generation; GPU
DOI	10.1145/3744906
Indexed In	SCI
Language	English
WOS Research Area	Computer Science
WOS Category	Computer Science, Hardware & Architecture ; Computer Science, Theory & Methods
WOS Accession Number	WOS:001606025500010
Publisher	ASSOC COMPUTING MACHINERY
Document Type	Journal article
Item Identifier	http://119.78.100.204/handle/2XEOYT63/41583
Collection	Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English)
Corresponding Author	Shao, En; Tan, Guangming
Author Affiliation	1.Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing, Peoples R China 2.Univ Chinese Acad Sci, Beijing, Peoples R China 3.Hunan Univ, Changsha, Peoples R China 4.Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Recommended Citation
GB/T 7714	Di, Zhanyuan,Wang, Leping,Ma, Zhaojia,et al. Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,2025,22(3):26.
APA	Di, Zhanyuan.,Wang, Leping.,Ma, Zhaojia.,Shao, En.,Zhao, Jie.,...&Sun, Ninghui.(2025).Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization.ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,22(3),26.
MLA	Di, Zhanyuan,et al."Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization".ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 22.3(2025):26.