Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization
Di, Zhanyuan (1,2); Wang, Leping (1); Ma, Zhaojia (1,2); Shao, En (2); Zhao, Jie (3); Ren, Ziyi (1); Feng, Siyuan (4); Tao, Dingwen (1); Tan, Guangming (1); Sun, Ninghui (1)
2025-09-01
Journal: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
ISSN: 1544-3566
Volume: 22; Issue: 3; Pages: 26
Abstract: Parallel structures have become a key pattern in deep neural networks (DNNs), offering improved efficiency and scalability. However, existing machine learning compilers (MLCs) face challenges in optimizing these structures due to limited parallel fusion scope and insufficient analysis of intra-operator characteristics. This article introduces Magneto, a framework designed to accelerate DNN inference by co-optimizing parallel operators. Magneto broadens the fusion scope and incorporates a specialized co-tuning algorithm to optimize operators jointly. Our approach addresses the unique challenges inherent in optimizing parallel structures, enabling significant performance improvements across various hardware platforms. Experimental results show that Magneto outperforms state-of-the-art NVIDIA TensorRT and AMD MIGraphX, achieving geometric mean speedups of 2.27x and 2.88x, respectively.
Keywords: Deep learning; tensor compiler; inference optimization; code generation; GPU
DOI: 10.1145/3744906
Indexed by: SCI
Language: English
Funding: NKRDP[2021YFB0300202] ; NSFC[62032023] ; NSFC[T2125013] ; NSFC[T2422007] ; NSFC[62225205] ; NSFC[U24A20235] ; Youth Innovation Promotion Association of CAS[2021099] ; Innovation Funding of ICT, CAS[E461030] ; Tianjin Science and Technology Plan Project[24ZXKJGX00060]
WOS research area: Computer Science
WOS categories: Computer Science, Hardware & Architecture ; Computer Science, Theory & Methods
WOS accession number: WOS:001606025500010
Publisher: ASSOC COMPUTING MACHINERY
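Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/41583
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding authors: Shao, En; Tan, Guangming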
Author affiliations:
1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing, Peoples R China
2. Univ Chinese Acad Sci, Beijing, Peoples R China
3. Hunan Univ, Changsha, Peoples R China
4. Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Recommended citation:
GB/T 7714: Di, Zhanyuan,Wang, Leping,Ma, Zhaojia,et al. Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,2025,22(3):26.
APA: Di, Zhanyuan.,Wang, Leping.,Ma, Zhaojia.,Shao, En.,Zhao, Jie.,...&Sun, Ninghui.(2025).Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization.ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,22(3),26.
MLA: Di, Zhanyuan,et al."Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization".ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 22.3(2025):26.