DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training
Meng, Lin [1,2,3]; Sun, Yuzhong [1,3]; Zhu, Jie [3,4]
2026-02-01
Journal: FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
ISSN: 0167-739X
Volume: 175; Pages: 15
Abstract: Communication scheduling aims to reduce communication bottlenecks in data-parallel (DP) training by maximizing the overlap between computation and communication. However, existing schemes fall short because of three main issues: (1) hard data dependencies prevent some overlap between communication and computation; (2) high coverage rates limit further performance improvement; (3) imbalanced communication and computation times of tensors, caused by partitioning and fusion strategies, create more bubbles. We therefore propose DeFT, a new communication scheduling scheme whose key insight is to relax data dependencies and support flexible scheduling in distributed training without reordering bucket communications. DeFT uncovers new overlap opportunities in training by transforming the scheduling problem into multiple knapsack problems. Specifically, DeFT eliminates hard dependencies with delayed updates, reduces the coverage rate by adjusting the update frequency and utilizing heterogeneous communication links, and merges the computation times of the backward or forward pass into the knapsack capacity to avoid the negative impact of unbalanced tensors. Additionally, DeFT preserves training accuracy by adjusting its scheduling strategy via convergence loss quantification. Extensive experiments with 16 A100 GPUs showed that DeFT achieved speedups of 29% to 115% on three representative benchmarks compared to US-Byte and Bytescheduler, with no loss of accuracy.
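The knapsack framing in the abstract can be illustrated with a minimal sketch. This is a generic 0/1 knapsack, not the DeFT algorithm itself: it assumes each gradient bucket has a known integer communication time in milliseconds, that the capacity is the merged computation time of one forward or backward pass, and that the goal is to hide as much communication as possible inside that window. The function name `select_overlapped_buckets` and all numbers are hypothetical.

```python
from typing import List, Tuple


def select_overlapped_buckets(comm_ms: List[int], capacity_ms: int) -> Tuple[int, List[int]]:
    """Generic 0/1 knapsack (illustrative assumption, not the paper's method).

    Picks the subset of buckets whose total communication time fits within
    capacity_ms and is as large as possible. Returns the hidden communication
    time and the indices of the chosen buckets.
    """
    n = len(comm_ms)
    # best[i][c] = max communication time hidden using the first i buckets
    # within a computation budget of c milliseconds.
    best = [[0] * (capacity_ms + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        t = comm_ms[i - 1]
        for c in range(capacity_ms + 1):
            best[i][c] = best[i - 1][c]  # skip bucket i-1
            if t <= c:
                # take bucket i-1 if that hides more communication
                best[i][c] = max(best[i][c], best[i - 1][c - t] + t)

    # Backtrack to recover which buckets were taken.
    chosen: List[int] = []
    c = capacity_ms
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= comm_ms[i - 1]
    chosen.reverse()
    return best[n][capacity_ms], chosen


if __name__ == "__main__":
    # Hypothetical bucket communication times and a 9 ms computation window.
    hidden, buckets = select_overlapped_buckets([4, 3, 5, 2], 9)
    print(hidden, buckets)
```

Under these assumptions, one such problem would be solved per computation window (for example, one for the backward pass and one for the forward pass), which is one way to read "multiple knapsack problems"; the paper's actual formulation, including delayed updates and heterogeneous links, is not reproduced here.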
Keywords: Distributed deep learning; Communication scheduling; Data parallelism
DOI: 10.1016/j.future.2025.108103
Indexed by: SCI
Language: English
Funding: Science and Technology Innovation 2030-Major Project [2022ZD0119104]
WOS Research Area: Computer Science
WOS Category: Computer Science, Theory & Methods
WOS Accession Number: WOS:001565585500003
Publisher: ELSEVIER
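Document Type: Journal article
Identifier: http://119.78.100.204/handle/2XEOYT63/41723
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding Author: Sun, Yuzhong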
Affiliations:
1. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Beijing 101408, Peoples R China
3. Chinese Acad Sci, Inst Comp Technol, State Key Lab Chinese Comp Architecture, Beijing 100864, Peoples R China
4. Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
Recommended Citation
GB/T 7714: Meng, Lin, Sun, Yuzhong, Zhu, Jie. DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training[J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2026, 175: 15.
APA: Meng, Lin, Sun, Yuzhong, & Zhu, Jie. (2026). DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 175, 15.
MLA: Meng, Lin, et al. "DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training." FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE 175 (2026): 15.