Institute of Computing Technology, Chinese Academy IR
| DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training | |
| Meng, Lin1,2,3; Sun, Yuzhong1,3; Zhu, Jie3,4 | |
| 2026-02-01 | |
| Journal | FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE |
| ISSN | 0167-739X |
| Volume | 175; Pages: 15 |
| Abstract | Communication scheduling aims to reduce communication bottlenecks in data parallel training (DP) by maximizing the overlap between computation and communication. However, existing schemes fall short due to three main issues: (1) hard data dependencies break some overlapping between communication and computation; (2) high coverage rates impair further improvement on performance; (3) imbalanced communication/computation times of tensors caused by partitioning/fusion strategies cause more bubbles. Therefore, we propose a new communication scheduling scheme, DeFT, whose key insight is to relax data dependencies and support flexible scheduling in distributed training without reordering bucket communications. DeFT uncovers new overlapping chances in training by transforming the scheduling problem into multiple knapsack problems. Specifically, DeFT eliminates hard dependencies with delayed updates, reduces the coverage rate by adjusting update frequency and utilizing heterogeneous communication links, and uses the merged computation times of the backward or forward pass as the knapsack capacity to avoid the negative impact of unbalanced tensors. Additionally, DeFT preserves training accuracy by adjusting its scheduling strategy via convergence loss quantification. Extensive experiments with 16 A100 GPUs showed that DeFT achieved speedups of 29% to 115% on three representative benchmarks compared to US-Byte and Bytescheduler with no loss of accuracy. |
| Keywords | Distributed deep learning; Communication scheduling; Data parallelism |
| DOI | 10.1016/j.future.2025.108103 |
| Indexed By | SCI |
| Language | English |
| Funding Project | Science and Technology Innovation 2030-Major Project[2022ZD0119104] |
| WOS Research Area | Computer Science |
| WOS Subject Category | Computer Science, Theory & Methods |
| WOS Accession Number | WOS:001565585500003 |
| Publisher | ELSEVIER |
| Document Type | Journal article |
| Item Identifier | http://119.78.100.204/handle/2XEOYT63/41723 |
| Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
| Corresponding Author | Sun, Yuzhong |
| Author Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Beijing 101408, Peoples R China; 3. Chinese Acad Sci, Inst Comp Technol, State Key Lab Chinese Comp Architecture, Beijing 100864, Peoples R China; 4. Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China |
| Recommended Citation (GB/T 7714) | Meng, Lin, Sun, Yuzhong, Zhu, Jie. DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training[J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2026, 175: 15. |
| APA | Meng, Lin, Sun, Yuzhong, & Zhu, Jie. (2026). DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 175, 15. |
| MLA | Meng, Lin, et al. "DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training". FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE 175 (2026): 15. |
| Files in This Item | There are no files associated with this item. |
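
The abstract describes recasting communication scheduling as knapsack problems in which the computation time of a forward or backward pass serves as the capacity. The record does not give the formulation itself, so the following is only a minimal illustrative sketch of that general idea, not the authors' algorithm: a 0/1 knapsack that picks which gradient buckets to communicate during one computation window so that as much communication time as possible is hidden. The function name `pick_overlapped_buckets`, the integer millisecond units, and the sample timings are all assumptions made for illustration.

```python
from typing import List, Tuple


def pick_overlapped_buckets(comm_ms: List[int], capacity_ms: int) -> Tuple[int, List[int]]:
    """Hypothetical helper: 0/1 knapsack over gradient buckets.

    comm_ms[i]  : assumed communication time of bucket i, in whole milliseconds
    capacity_ms : assumed computation time of one pass (the knapsack capacity)

    Returns the total communication time hidden behind computation and the
    indices of the buckets selected for this window.
    """
    n = len(comm_ms)
    # best[i][c] = max communication time hidden using the first i buckets
    # when the window has c milliseconds of computation available.
    best = [[0] * (capacity_ms + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w = comm_ms[i - 1]
        for c in range(capacity_ms + 1):
            best[i][c] = best[i - 1][c]  # skip bucket i-1
            if w <= c:  # or overlap it, if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - w] + w)

    # Walk the table backwards to recover which buckets were selected.
    chosen: List[int] = []
    c = capacity_ms
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= comm_ms[i - 1]
    chosen.reverse()
    return best[n][capacity_ms], chosen


# Made-up timings: four buckets and a 10 ms computation window.
hidden, buckets = pick_overlapped_buckets([4, 3, 6, 5], 10)
print(hidden, buckets)
```

In this sketch, buckets left unselected would have to be communicated in a later window. Solving one such problem per forward or backward window is one plausible reading of "multiple knapsack problems", but the actual DeFT formulation, including delayed updates and heterogeneous links, is not specified in this record.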
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.