Institute of Computing Technology, Chinese Academy IR
Title | 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning |
Authors | Jiang, Youhe [1]; Gu, Huaxi [1]; Lu, Yunfeng [1]; Yu, Xiaoshan [1,2] |
Year | 2020 |
Journal | IEEE ACCESS |
ISSN | 2169-3536 |
Volume | 8 ; Pages: 183488-183494 |
Abstract | Gradient synchronization, a process of communication among machines in large-scale distributed machine learning (DML), plays a crucial role in improving DML performance. Since the scale of distributed clusters is continuously expanding, state-of-the-art DML synchronization algorithms suffer from latency for thousands of GPUs. In this article, we propose 2D-HRA, a two-dimensional hierarchical ring-based all-reduce algorithm in large-scale DML. 2D-HRA combines the ring with more latency-optimal hierarchical methods, and synchronizes parameters on two dimensions to make full use of the bandwidth. Simulation results show that 2D-HRA can efficiently alleviate the high latency and accelerate the synchronization process in large-scale clusters. Compared with traditional algorithms (ring based), 2D-HRA achieves up to 76.9% reduction in gradient synchronization time in clusters of different scale. |
Keywords | Distributed machine learning ; large-scale cluster ; topology ; communication overhead ; all-reduce |
DOI | 10.1109/ACCESS.2020.3028367 |
Indexed by | SCI |
Language | English |
Funding | National Key Research and Development Program of China[2018YFE0202800] ; National Natural Science Foundation of China[61634004] ; National Natural Science Foundation of China[61934002] ; Natural Science Foundation of Shaanxi Province for Distinguished Young Scholars[2020JC-26] ; Fundamental Research Funds for the Central Universities[JB190105] ; State Key Laboratory of Computer Architecture (ICT, CAS)[CARCH201919] ; China Postdoctoral Science Foundation[2018M633465] |
WOS research areas | Computer Science ; Engineering ; Telecommunications |
WOS categories | Computer Science, Information Systems ; Engineering, Electrical & Electronic ; Telecommunications |
WOS accession number | WOS:000585641700001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Document type | Journal article |
Item identifier | http://119.78.100.204/handle/2XEOYT63/15988 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Gu, Huaxi |
Author affiliations | 1. Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China ; 2. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China |
Recommended citation: GB/T 7714 | Jiang, Youhe,Gu, Huaxi,Lu, Yunfeng,et al. 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning[J]. IEEE ACCESS,2020,8:183488-183494. |
APA | Jiang, Youhe,Gu, Huaxi,Lu, Yunfeng,&Yu, Xiaoshan.(2020).2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning.IEEE ACCESS,8,183488-183494. |
MLA | Jiang, Youhe,et al."2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning".IEEE ACCESS 8(2020):183488-183494. |
Files in this item | There are no files associated with this item. |