Multi-Node Acceleration for Large-Scale GCNs
Sun, Gongjian1,2; Yan, Mingyu1,2; Wang, Duo1,2; Li, Han1,2; Li, Wenming1,2; Ye, Xiaochun1,2; Fan, Dongrui1,2; Xie, Yuan3
Publication Date: 2022-12-01
Journal: IEEE TRANSACTIONS ON COMPUTERS
ISSN: 0018-9340
Volume: 71, Issue: 12, Pages: 3140-3152
Abstract: Limited by memory capacity and compute power, single-node graph convolutional neural network (GCN) accelerators cannot complete the execution of GCNs within a reasonable amount of time, due to the explosive growth in graph sizes. Large-scale GCNs therefore call for a multi-node acceleration system (MultiAccSys), analogous to the tensor processing unit (TPU) Pod for large-scale neural networks. In this work, we aim to scale up single-node GCN accelerators to accelerate GCNs on large-scale graphs. We first identify the communication patterns and challenges of multi-node acceleration for GCNs on large-scale graphs. We observe that (1) irregular coarse-grained communication patterns exist in the execution of GCNs in MultiAccSys, which introduce a massive amount of redundant network transmissions and off-chip memory accesses; and (2) the acceleration of GCNs in MultiAccSys is mainly bounded by network bandwidth but tolerates network latency. Guided by these observations, we then propose MultiGCN, an efficient MultiAccSys for large-scale GCNs that trades network latency for network bandwidth. Specifically, by leveraging the network latency tolerance, we first propose a topology-aware multicast mechanism with a one-put-per-multicast message-passing model to reduce transmissions and alleviate network bandwidth requirements. Second, we introduce a scatter-based round execution mechanism that cooperates with the multicast mechanism and reduces redundant off-chip memory accesses. Compared to the baseline MultiAccSys, MultiGCN achieves a 4∼12× speedup using only 28%∼68% of the energy, while reducing network transmissions by 32% and off-chip memory accesses by 73% on average. Besides, MultiGCN not only achieves a 2.5∼8× speedup over the state-of-the-art multi-GPU solution, but also scales to large-scale graphs, unlike single-node GCN accelerators.
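To make the observed communication pattern concrete, here is a minimal Python sketch. It is purely illustrative, not the paper's implementation: the toy graph, the two-node partition, and the names (`count_transmissions`, `owner`, `multicast_targets`) are all invented for this example. It shows why a naive partitioned GCN aggregation sends the same vertex feature across the network once per remote edge, and how a one-put-per-multicast scheme of the kind the abstract describes collapses those duplicates to one transfer per destination node:

```python
# Illustrative sketch (hypothetical, not MultiGCN's implementation):
# compare naive per-edge feature transfers against a deduplicated
# one-put-per-multicast count for one GCN aggregation step.

from collections import defaultdict

# Toy undirected graph: vertex -> list of neighbor vertices.
graph = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}

# Toy 2-node partition: vertex -> owning accelerator node.
owner = {0: 0, 1: 0, 2: 1, 3: 1}

def count_transmissions(graph, owner):
    """Count inter-node feature transfers for one aggregation step."""
    unicast = 0                           # naive: one message per remote edge
    multicast_targets = defaultdict(set)  # src vertex -> remote nodes needing it
    for dst, neighbors in graph.items():
        for src in neighbors:
            if owner[src] != owner[dst]:
                unicast += 1
                multicast_targets[src].add(owner[dst])
    # One put per multicast: each source feature crosses the network once
    # per destination *node*, however many vertices there consume it.
    multicast = sum(len(nodes) for nodes in multicast_targets.values())
    return unicast, multicast

unicast, multicast = count_transmissions(graph, owner)
print(f"naive unicast transfers: {unicast}")   # counts every remote edge
print(f"one-put-per-multicast:   {multicast}") # deduplicated per node
```

In this toy graph the naive scheme issues 4 inter-node transfers while the multicast count is 3, because vertex 0's feature is needed by two vertices residing on the same remote node and crosses the network only once. On real-world power-law graphs, where high-degree vertices are referenced by many vertices per remote node, the deduplication is correspondingly larger.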
Keywords: Deep learning; graph neural network; hardware accelerator; multi-node system; communication optimization
DOI: 10.1109/TC.2022.3207127
Indexed By: SCI
Language: English
Funding Projects: National Natural Science Foundation of China [61732018]; National Natural Science Foundation of China [61872335]; National Natural Science Foundation of China [62202451]; Austrian-Chinese Cooperative R&D Project [171111KYSB20200002]; CAS Project for Young Scientists in Basic Research [YSBR-029]; Open Research Projects of Zhejiang Lab [2022PB0AB01]; CAS Project for Youth Innovation Promotion Association
WOS Research Areas: Computer Science; Engineering
WOS Categories: Computer Science, Hardware & Architecture; Engineering, Electrical & Electronic
WOS Accession Number: WOS:000886309300007
Publisher: IEEE COMPUTER SOC
Citation Statistics
Times Cited (WOS): 2
Document Type: Journal Article
Identifier: http://119.78.100.204/handle/2XEOYT63/20321
Collection: Institute of Computing Technology, Chinese Academy of Sciences - Journal Papers
Corresponding Author: Sun, Gongjian
Author Affiliations: 1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100045, Peoples R China
2. Univ Chinese Acad Sci, Beijing 101408, Peoples R China
3. Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Recommended Citation:
GB/T 7714
Sun, Gongjian, Yan, Mingyu, Wang, Duo, et al. Multi-Node Acceleration for Large-Scale GCNs[J]. IEEE TRANSACTIONS ON COMPUTERS, 2022, 71(12): 3140-3152.
APA: Sun, Gongjian., Yan, Mingyu., Wang, Duo., Li, Han., Li, Wenming., ... & Xie, Yuan. (2022). Multi-Node Acceleration for Large-Scale GCNs. IEEE TRANSACTIONS ON COMPUTERS, 71(12), 3140-3152.
MLA: Sun, Gongjian, et al. "Multi-Node Acceleration for Large-Scale GCNs". IEEE TRANSACTIONS ON COMPUTERS 71.12 (2022): 3140-3152.
Files in This Item:
No files are associated with this item.