VastPipe: A High-Throughput Inference System via Adaptive Space-Division Multiplexing for Diverse Accelerators
Ma, Li-Xian (1,2); Wang, Le-Ping (1); Shao, En (1,2); Cao, Rong-Yu (1,2); Tan, Guang-Ming (1,2)
2025-03-01
Journal: JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
ISSN: 1000-9000
Volume: 40; Issue: 2; Pages: 444-463
Abstract: The escalating demand for batched deep learning inference requires concurrent deployment of multiple deep neural network (DNN) models on a shared accelerator, thereby enabling spatial multiplexing to enhance resource utilization. Spatial multiplexing for co-locating multiple model services on the same accelerator increases the complexity of scheduling within a cluster. The meticulous collaborative optimization of model co-location combinations and resource allocation in a cluster creates an extensive configuration space for scheduling. In this paper, we present VastPipe, a high-throughput inference system that schedules batch-oriented and heterogeneous requests on spatial multiplexing-enabled computing clusters. VastPipe determines optimal scheduling configurations by jointly optimizing model co-location and resource allocation, using reinforcement learning to solve this combinatorial optimization problem. The experimental results demonstrate that on a large-scale cluster comprising 250 machine nodes with 1,000 neural processing units (NPUs), VastPipe achieves average performance improvements of 2.2x, 1.3x, and 1.2x compared with the baseline systems, respectively. Furthermore, VastPipe is optimized and evaluated on mainstream GPUs. The results demonstrate that VastPipe achieves average throughput improvements of 2.7x on the NVIDIA A100 GPU and 1.9x on the AMD MI100 GPU.
Keywords: cluster scheduling; resource management; reinforcement learning; DNN accelerator
DOI: 10.1007/s11390-024-3773-5
Indexed by: SCI
Language: English
Funding: National Key Research and Development Program of China [2021YFB0300202]; National Natural Science Foundation of China [62032023]; National Natural Science Foundation of China [T2125013]; National Natural Science Foundation of China [62102396]; Beijing Nova Program [Z211100002121143]; Youth Innovation Promotion Association of Chinese Academy of Sciences [2021099]; Innovation Funding of Institute of Computing Technology, Chinese Academy of Sciences [E461030]; Tianjin Science and Technology Plan Project [24ZXKJGX00060]
WOS research area: Computer Science
WOS category: Computer Science, Hardware & Architecture; Computer Science, Software Engineering
WOS accession number: WOS:001483026900002
Publisher: SPRINGER SINGAPORE PTE LTD
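Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/40631
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Papers (English)
Corresponding authors: Shao, En; Tan, Guang-Ming
Affiliations:
1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China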
Recommended citation
GB/T 7714: Ma, Li-Xian, Wang, Le-Ping, Shao, En, et al. VastPipe: A High-Throughput Inference System via Adaptive Space-Division Multiplexing for Diverse Accelerators[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2025, 40(2): 444-463.
APA: Ma, Li-Xian, Wang, Le-Ping, Shao, En, Cao, Rong-Yu, & Tan, Guang-Ming. (2025). VastPipe: A High-Throughput Inference System via Adaptive Space-Division Multiplexing for Diverse Accelerators. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 40(2), 444-463.
MLA: Ma, Li-Xian, et al. "VastPipe: A High-Throughput Inference System via Adaptive Space-Division Multiplexing for Diverse Accelerators". JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 40.2 (2025): 444-463.