Institute of Computing Technology, Chinese Academy of Sciences IR
Title | VastPipe: A High-Throughput Inference System via Adaptive Space-Division Multiplexing for Diverse Accelerators |
Authors | Ma, Li-Xian (1, 2); Wang, Le-Ping (1); Shao, En (1, 2); Cao, Rong-Yu (1, 2); Tan, Guang-Ming (1, 2) |
Date | 2025-03-01 |
Journal | JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY |
ISSN | 1000-9000 |
Volume | 40 |
Issue | 2 |
Pages | 444-463 |
Abstract | The escalating demand for batched deep learning inference requires the concurrent deployment of multiple deep neural network (DNN) models on a shared accelerator, enabling spatial multiplexing to improve resource utilization. However, co-locating multiple model services on the same accelerator through spatial multiplexing makes scheduling within a cluster more complex: jointly optimizing which models to co-locate and how to allocate resources across the cluster produces an extensive configuration space for scheduling. In this paper, we present VastPipe, a high-throughput inference system that schedules batch-oriented, heterogeneous requests on computing clusters that support spatial multiplexing. VastPipe treats the joint optimization of model co-location and resource allocation as a combinatorial optimization problem and solves it with reinforcement learning to determine optimal scheduling configurations. Experimental results on a large-scale cluster of 250 machine nodes with 1000 neural processing units (NPUs) show that VastPipe achieves average performance improvements of 2.2x, 1.3x, and 1.2x over three baseline systems, respectively. VastPipe is further optimized and evaluated on mainstream GPUs, where it achieves average throughput improvements of 2.7x on the NVIDIA A100 GPU and 1.9x on the AMD MI100 GPU. |
Keywords | cluster scheduling; resource management; reinforcement learning; DNN accelerator |
DOI | 10.1007/s11390-024-3773-5 |
Indexed in | SCI |
Language | English |
Funding | National Key Research and Development Program of China [2021YFB0300202]; National Natural Science Foundation of China [62032023]; National Natural Science Foundation of China [T2125013]; National Natural Science Foundation of China [62102396]; Beijing Nova Program [Z211100002121143]; Youth Innovation Promotion Association of Chinese Academy of Sciences [2021099]; Innovation Funding of Institute of Computing Technology, Chinese Academy of Sciences [E461030]; Tianjin Science and Technology Plan Project [24ZXKJGX00060] |
WOS research area | Computer Science |
WOS categories | Computer Science, Hardware & Architecture; Computer Science, Software Engineering |
WOS accession number | WOS:001483026900002 |
Publisher | SPRINGER SINGAPORE PTE LTD |
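Document type | Journal article |
Item identifier | http://119.78.100.204/handle/2XEOYT63/40631 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English) |
Corresponding authors | Shao, En; Tan, Guang-Ming |
Affiliations | 1. State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China |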
Recommended citation (GB/T 7714) | Ma, Li-Xian, Wang, Le-Ping, Shao, En, et al. VastPipe: A High-Throughput Inference System via Adaptive Space-Division Multiplexing for Diverse Accelerators[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2025, 40(2): 444-463. |
APA | Ma, Li-Xian, Wang, Le-Ping, Shao, En, Cao, Rong-Yu, & Tan, Guang-Ming. (2025). VastPipe: A High-Throughput Inference System via Adaptive Space-Division Multiplexing for Diverse Accelerators. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 40(2), 444-463. |
MLA | Ma, Li-Xian, et al. "VastPipe: A High-Throughput Inference System via Adaptive Space-Division Multiplexing for Diverse Accelerators." JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 40.2 (2025): 444-463. |
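Files | No files are associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.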