Improving Utilization of Dataflow Unit for Multi-Batch Processing

doi:10.1145/3637906

Authors	Fan, Zhihua 3; Li, Wenming; Wang, Zhen; Yang, Yu; Ye, Xiaochun; Fan, Dongrui; Sun, Ninghui; An, Xuejun
Publication date	2024-03-01
Journal	ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
ISSN	1544-3566
Volume	21 Issue:1 Pages:26
Abstract	Dataflow architectures can achieve much better performance and higher efficiency than general-purpose cores, approaching the performance of a specialized design while retaining programmability. However, advanced application scenarios place higher demands on the hardware in terms of cross-domain and multi-batch processing. In this article, we propose a unified scale-vector architecture that can work in multiple modes and adapt efficiently to diverse algorithms and requirements. First, we propose a novel reconfigurable interconnection structure that can organize execution units into different cluster topologies to accommodate different degrees of data-level parallelism. Second, we decouple the threads within each DFG node into consecutive pipeline stages and provide architectural support for them. By time-multiplexing these stages, the dataflow hardware can achieve much higher utilization and performance. In addition, the task-based programming model can exploit multi-level parallelism and deploy applications efficiently. Evaluated on a wide range of benchmarks, including digital signal processing algorithms, CNNs, and scientific computing algorithms, our design attains up to 11.95x energy efficiency (performance-per-watt) improvement over a GPU (V100) and 2.01x energy efficiency improvement over state-of-the-art dataflow architectures.
Keywords	Utilization; network-on-chip; decoupled architecture; batch processing
DOI	10.1145/3637906
Indexed by	SCI
Language	English
WOS research area	Computer Science
WOS category	Computer Science, Hardware & Architecture; Computer Science, Theory & Methods
WOS accession number	WOS:001193465400017
Publisher	ASSOC COMPUTING MACHINERY
Citation statistics	Times cited: 2 [WOS]
Document type	Journal article
Item identifier	http://119.78.100.204/handle/2XEOYT63/38773
Collection	Institute of Computing Technology, Chinese Academy of Sciences: Journal Papers (English)
Corresponding author	Fan, Zhihua
Author affiliations	1. Chinese Acad Sci, State Key Lab Processors Inst Comp Technol, Beijing, Peoples R China; 2. Univ Chinese Acad Sci, Beijing, Peoples R China; 3. Chinese Acad Sci, Inst Comp Technol, 6 South Sci Acad Rd, Beijing, Peoples R China; 4. Univ Chinese Acad Sci, 19A Yuquan Rd, Beijing, Peoples R China
Recommended citation GB/T 7714	Fan, Zhihua,Li, Wenming,Wang, Zhen,et al. Improving Utilization of Dataflow Unit for Multi-Batch Processing[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,2024,21(1):26.
APA	Fan, Zhihua.,Li, Wenming.,Wang, Zhen.,Yang, Yu.,Ye, Xiaochun.,...&An, Xuejun.(2024).Improving Utilization of Dataflow Unit for Multi-Batch Processing.ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,21(1),26.
MLA	Fan, Zhihua,et al."Improving Utilization of Dataflow Unit for Multi-Batch Processing".ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 21.1(2024):26.