Institute of Computing Technology, Chinese Academy of Sciences IR
Mortar-FP8: Morphing the Existing FP32 Infrastructure for High-Performance Deep Learning Acceleration | |
Li, Hongyan [1]; Lu, Hang [2,3]; Li, Xiaowei [1] | |
2024-03-01 | |
Journal | IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS |
ISSN | 0278-0070 |
Volume | 43; Issue: 3; Pages: 878-891 |
Abstract | After training, vanilla deep neural networks (DNNs) are represented with native 32-bit floating-point (fp32) weights. We observe that the mantissas of these weights are rich in bit-level sparsity and that their exponent values are tightly clustered; both properties can be exploited directly to speed up model inference. In this article, we propose Mortar and Mortar-FP8, offline and online software-hardware collaborative approaches for fp32 DNN acceleration. The proposed methods include software algorithms that morph the mantissa and convert fp32 weights to fp8 format, together with an associated hardware accelerator architecture that accelerates general-purpose deep learning through the optimized algorithms and specialized hardware. We evaluate a variety of deep learning tasks, including image classification, object detection, video understanding, and video and image super-resolution, and highlight the following results: 1) Mortar increases mantissa sparsity by 1.58x to 2.09x with a negligible accuracy loss of only ~0.2%; 2) Mortar-FP8 morphs the fp32 weights into fp8 format with a minimal accuracy loss of ~0.3%; and 3) the corresponding hardware accelerators significantly outperform the baselines, achieving up to 6.032x and 6.99x performance improvements. The area and power of Mortar are 0.031 mm² and 68.58 mW; the corresponding figures for Mortar-FP8 are 0.0505 mm² and 25.16 mW. |
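The two observations the abstract builds on (sparse mantissa bits, clustered exponents) can be measured directly on a weight tensor. The following is a minimal sketch, not the authors' Mortar or Mortar-FP8 algorithm: it only inspects the IEEE-754 binary32 fields of each weight. The function names are hypothetical, and the `width=16` default in `best_exponent_window` is an assumption standing in for the range of a 4-bit fp8 exponent field (the record does not say which fp8 variant is used).

```python
import struct
from collections import Counter


def fp32_fields(x):
    """Split a float (rounded to IEEE-754 binary32) into (sign, biased exponent, mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF      # 23 stored mantissa bits
    return sign, exponent, mantissa


def mantissa_zero_bit_ratio(weights):
    """Fraction of all stored mantissa bits that are zero (bit-level sparsity)."""
    total_bits = 23 * len(weights)
    one_bits = sum(bin(fp32_fields(w)[2]).count("1") for w in weights)
    return (total_bits - one_bits) / total_bits


def exponent_histogram(weights):
    """Count how often each biased exponent value occurs."""
    return Counter(fp32_fields(w)[1] for w in weights)


def best_exponent_window(weights, width=16):
    """Largest fraction of nonzero weights whose exponents fit in `width` consecutive values.

    Assumption: `width` models a narrow (e.g. 4-bit) fp8 exponent range; this is an
    illustration of exponent clustering, not the paper's conversion scheme.
    """
    hist = exponent_histogram([w for w in weights if w != 0.0])
    n = sum(hist.values())
    if n == 0:
        return 0.0
    # An optimal window can always be shifted to start at an observed exponent.
    best = max(sum(hist.get(e, 0) for e in range(lo, lo + width)) for lo in hist)
    return best / n


if __name__ == "__main__":
    w = [0.5, 0.25, -0.75, 0.1]
    print(mantissa_zero_bit_ratio(w))  # 79 of 92 mantissa bits are zero
    print(exponent_histogram(w))
    print(best_exponent_window(w, width=2))
```

For the toy weights above, three of the four values have at most one mantissa bit set, and all exponents lie within four consecutive values, which illustrates (on a trivially small example) why a narrow exponent field can still cover most weights.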
Keywords | Deep learning accelerator; deep neural network (DNN); fp8 format |
DOI | 10.1109/TCAD.2023.3329778 |
Indexed in | SCI |
Language | English |
Funding | National Natural Science Foundation of China |
WOS Research Areas | Computer Science ; Engineering |
WOS Categories | Computer Science, Hardware & Architecture ; Computer Science, Interdisciplinary Applications ; Engineering, Electrical & Electronic |
WOS Accession Number | WOS:001170495100001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Document Type | Journal article |
Item Identifier | http://119.78.100.204/handle/2XEOYT63/38823 |
Collection | Journal Articles of the Institute of Computing Technology, CAS (English) |
Corresponding Author | Lu, Hang |
Author Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China; 2. Chinese Acad Sci, Zhongguancun Lab, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China; 3. Chinese Acad Sci, Shanghai Innovat Ctr Processor Technol, Beijing 100190, Peoples R China |
Recommended Citation (GB/T 7714) | Li, Hongyan, Lu, Hang, Li, Xiaowei. Mortar-FP8: Morphing the Existing FP32 Infrastructure for High-Performance Deep Learning Acceleration[J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2024, 43(3): 878-891. |
APA | Li, Hongyan, Lu, Hang, & Li, Xiaowei. (2024). Mortar-FP8: Morphing the Existing FP32 Infrastructure for High-Performance Deep Learning Acceleration. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 43(3), 878-891. |
MLA | Li, Hongyan, et al. "Mortar-FP8: Morphing the Existing FP32 Infrastructure for High-Performance Deep Learning Acceleration". IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 43.3 (2024): 878-891. |
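Files in This Item | No files are associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.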