DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network

doi:10.1007/s10994-023-06453-3

	Pei, Songwen 1,2,3; Wang, Jiyao 1; Zhang, Bingxue 1; Qin, Wei 1; Xue, Hai 1; Ye, Xiaochun 2; Chen, Mingsong 3
	2024-01-31
Journal	MACHINE LEARNING
ISSN	0885-6125
Pages	14
Abstract	The ever-increasing layers and hyper-parameters of deep neural network are continuously growing to generate large-scale network by training huge masses of data. However, it is difficult to deploy deep neural network on resource-constrained edge devices. Network mixed-precision quantization is a challenging way to prune and compress deep neural network models while discovering the optimal bit width for each layer. To solve the big challenge, we therefore propose the dynamic pseudo-mean mixed-precision quantization (DPQ) by introducing two-bit scaling factors to compensate errors of quantization. Furthermore, the activation quantization named random parameters clipping (RPC) is proposed. RPC adopts partial activation quantization to reduce loss of accuracy. Therefore, DPQ can dynamically adjust the bit precision of weight quantization according to the distribution of weights. It results in a quantification scheme with strong robustness compared to previous methods. Extensive experiments demonstrate that DPQ achieves 15.43× compression rate of ResNet20 on CIFAR-10 dataset with 0.22% increase in accuracy, and 35.25× compression rate of Resnet56 on SVHN dataset with 0.12% increase in accuracy.
Keywords	Big data; Compression; Deep learning; Pseudo-mean mixed-precision quantization; Pruned neural network
DOI	10.1007/s10994-023-06453-3
Indexed by	SCI
Language	English
WOS research area	Computer Science
WOS subject category	Computer Science, Artificial Intelligence
WOS accession number	WOS:001151730300001
Publisher	SPRINGER
Citation statistics	Times cited: 4 [WOS]
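Document type	Journal article
Item identifier	http://119.78.100.204/handle/2XEOYT63/38402
Collection	Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English)
Corresponding author	Pei, Songwen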
Author affiliations	1. Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China; 3. East China Normal Univ, Software Hardware Codesign Technol & Applicat, Minist Educ, Engn Res Ctr, Shanghai 200062, Peoples R China
Recommended citation, GB/T 7714	Pei, Songwen,Wang, Jiyao,Zhang, Bingxue,et al. DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network[J]. MACHINE LEARNING,2024:14.
APA	Pei, Songwen.,Wang, Jiyao.,Zhang, Bingxue.,Qin, Wei.,Xue, Hai.,...&Chen, Mingsong.(2024).DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network.MACHINE LEARNING,14.
MLA	Pei, Songwen,et al."DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network".MACHINE LEARNING (2024):14.