PDD: Pruning Neural Networks During Knowledge Distillation
Dan, Xi2; Yang, Wenjie1,3; Zhang, Fuyan1,3; Zhou, Yihang4; Yu, Zhuojun5; Qiu, Zhen6; Zhao, Boyuan6; Dong, Zeyu1; Huang, Libo1; Yang, Chuanguang1
2024-08-31
Journal: COGNITIVE COMPUTATION
ISSN: 1866-9956
Pages: 11
Abstract: Although deep neural networks have achieved strong performance, their large computational requirements limit deployment on end devices. To this end, a variety of model compression and acceleration techniques have been developed. Among these, knowledge distillation has emerged as a popular approach in which a small student model is trained to mimic the performance of a larger teacher model. However, existing knowledge distillation implicitly assumes that the student architecture is already compact; in practice, hand-designed student architectures are often suboptimal and contain redundancy, which calls this assumption into question. This study investigates this assumption and empirically demonstrates that student models can contain redundancy that is removable through pruning without significant performance degradation. We therefore propose a novel pruning method to eliminate redundancy in student models. Instead of applying traditional post-training pruning, we perform pruning during knowledge distillation (PDD), so that important information transferred from the teacher model to the student model is not lost. This is achieved by designing a differentiable mask for each convolutional layer, which dynamically adjusts the channels to be pruned based on the loss. Experimental results show that with ResNet20 as the student model and ResNet56 as the teacher model, PDD achieves a 39.53% FLOPs reduction by removing 32.77% of parameters while increasing top-1 accuracy on CIFAR10 by 0.17%. With VGG11 as the student model and VGG16 as the teacher model, it achieves a 74.96% FLOPs reduction by removing 76.43% of parameters, with only a 1.34% loss in top-1 accuracy on CIFAR10. Our code is available at https://github.com/YihangZhou0424/PDD-Pruning-during-distillation.
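The abstract describes a differentiable per-channel mask attached to each convolutional layer and optimized jointly with the distillation loss. Below is a minimal PyTorch sketch of that idea; it is not the authors' released implementation (see the linked repository). The sigmoid gate parameterization, the L1 sparsity penalty, the 0.5 pruning threshold, and the names MaskedConv2d, distillation_loss, and total_loss are illustrative assumptions.

```python
# Minimal sketch of channel pruning during knowledge distillation.
# Assumptions (not from the paper): sigmoid-parameterized gates, an L1 penalty
# on the gates, and a fixed 0.5 threshold for deciding which channels to keep.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedConv2d(nn.Module):
    """Conv layer whose output channels are gated by a learnable, differentiable mask."""

    def __init__(self, in_ch, out_ch, kernel_size, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **kwargs)
        # One learnable logit per output channel; sigmoid keeps the gate in (0, 1).
        self.mask_logits = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        gate = torch.sigmoid(self.mask_logits).view(1, -1, 1, 1)
        return self.conv(x) * gate

    def channels_to_keep(self, threshold=0.5):
        # After training, channels whose gate stays below the threshold are pruned.
        return torch.sigmoid(self.mask_logits) > threshold


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD loss: KL divergence on softened logits plus cross-entropy on labels."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce


def total_loss(student, student_logits, teacher_logits, labels, sparsity_weight=1e-4):
    """Distillation loss plus an L1 penalty that drives mask gates toward zero."""
    mask_penalty = sum(
        torch.sigmoid(m.mask_logits).sum()
        for m in student.modules()
        if isinstance(m, MaskedConv2d)
    )
    return distillation_loss(student_logits, teacher_logits, labels) + sparsity_weight * mask_penalty
```

In use, the teacher would run in eval mode under torch.no_grad() to produce teacher_logits, and after distillation the channels whose gates fall below the threshold would be physically removed from the student, which is how FLOPs and parameter reductions of the kind reported above are realized.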
Keywords: Knowledge distillation; Model pruning; Model compression
DOI: 10.1007/s12559-024-10350-9
Indexed by: SCI
Language: English
Funding: Beijing Natural Science Foundation [4244098]
WOS Research Areas: Computer Science; Neurosciences & Neurology
WOS Categories: Computer Science, Artificial Intelligence; Neurosciences
WOS Accession Number: WOS:001302314600001
Publisher: SPRINGER
Document Type: Journal article
Identifier: http://119.78.100.204/handle/2XEOYT63/39632
Collection: Journal Papers of the Institute of Computing Technology, Chinese Academy of Sciences (English)
Corresponding Author: Yang, Chuanguang
Affiliations:
1. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
2. Cent Univ Finance & Econ, Beijing, Peoples R China
3. Univ Glasgow, Glasgow City, Scotland
4. Univ Queensland, Brisbane, Australia
5. Univ Bristol, Bristol, England
6. China Univ Min & Technol, Xuzhou, Peoples R China
Recommended Citation:
GB/T 7714
Dan, Xi, Yang, Wenjie, Zhang, Fuyan, et al. PDD: Pruning Neural Networks During Knowledge Distillation[J]. COGNITIVE COMPUTATION, 2024: 11.
APA Dan, Xi., Yang, Wenjie., Zhang, Fuyan., Zhou, Yihang., Yu, Zhuojun., ... & Yang, Chuanguang. (2024). PDD: Pruning Neural Networks During Knowledge Distillation. COGNITIVE COMPUTATION, 11.
MLA Dan, Xi, et al. "PDD: Pruning Neural Networks During Knowledge Distillation". COGNITIVE COMPUTATION (2024): 11.
Files in This Item:
No files associated with this item.