Institute of Computing Technology, Chinese Academy of Sciences: Institutional Repository (IR)
| Field | Value |
| --- | --- |
| Title | DCHF_T: A multi-dimensional adaptive compression approach for transformer-based models |
| Authors | Yan, Yaoyao (1); Wang, Da (2,4); Ye, Jing (2,4,5); Yu, Hui (3); Lu, Dianjie (1); Zhang, Yuang (1); Xu, Weizhi (1,4); Liu, Fang'ai (1) |
| Date Issued | 2025-12-01 |
| Journal | NEUROCOMPUTING |
| ISSN | 0925-2312 |
| Volume | 656 |
| Pages | 12 |
| Abstract | In recent years, pre-trained language models based on the Transformer architecture have achieved significant results in many natural language processing tasks. However, the high computational cost limits their application in real-world scenarios. Previous Transformer compression methods typically focus on single-dimensional compression, which may cause over-compression and consequently degrade model performance. Additionally, these methods lack targeted optimization for specific downstream tasks. In this paper, we propose DCHF_T, a multi-dimensional adaptive compression approach that compresses Transformer models through token compression, attention head pruning, and a lightweight FFN. This approach selects the most informative tokens during training, prunes unimportant tokens, and retains their information in a compressed form, allowing the model to focus more on task-relevant inputs. Furthermore, DCHF_T combines attention head pruning and a lightweight FFN to reduce computation and parameter size across multiple dimensions. We employ multi-objective evolutionary search to optimize the trade-off between accuracy and efficiency under various computational budgets. Experimental results on the GLUE benchmark demonstrate that DCHF_T achieves the best compression-performance trade-off. While maintaining the highest accuracy, DCHF_T achieves a reduction of 3.7x and 3.6x in FLOPs on BERT-base and RoBERTa-base, respectively. By implementing adaptive multi-dimensional compression, DCHF_T provides a systematic solution for deploying Transformer models in resource-constrained scenarios. (An illustrative sketch of the token-compression idea appears below this record.) |
| Keywords | Transformer; Dynamic token compression; Pruning; Multi-dimensional adaptive compression |
| DOI | 10.1016/j.neucom.2025.131071 |
| Indexed By | SCI |
| Language | English |
| Funding Project | Natural Science Foundation of Shandong Province[ZR2022MF328] ; Natural Science Foundation of Shandong Province[ZR2025MS1025] ; Natural Science Foundation of Shandong Province[ZR2024MF073] ; Natural Science Foundation of Shandong Province[ZR2019LZH014] ; National Natural Science Foundation of China[92473203] ; National Natural Science Foundation of China[61602284] ; National Natural Science Foundation of China[61602285] ; State Key Lab of Processors Open Fund Project[CLQ202409] ; State Key Lab of Processors Open Fund Project[CLQ202402] ; CCF-Ricore Education Fund[CCF-Ricore OF 2024003] |
| WOS Research Area | Computer Science |
| WOS Subject | Computer Science, Artificial Intelligence |
| WOS Record ID | WOS:001584006500005 |
| Publisher | ELSEVIER |
| Document Type | Journal article |
| Identifier | http://119.78.100.204/handle/2XEOYT63/41649 |
| Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Papers (English) |
| Corresponding Author | Xu, Weizhi |
| Affiliations | 1. Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China; 3. Shandong Normal Univ, Business Sch, Jinan, Peoples R China; 4. State Key Lab Processors, Beijing, Peoples R China; 5. CASTEST Co Ltd, Beijing, Peoples R China |
| Recommended Citation (GB/T 7714) | Yan, Yaoyao, Wang, Da, Ye, Jing, et al. DCHF_T: A multi-dimensional adaptive compression approach for transformer-based models[J]. NEUROCOMPUTING, 2025, 656: 12. |
| APA | Yan, Yaoyao., Wang, Da., Ye, Jing., Yu, Hui., Lu, Dianjie., ... & Liu, Fang'ai. (2025). DCHF_T: A multi-dimensional adaptive compression approach for transformer-based models. NEUROCOMPUTING, 656, 12. |
| MLA | Yan, Yaoyao, et al. "DCHF_T: A multi-dimensional adaptive compression approach for transformer-based models". NEUROCOMPUTING 656 (2025): 12. |
| Files in This Item | There are no files associated with this item. |
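The abstract describes DCHF_T's token-compression step as selecting the most informative tokens, pruning the rest, and retaining the pruned tokens' information in a compressed form. This record does not include the paper's actual algorithm, so the PyTorch sketch below is only a generic illustration of that idea: it scores tokens by the average attention they receive (an assumed proxy, not the paper's scoring rule), keeps the top fraction, and fuses the pruned tokens into a single summary token. The function name `compress_tokens` and all parameters are hypothetical.

```python
# Minimal illustrative sketch of attention-based token compression.
# Assumptions: importance = mean attention received per token; pruned
# tokens are fused into one score-weighted summary token. Neither detail
# is specified by this record.
import torch


def compress_tokens(hidden: torch.Tensor, attn: torch.Tensor,
                    keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the most-attended tokens and fuse the rest into one summary token.

    hidden: (batch, seq_len, dim) token representations from a Transformer layer
    attn:   (batch, heads, seq_len, seq_len) attention weights of that layer
    """
    batch, seq_len, dim = hidden.shape

    # Importance proxy: average attention each token receives,
    # taken over heads and query positions.
    scores = attn.mean(dim=1).mean(dim=1)                  # (batch, seq_len)

    n_keep = max(1, int(seq_len * keep_ratio))
    keep_idx = scores.topk(n_keep, dim=-1).indices         # (batch, n_keep)
    keep_mask = torch.zeros(batch, seq_len, dtype=torch.bool,
                            device=hidden.device)
    keep_mask.scatter_(1, keep_idx, True)

    # Kept tokens stay in their original order within each sequence.
    kept = hidden[keep_mask].view(batch, n_keep, dim)

    # Score-weighted average of the pruned tokens: their information is
    # retained in compressed form as a single extra token.
    pruned_w = (scores * (~keep_mask)).unsqueeze(-1)       # (batch, seq_len, 1)
    fused = (hidden * pruned_w).sum(dim=1, keepdim=True)
    fused = fused / pruned_w.sum(dim=1, keepdim=True).clamp_min(1e-6)

    return torch.cat([kept, fused], dim=1)                 # (batch, n_keep + 1, dim)
```

For example, with `hidden` of shape (8, 128, 768) and `keep_ratio=0.5`, the output keeps 65 tokens per sequence (64 selected plus one fused), which is the kind of sequence-length reduction that would cut FLOPs in the subsequent attention and FFN layers.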