CSpace
Boosting Dataset Distillation With the Assistance of Crucial Samples for Visual Learning
Li, Xiaodan [1,2]; Zhu, Yao [3]; Chen, Yuefeng [2]; Chen, Cen [1]; Guo, Jianmei [1]; Wang, Shuhui [4]
2025
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
ISSN: 1520-9210
Volume: 27; Pages: 9873-9886
Abstract: In recent years, massive datasets have significantly driven the advancement of visual learning, such as multi-modal large models, at the expense of high computational costs and extensive storage requirements. Dataset distillation (DD) aims to address this challenge by learning a small synthetic dataset such that a model trained on it achieves test performance comparable to that of a model trained on the original dataset. This task can be formulated as a bi-level learning problem in which the outer loop optimizes the learned dataset and the inner loop updates the model parameters based on the distilled data. Unlike previous studies, which focus primarily on optimizing the inner loop of this bi-level problem, we approach dataset distillation from the perspective of sample cruciality. We find that, in the outer loop, discarding easy samples and keeping the hard ones that the learned synthetic samples struggle to represent can be beneficial for DD. Motivated by this observation, we further develop an Infinite Semantic Augmentation (ISA) based dataset distillation algorithm, which discards some easier samples and implicitly enriches harder ones in the semantic space through continuous interpolation between two target feature vectors. Through detailed mathematical derivation, the joint contribution of all interpolated feature points to the training loss is expressed as an integral with an analytical closed-form solution, which can be optimized with almost no extra computational cost. Experimental results on several benchmark datasets demonstrate the effectiveness of our approach in reducing the dataset size while preserving the accuracy of the model. Furthermore, we show that high-quality distilled data can also benefit downstream applications, such as continual learning and membership inference defense.
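The closed-form idea mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the paper's actual loss or interpolation scheme, so the sketch assumes a squared-error loss and linear interpolation between two target vectors over alpha in [0, eps]; the function names and the `eps` parameter are hypothetical and are not taken from the paper. Under those assumptions, the loss summed over infinitely many interpolated targets reduces to a polynomial in eps, which the code compares against a numerical integral.

```python
import numpy as np

def interpolated_sq_loss_closed_form(pred, target_a, target_b, eps=1.0):
    """Closed form of the integral over alpha in [0, eps] of
    ||pred - (target_a + alpha * (target_b - target_a))||^2.
    Assumed loss and interpolation, for illustration only."""
    r = pred - target_a        # residual at alpha = 0
    d = target_b - target_a    # interpolation direction
    # ||r - alpha*d||^2 = ||r||^2 - 2*alpha*(r.d) + alpha^2*||d||^2,
    # integrated term by term from 0 to eps.
    return eps * (r @ r) - eps**2 * (r @ d) + (eps**3 / 3.0) * (d @ d)

def interpolated_sq_loss_numeric(pred, target_a, target_b, eps=1.0, steps=10001):
    """Same integral, approximated with the trapezoidal rule."""
    alphas = np.linspace(0.0, eps, steps)
    targets = target_a + alphas[:, None] * (target_b - target_a)
    losses = ((pred - targets) ** 2).sum(axis=1)
    h = alphas[1] - alphas[0]
    return h * (losses.sum() - 0.5 * (losses[0] + losses[-1]))

rng = np.random.default_rng(0)
pred, target_a, target_b = rng.normal(size=(3, 8))
closed = interpolated_sq_loss_closed_form(pred, target_a, target_b, eps=0.5)
numeric = interpolated_sq_loss_numeric(pred, target_a, target_b, eps=0.5)
print(closed, numeric)  # the two values should agree closely
```

The closed form costs a few dot products per sample pair, which is consistent with the abstract's claim of almost no extra computational cost, although the paper's own derivation may differ in form.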
Keywords: Training; Semantics; Computational modeling; Visualization; Synthetic data; Manifolds; Computational efficiency; Training data; Data models; Continuing education; Dataset distillation (DD); discarding; semantic; infinite; interpolated
DOI: 10.1109/TMM.2025.3618578
Indexed by: SCI
Language: English
WOS Research Area: Computer Science ; Telecommunications
WOS Subject Category: Computer Science, Information Systems ; Computer Science, Software Engineering ; Telecommunications
WOS Accession Number: WOS:001641495700021
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
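Document Type: Journal article
Item Identifier: http://119.78.100.204/handle/2XEOYT63/42950
Collection: Institute of Computing Technology, Chinese Academy of Sciences
Corresponding Author: Chen, Cen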
Affiliations:
1. East China Normal Univ, Sch Data Sci & Engn, Shanghai 200050, Peoples R China
2. Alibaba Grp, Hangzhou 311121, Peoples R China
3. Qiyuan Lab, Beijing 100850, Peoples R China
4. Chinese Acad Sci, Inst Comp Technol, Beijing 100045, Peoples R China
Recommended Citation
GB/T 7714
Li, Xiaodan,Zhu, Yao,Chen, Yuefeng,et al. Boosting Dataset Distillation With the Assistance of Crucial Samples for Visual Learning[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2025,27:9873-9886.
APA Li, Xiaodan,Zhu, Yao,Chen, Yuefeng,Chen, Cen,Guo, Jianmei,&Wang, Shuhui.(2025).Boosting Dataset Distillation With the Assistance of Crucial Samples for Visual Learning.IEEE TRANSACTIONS ON MULTIMEDIA,27,9873-9886.
MLA Li, Xiaodan,et al."Boosting Dataset Distillation With the Assistance of Crucial Samples for Visual Learning".IEEE TRANSACTIONS ON MULTIMEDIA 27(2025):9873-9886.
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.