Institute of Computing Technology, Chinese Academy of Sciences IR
Lossless data compression by large models
Li, Ziguang1,2,3; Huang, Chao4,5; Wang, Xuliang1,4,6; Hu, Haibo4,5; Wyeth, Cole6; Bu, Dongbo1,4; Yu, Quan2; Gao, Wen2; Liu, Xingwu1,7; Li, Ming1,2,3,6
Publication date | 2025-05-01
Journal | NATURE MACHINE INTELLIGENCE
Volume | 7
Issue | 5
Pages | 794-799
Abstract | Data compression is a fundamental technology that enables efficient storage and transmission of information. However, traditional compression methods are approaching their theoretical limits after 80 years of research and development. At the same time, large artificial intelligence models have emerged, which, trained on vast amounts of data, are able to 'understand' various semantics. Intuitively, semantics conveys the meaning of data concisely, so large models hold the potential to revolutionize compression technology. Here we present LMCompress, a new method that leverages large models to compress data. LMCompress shatters all previous lossless compression records on four media types: text, images, video and audio. It halves the compression rates of JPEG-XL for images, FLAC for audio and H.264 for video, and it achieves nearly one-third of the compression rates of zpaq for text. Our results demonstrate that the better a model understands the data, the more effectively it can compress it, suggesting a deep connection between understanding and compression.
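A note on the mechanism (not spelled out in this record): the standard recipe behind model-based lossless compression is to let a predictor's next-symbol probabilities drive an entropy coder such as an arithmetic coder, so a sequence costs roughly the sum of -log2 p(symbol | context) bits, and a model that predicts (i.e. 'understands') the data better spends fewer bits. The sketch below only illustrates that bound; it is not LMCompress itself. It swaps the large model for a toy adaptive byte predictor, and the function name and sample text are invented for the example.

```python
# Illustrative sketch: ideal code length of a byte string under a predictor.
# An entropy coder driven by the same predictor would come within a few bits
# of this total; a stronger predictor (e.g. a large language model) lowers it.
import math
from collections import Counter

def ideal_code_length_bits(data: bytes) -> float:
    """Ideal code length of `data`, in bits, under an adaptive order-0
    byte model with add-one (Laplace) smoothing."""
    counts = Counter()   # byte value -> number of times seen so far
    seen = 0             # total bytes seen so far
    total_bits = 0.0
    for b in data:
        # Predictive probability of this byte *before* observing it.
        p = (counts[b] + 1) / (seen + 256)   # 256 possible byte values
        total_bits += -math.log2(p)
        counts[b] += 1
        seen += 1
    return total_bits

if __name__ == "__main__":
    text = ("the better a model understands the data, "
            "the more effectively it can compress it. " * 20).encode()
    bits = ideal_code_length_bits(text)
    print(f"raw size      : {len(text) * 8} bits")
    print(f"ideal coded   : {bits:.0f} bits")
    print(f"bits per byte : {bits / len(text):.2f}")
```

Replacing the order-0 counter with a model that assigns higher probability to the true continuation of each context directly reduces the bit count, which is the sense in which better understanding yields better compression.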
DOI | 10.1038/s42256-025-01033-7
Indexed by | SCI
Language | English
Funding projects | National Key R&D Program of China [2022YFA1304603]; National Natural Science Foundation of China [62072433, 62088102, 62025101]; Natural Sciences and Engineering Research Council of Canada (NSERC) [OGP0046506]; NSERC Canadian Network for Research and Innovation in Machining Technology; Canada Research Chair Program; Proteomic Navigator of the Human Body Project; Kechuang Yongjiang 2035 key technology breakthrough plan of Zhejiang Ningbo [2024Z119]
WoS research area | Computer Science
WoS categories | Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications
WoS accession number | WOS:001479636100001
Publisher | NATURE PORTFOLIO
Document type | Journal article
Identifier | http://119.78.100.204/handle/2XEOYT63/40629
Collection | Journal Papers of the Institute of Computing Technology, CAS (English)
Corresponding authors | Liu, Xingwu; Li, Ming
Affiliations | 1. Cent China Inst Artificial Intelligence, Zhengzhou, Peoples R China; 2. Peng Cheng Lab, Shenzhen, Peoples R China; 3. Shanghai Inst Math & Interdisciplinary Sci, Shanghai, Peoples R China; 4. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China; 5. Ningbo Inst Artificial Intelligence Ind, Ningbo, Peoples R China; 6. Univ Waterloo, Sch Comp Sci, Waterloo, ON, Canada; 7. Dalian Univ Technol, Sch Math Sci, Dalian, Peoples R China
Recommended citation (GB/T 7714) | Li, Ziguang, Huang, Chao, Wang, Xuliang, et al. Lossless data compression by large models[J]. NATURE MACHINE INTELLIGENCE, 2025, 7(5): 794-799.
APA | Li, Ziguang, Huang, Chao, Wang, Xuliang, Hu, Haibo, Wyeth, Cole, ... & Li, Ming. (2025). Lossless data compression by large models. NATURE MACHINE INTELLIGENCE, 7(5), 794-799.
MLA | Li, Ziguang, et al. "Lossless data compression by large models." NATURE MACHINE INTELLIGENCE 7.5 (2025): 794-799.
Files in this item | No files are associated with this item.