Institute of Computing Technology, Chinese Academy of Sciences IR
| OTRec: Cross-Modal Learning for Multimodal Recommendation via Optimal Transport | |
| Cao, Zongsheng1,2,3; Xu, Qianqian4; Yang, Zhiyong5; He, Yuan6; Cao, Xiaochun7; Huang, Qingming4,8,9,10 | |
| 2025 | |
| Journal | IEEE TRANSACTIONS ON MULTIMEDIA |
| ISSN | 1520-9210 |
| Volume | 27, Pages: 8603-8617 |
| Abstract | In recent years, there has been growing interest in multimodal recommendation systems due to the rapid growth of multimedia and the explosion of information. Despite notable advancements, current models often fuse multimodal embeddings with ID (name or concept) embeddings for items by weighting or concatenation. In doing so, they may overlook the heterogeneity between different modalities and lack theoretical guarantees, potentially leading to suboptimal item representations. To overcome this challenge, we introduce a novel model named OTRec, which employs optimal transport (OT) to align heterogeneous multimodal embeddings with ID embeddings. Specifically, OTRec captures co-occurrence features across modalities and distinctive features within modalities, enabling the formation of a unified representation from both modal-invariant and modal-specific perspectives. This dual strategy ensures a comprehensive alignment of heterogeneous multimodal data, significantly improving the accuracy of capturing user preferences. Additionally, traditional recommendation models typically match an item's ID with its own multimodal data as positive samples for contrastive learning, neglecting the potential complementary information from other items' multimodal data. To address this issue, we introduce a semantic-enhanced contrastive learning module, which learns latent semantic correlations across items through a semantic-similarity weighting matrix. It can be integrated as a plug-in into other models to effectively explore latent semantics. On top of this, we provide theoretical guarantees that demonstrate the effectiveness of OTRec in aligning multimodal and ID information and in enhancing the mutual information between them. Extensive evaluations on three public datasets demonstrate OTRec's effectiveness and show that it achieves state-of-the-art performance. |
| Keywords | Semantics; Recommender systems; Contrastive learning; Electronic mail; Lattices; Data models; Data mining; Artificial intelligence; Accuracy; Visualization; Multimodal recommendation; optimal transport; modal-invariant; modal-specific |
| DOI | 10.1109/TMM.2025.3607735 |
| Indexed By | SCI |
| Language | English |
| WOS Research Area | Computer Science ; Telecommunications |
| WOS Subject Category | Computer Science, Information Systems ; Computer Science, Software Engineering ; Telecommunications |
| WOS Accession Number | WOS:001615548800007 |
| Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
| Document Type | Journal article |
| Item Identifier | http://119.78.100.204/handle/2XEOYT63/43060 |
| Collection | Institute of Computing Technology, Chinese Academy of Sciences |
| Corresponding Author | Xu, Qianqian; Huang, Qingming |
| Author Affiliations | 1. Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur SKLOIS, Beijing 100093, Peoples R China; 2. Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100080, Peoples R China; 3. Shanghai AI Lab, Shanghai 200232, Peoples R China; 4. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China; 5. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China; 6. Alibaba Grp, Secur Dept, Hangzhou 311121, Peoples R China; 7. Sun Yat sen Univ, Sch Cyber Sci & Technol, Shenzhen Campus, Shenzhen 518107, Peoples R China; 8. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China; 9. Univ Chinese Acad Sci, Key Lab Big Data Min & Knowledge Management BDKM, Beijing 101408, Peoples R China; 10. Peng Cheng Lab, Shenzhen 518055, Peoples R China |
| Recommended Citation (GB/T 7714) | Cao, Zongsheng,Xu, Qianqian,Yang, Zhiyong,et al. OTRec: Cross-Modal Learning for Multimodal Recommendation via Optimal Transport[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2025,27:8603-8617. |
| APA | Cao, Zongsheng,Xu, Qianqian,Yang, Zhiyong,He, Yuan,Cao, Xiaochun,&Huang, Qingming.(2025).OTRec: Cross-Modal Learning for Multimodal Recommendation via Optimal Transport.IEEE TRANSACTIONS ON MULTIMEDIA,27,8603-8617. |
| MLA | Cao, Zongsheng,et al."OTRec: Cross-Modal Learning for Multimodal Recommendation via Optimal Transport".IEEE TRANSACTIONS ON MULTIMEDIA 27(2025):8603-8617. |
| Files in This Item | There are no files associated with this item. |
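
The abstract describes aligning multimodal embeddings with ID embeddings through optimal transport. As a rough illustration of that general idea only, and not the OTRec model itself, the sketch below computes an entropy-regularized transport plan with plain Sinkhorn iterations in NumPy and uses it to project image embeddings onto the ID side. The function names, the cosine cost, the uniform marginals, the regularization strength, and the random embeddings are all assumptions made for this example; none of them come from the paper.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cost 1 - cos(x_i, y_j) (an assumed choice of ground cost)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def sinkhorn_plan(cost, reg=0.5, n_iters=200):
    """Entropy-regularized OT plan between two uniform distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass over ID embeddings
    b = np.full(m, 1.0 / m)          # uniform mass over modality embeddings
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Hypothetical toy data: 5 items, 8-dimensional embeddings per view.
rng = np.random.default_rng(0)
id_emb = rng.normal(size=(5, 8))
img_emb = rng.normal(size=(5, 8))

cost = cosine_cost(id_emb, img_emb)
plan = sinkhorn_plan(cost)

# Barycentric projection: each ID embedding receives a plan-weighted
# average of the image embeddings it is transported to.
aligned_img = (plan / plan.sum(axis=1, keepdims=True)) @ img_emb
```

In this toy setup `plan` is a 5x5 coupling whose rows and columns each carry mass 1/5, and `aligned_img` has the same shape as `id_emb`, so the two views could then be compared or fused row by row. A production system would more likely rely on a maintained OT library than on a hand-written loop.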