OTRec: Cross-Modal Learning for Multimodal Recommendation via Optimal Transport

doi:10.1109/TMM.2025.3607735


	Cao, Zongsheng 1,2,3; Xu, Qianqian 4; Yang, Zhiyong 5; He, Yuan 6; Cao, Xiaochun 7; Huang, Qingming 4,8,9,10
	2025
Journal	IEEE TRANSACTIONS ON MULTIMEDIA
ISSN	1520-9210
Volume	27	Pages	8603-8617
Abstract	In recent years, there has been a growing interest in multimodal recommendation systems due to the rapid growth of multimedia and the explosion of information. Despite notable advancements, current models often fuse multimodal embeddings with ID (name or concept) embeddings for items in a weighted or concatenated manner. As a result, they may overlook the heterogeneity between different modalities and lack theoretical guarantees, potentially leading to suboptimal item representations. To overcome this challenge, we introduce a novel model named OTRec, which employs optimal transport (OT) to align heterogeneous multimodal embeddings with ID embeddings. Specifically, OTRec captures co-occurrence features across modalities and distinctive features within modalities, forming a unified representation from both modal-invariant and modal-specific perspectives. This dual strategy ensures a comprehensive alignment of heterogeneous multimodal data, significantly improving how accurately user preferences are captured. Additionally, traditional recommendation models typically match an item's ID with its multimodal data as positive samples for contrastive learning, neglecting the potential complementary information in other items' multimodal data. To address this issue, we introduce a semantic-enhanced contrastive learning module, which learns latent semantic correlations across items through a semantic-similarity weighting matrix. This module can be integrated as a plug-in into other models to effectively explore latent semantics. On top of this, we provide theoretical guarantees that demonstrate the effectiveness of OTRec in aligning multimodal and ID information and in enhancing the mutual information between them. Extensive evaluations on three public datasets demonstrate OTRec's effectiveness, with OTRec achieving state-of-the-art performance.
Keywords	Semantics; Recommender systems; Contrastive learning; Electronic mail; Lattices; Data models; Data mining; Artificial intelligence; Accuracy; Visualization; Multimodal recommendation; optimal transport; modal-invariant; modal-specific
Indexed by	SCI
Language	English
WOS Research Areas	Computer Science; Telecommunications
WOS Categories	Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications
WOS Accession Number	WOS:001615548800007
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document type	Journal article
Item identifier	http://119.78.100.204/handle/2XEOYT63/43060
Collection	Institute of Computing Technology, Chinese Academy of Sciences
Corresponding authors	Xu, Qianqian; Huang, Qingming
Author affiliations	1. Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur SKLOIS, Beijing 100093, Peoples R China
	2. Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100080, Peoples R China
	3. Shanghai AI Lab, Shanghai 200232, Peoples R China
	4. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
	5. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
	6. Alibaba Grp, Secur Dept, Hangzhou 311121, Peoples R China
	7. Sun Yat-sen Univ, Sch Cyber Sci & Technol, Shenzhen Campus, Shenzhen 518107, Peoples R China
	8. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China
	9. Univ Chinese Acad Sci, Key Lab Big Data Min & Knowledge Management BDKM, Beijing 101408, Peoples R China
	10. Peng Cheng Lab, Shenzhen 518055, Peoples R China
Recommended citation (GB/T 7714)	Cao, Zongsheng, Xu, Qianqian, Yang, Zhiyong, et al. OTRec: Cross-Modal Learning for Multimodal Recommendation via Optimal Transport[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27: 8603-8617.
APA	Cao, Z., Xu, Q., Yang, Z., He, Y., Cao, X., & Huang, Q. (2025). OTRec: Cross-modal learning for multimodal recommendation via optimal transport. IEEE Transactions on Multimedia, 27, 8603-8617. https://doi.org/10.1109/TMM.2025.3607735
MLA	Cao, Zongsheng, et al. "OTRec: Cross-Modal Learning for Multimodal Recommendation via Optimal Transport." IEEE Transactions on Multimedia, vol. 27, 2025, pp. 8603-8617.
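The abstract above describes two mechanisms only in prose: aligning multimodal embeddings with ID embeddings via optimal transport, and a contrastive loss that draws on other items' multimodal data through a semantic-similarity weighting matrix. The dependency-free Python below is a minimal sketch of generic versions of both, under stated assumptions; it is not the authors' implementation, and this record gives no details of their formulation. Assumptions: OT is solved with entropic regularization via standard Sinkhorn iterations (a swapped-in textbook solver), with uniform marginals, a cosine cost and `eps = 0.1`; the modal-invariant/modal-specific decomposition is not modeled; and the weighting matrix is taken, hypothetically, to be a row-wise softmax over cosine similarities between items' multimodal embeddings. All function names (`sinkhorn`, `ot_alignment_loss`, `semantic_weighted_infonce`) are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; the 1e-12 term guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def sinkhorn(cost: Matrix, eps: float = 0.1, n_iters: int = 200) -> Matrix:
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m  # uniform marginals (assumption)
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_alignment_loss(modal: Matrix, ids: Matrix,
                      eps: float = 0.1) -> Tuple[float, Matrix]:
    """Transport cost <P, C> between multimodal and ID embeddings (cosine cost)."""
    cost = [[1.0 - cosine(x, y) for y in ids] for x in modal]
    plan = sinkhorn(cost, eps)
    loss = sum(plan[i][j] * cost[i][j]
               for i in range(len(modal)) for j in range(len(ids)))
    return loss, plan


def semantic_weighted_infonce(ids: Matrix, modal: Matrix,
                              tau: float = 0.2) -> float:
    """Soft-positive InfoNCE: item i's ID embedding is pulled toward every item's
    multimodal embedding, weighted by row i of a semantic-similarity matrix W."""
    n = len(ids)
    total = 0.0
    for i in range(n):
        # Hypothetical W: row-softmax of multimodal-to-multimodal cosine
        # similarities, so item i's own modality gets the largest (or tied) weight.
        sims = [cosine(modal[i], modal[k]) / tau for k in range(n)]
        s_max = max(sims)
        w = [math.exp(s - s_max) for s in sims]
        w_sum = sum(w)
        w = [x / w_sum for x in w]
        # Numerically stable log-softmax over ID-to-modality similarities.
        logits = [cosine(ids[i], modal[k]) / tau for k in range(n)]
        l_max = max(logits)
        log_z = l_max + math.log(sum(math.exp(l - l_max) for l in logits))
        # Soft-label cross-entropy with weights w.
        total -= sum(w[k] * (logits[k] - log_z) for k in range(n))
    return total / n
```

For example, with two orthonormal items, identical ID and multimodal embeddings give a transport cost near 0, while negated ID embeddings give a cost near 1. Because an OT plan may match any multimodal embedding to any ID embedding, this sketch's OT term measures distribution-level alignment, while the weighted contrastive term supplies the per-item pairing.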