Leveraging Eye Movement for Instructing Robust Video-Based Facial Expression Recognition

doi:10.1109/TAFFC.2025.3599859

	Liu, Yuanyuan 1; Wei, Lin 1; Liu, Kejun 1; Chen, Zijing 2; Chen, Zhe 3; Tang, Chang 1; Chen, Jingying 4; Shan, Shiguang 5,6
	2025-10-01
Journal	IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
ISSN	1949-3045
Volume	16, Issue: 4, Pages: 3404-3420
Abstract	Video-based facial expression recognition (VFER) is challenging due to variations caused by cultural background and expression camouflage. To tackle these problems, researchers introduced eye movement signals to complement visual information. However, existing methods either require expensive devices to capture high-quality eye movements or can only extract low-quality eye movements visually, making them ineffective in the real world. To address this, we propose an eye movement-instructed VFER (EM-VFER) that leverages high-quality eye movements to instruct the visual learning, obtaining robust performance without requiring costly devices during inference. Specifically, our EM-VFER operates in two stages: the high-quality eye movement pre-training stage and the eye movement-instructed video fine-tuning stage. In the pre-training, we compile an Eye-behavior-aided Multimodal Emotion Recognition (EMER) dataset and use it to train a multimodal Transformer. During the fine-tuning, we propose a novel progressive eye movement-instructed learning to take better advantage of the prior knowledge about high-quality eye movement signals from EMER. The instructed fine-tuning model could then make more robust predictions on downstream facial expression datasets. We evaluate our approach on three macro-expression datasets (DFEW, MAFW and Aff-Wild2) and two micro-expression datasets (CASME III and CASME II). The results demonstrate that EM-VFER significantly outperforms existing methods.
Keywords	Videos; Face recognition; Visualization; Emotion recognition; Transformers; Training; Accuracy; Data mining; Gaze tracking; Computational modeling; Video-based facial expression recognition; eye movement signals; pre-training; fine-tuning; instructed learning
DOI	10.1109/TAFFC.2025.3599859
Indexed By	SCI
Language	English
WOS Research Area	Computer Science
WOS Category	Computer Science, Artificial Intelligence; Computer Science, Cybernetics
WOS Accession Number	WOS:001626710800006
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
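Document Type	Journal article
Item Identifier	http://119.78.100.204/handle/2XEOYT63/42816
Collection	Institute of Computing Technology, Chinese Academy of Sciences
Corresponding Author	Chen, Zhe; Chen, Jingying; Shan, Shiguang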
Author Affiliations	1. China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China; 2. La Trobe Univ, Cisco La Trobe Ctr Artificial Intelligence & Inter, Sch Comp Engn & Math Sci, Flora Hill, Vic 3550, Australia; 3. La Trobe Univ, Cisco La Trobe Ctr Artificial Intelligence & Inter, Australian Ctr Artificial Intelligence Med Innovat, Sch Comp Engn & Math Sci, Flora Hill, Vic 3550, Australia; 4. Cent China Normal Univ, Natl Engn Res Ctr E Learning, Natl Engn Lab Educ Big Data, Wuhan 430079, Peoples R China; 5. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China; 6. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Recommended Citation (GB/T 7714)	Liu, Yuanyuan, Wei, Lin, Liu, Kejun, et al. Leveraging Eye Movement for Instructing Robust Video-Based Facial Expression Recognition[J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2025, 16(4): 3404-3420.
APA	Liu, Yuanyuan., Wei, Lin., Liu, Kejun., Chen, Zijing., Chen, Zhe., ... & Shan, Shiguang. (2025). Leveraging Eye Movement for Instructing Robust Video-Based Facial Expression Recognition. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 16(4), 3404-3420.
MLA	Liu, Yuanyuan, et al. "Leveraging Eye Movement for Instructing Robust Video-Based Facial Expression Recognition". IEEE TRANSACTIONS ON AFFECTIVE COMPUTING 16.4 (2025): 3404-3420.