Inferential and Commonsense Visual Question Generation
Bi, Chao (1, 2); Wang, Shuhui (2, 3); Li, Na (4); Huang, Qingming (1, 2)
Year: 2025
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
ISSN: 1520-9210
Volume: 27; Pages: 7796-7809
Abstract: The Visual Question Generation (VQG) task aims to produce natural-language questions about images. Existing studies often treat VQG as reverse Visual Question Answering (VQA), training data-driven generators on VQA datasets. However, this pipeline struggles to generate high-quality questions that effectively challenge robots and humans, even when it leverages the most advanced large-scale foundation models. Other VQG methods rely heavily on elaborate and costly manual preprocessing. To address these limitations, we propose a novel two-module framework for automatically generating inferential visual questions that also follow commonsense. The "Scene Graph Generation" module constructs specialized scene graphs by progressively expanding connections from high-confidence nodes, and it ensures semantic consistency by aligning visual, textual, and salient features. We additionally incorporate external knowledge to extend abstract semantic concepts and associated facts, which enriches the content of the generated questions and helps them better follow human commonsense. The "Question Generation" module uses this scene graph as the basis for searching and instantiating questions. The generated questions match the program templates and have diverse inferential paths. Experimental results demonstrate that our method is both effective and highly scalable. The generated questions are controllable in semantic richness and difficulty, and they exhibit clear inferential and commonsense properties. Furthermore, we use our method to automatically create a large-scale dataset, ICVQA, which includes approximately 160,000 images and 800,000 question-answer pairs, to facilitate further research in VQA and visual dialogue.
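The abstract describes the pipeline only at a high level. As a rough illustration of its two core ideas (growing a scene graph outward from high-confidence nodes, then searching that graph to instantiate a question template), here is a toy Python sketch. It is not the authors' implementation: the detections, confidence scores, thresholds, breadth-first expansion rule, and the single two-hop template are all hypothetical, and the sketch omits the feature-alignment and external-knowledge steps entirely.

```python
from collections import deque

# Hypothetical toy detections: node label -> detector confidence.
NODES = {"man": 0.95, "horse": 0.90, "hat": 0.60, "fence": 0.40}
# Hypothetical relation candidates: (subject, relation, object, confidence).
EDGES = [
    ("man", "riding", "horse", 0.92),
    ("man", "wearing", "hat", 0.70),
    ("horse", "near", "fence", 0.35),
]


def expand_graph(nodes, edges, seed_thresh=0.85, edge_thresh=0.5):
    """Grow a scene graph breadth-first from high-confidence seed nodes.

    An edge is kept only if it touches an already-kept node and its
    confidence is at least edge_thresh; its endpoints are then added
    to the graph and queued for further expansion.
    """
    kept_nodes = {n for n, c in nodes.items() if c >= seed_thresh}
    kept_edges = []
    seen = set()
    queue = deque(sorted(kept_nodes))
    while queue:
        node = queue.popleft()
        for subj, rel, obj, conf in edges:
            key = (subj, rel, obj)
            if node not in (subj, obj) or conf < edge_thresh or key in seen:
                continue
            seen.add(key)
            kept_edges.append(key)
            for endpoint in (subj, obj):
                if endpoint not in kept_nodes:
                    kept_nodes.add(endpoint)
                    queue.append(endpoint)
    return kept_nodes, kept_edges


def instantiate_two_hop(edges):
    """Fill one hypothetical two-hop template from edge pairs sharing a subject.

    Returns (question, answer) tuples; the answer is the second edge's object.
    """
    questions = []
    for s1, r1, o1 in edges:
        for s2, r2, o2 in edges:
            if s1 == s2 and (r1, o1) != (r2, o2):
                questions.append((f"What is the one {r1} the {o1} {r2}?", o2))
    return questions


if __name__ == "__main__":
    graph_nodes, graph_edges = expand_graph(NODES, EDGES)
    for question, answer in instantiate_two_hop(graph_edges):
        print(question, "->", answer)
```

Run as a script, the sketch drops the low-confidence "horse near fence" edge, so "fence" never enters the graph, and it prints two questions whose answers require following two relations from the same subject. In this toy setting, chaining longer paths or adding more templates would be one way to vary question difficulty.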
Keywords: Visual question generation; visual question answering; multimodal datasets; knowledge and inference
DOI: 10.1109/TMM.2025.3604975
Indexed in: SCI
Language: English
Funding: National Key R&D Program of China [2023YFC2508704]; National Natural Science Foundation of China [62236008; 62022083; U21B2038]; Fundamental Research Funds for the Central Universities; Shandong Provincial Key Research and Development Program [2024CXPT011]; Priority Academic Program Development of QILU Institute of Technology [QIT23NN038]
WOS Research Areas: Computer Science; Telecommunications
WOS Categories: Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications
WOS Accession Number: WOS:001598824700008
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document Type: Journal article
Item Identifier: http://119.78.100.204/handle/2XEOYT63/41613
Collection: Journal Papers of the Institute of Computing Technology, Chinese Academy of Sciences (English)
Corresponding Author: Wang, Shuhui
Affiliations:
1. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
3. Pengcheng Lab, Shenzhen 518000, Peoples R China
4. Qilu Inst Technol, Jinan 250200, Shandong, Peoples R China
Recommended Citation
GB/T 7714: Bi, Chao, Wang, Shuhui, Li, Na, et al. Inferential and Commonsense Visual Question Generation[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27: 7796-7809.
APA: Bi, Chao, Wang, Shuhui, Li, Na, & Huang, Qingming. (2025). Inferential and Commonsense Visual Question Generation. IEEE TRANSACTIONS ON MULTIMEDIA, 27, 7796-7809.
MLA: Bi, Chao, et al. "Inferential and Commonsense Visual Question Generation." IEEE TRANSACTIONS ON MULTIMEDIA 27 (2025): 7796-7809.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.