Institute of Computing Technology, Chinese Academy IR
| Inferential and Commonsense Visual Question Generation | |
| Bi, Chao1,2; Wang, Shuhui2,3; Li, Na4; Huang, Qingming1,2 | |
| 2025 | |
| Journal | IEEE TRANSACTIONS ON MULTIMEDIA |
| ISSN | 1520-9210 |
| Volume | 27 ; Pages: 7796-7809 |
| Abstract | The Visual Question Generation (VQG) task generally aims to produce questions based on images in natural language. Existing studies often handle VQG as a reverse Visual Question Answering (VQA), training data-driven generators on VQA datasets. However, this solution pipeline struggles to generate high-quality questions that effectively challenge robots and humans, even by leveraging the most advanced large-scale foundational models. There are also some other VQG methods depending on elaborate and costly manual preprocessing heavily. To address these limitations, we propose a novel method with a two-module framework for automatically generating inferential visual questions that also follow commonsense. The "Scene Graph Generation" module constructs specialized scene graphs by progressively expanding connections from high-confidence nodes. This module ensures semantic consistency by aligning visual, textual, and salient features. Additionally, we incorporate external knowledge to extend abstract semantic concepts and associated facts, enriching the content of generated questions and facilitating the generated question to better follow the commonsense of human. Another module "Question Generation" utilizes the above scene graph as a foundation to search and instantiate for the question. The generated questions will match with the program templates and have diverse inferential paths. Experimental results demonstrate that our method is both effective and highly scalable. The generated questions are controllable in terms of semantic richness and difficulty, exhibiting clear inferential and commonsense properties. Furthermore, we automatically utilize our method to create a large-scale dataset, ICVQA, which includes approximately 160,000 images and 800,000 question-answer pairs, thereby facilitating further research in VQA and visual dialogue. |
| Keywords | Visual question generation ; visual question answering ; multimodal datasets ; knowledge and inference |
| DOI | 10.1109/TMM.2025.3604975 |
| Indexed By | SCI |
| Language | English |
| Funding Project | National Key R&D Program of China[2023YFC2508704] ; National Natural Science Foundation of China[62236008] ; National Natural Science Foundation of China[62022083] ; National Natural Science Foundation of China[U21B2038] ; Fundamental Research Funds for the Central Universities ; Shandong Provincial Key Research and Development Program[2024CXPT011] ; Priority Academic Program Development of QILU Institute of Technology[QIT23NN038] |
| WOS Research Area | Computer Science ; Telecommunications |
| WOS Subject | Computer Science, Information Systems ; Computer Science, Software Engineering ; Telecommunications |
| WOS ID | WOS:001598824700008 |
| Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
| Document Type | Journal article |
| Identifier | http://119.78.100.204/handle/2XEOYT63/41613 |
| Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
| Corresponding Author | Wang, Shuhui |
| Affiliation | 1. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China ; 2. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China ; 3. Pengcheng Lab, Shenzhen 518000, Peoples R China ; 4. Qilu Inst Technol, Jinan 250200, Shandong, Peoples R China |
| Recommended Citation (GB/T 7714) | Bi, Chao,Wang, Shuhui,Li, Na,et al. Inferential and Commonsense Visual Question Generation[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2025,27:7796-7809. |
| APA | Bi, Chao,Wang, Shuhui,Li, Na,&Huang, Qingming.(2025).Inferential and Commonsense Visual Question Generation.IEEE TRANSACTIONS ON MULTIMEDIA,27,7796-7809. |
| MLA | Bi, Chao,et al."Inferential and Commonsense Visual Question Generation".IEEE TRANSACTIONS ON MULTIMEDIA 27(2025):7796-7809. |
| Files in This Item | There are no files associated with this item. |