Institute of Computing Technology, Chinese Academy IR
| Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping | |
| Xu, Kai1,2; Wang, Lichun1; Li, Shuang3; Xin, Jianjia4; Yin, Baocai1 | |
| 2025-10-13 | |
| Journal | MULTIMEDIA SYSTEMS |
| ISSN | 0942-4962 |
| Volume | 31, Issue: 6, Pages: 12 |
| Abstract | The performance of robots on the language-conditioned robotic grasping task reflects the intelligence level of robots. However, existing approaches lack the ability to handle implicit instructions and identify infeasible ones, which undermines the intelligence and operational safety of the robot. To overcome the above limitations, this paper introduces a novel Language-conditioned Robotic Grasping Dataset (LRGD), which covers a variety of instruction types. Correspondingly, an end-to-end BLIP-based Cross-modal Grasping Network (BCGN) for language-conditioned grasping is proposed. Specifically, BCGN integrates BLIP to jointly model cross-modal information, and introduces a learnable circuit breaker that enables the model to actively reject infeasible requests. Furthermore, through collaboration with LVLMs (Large Vision-Language Models), BCGN can easily achieve zero-shot recognition of implicit instructions. Experimental results on the LRGD and in real-world scenarios demonstrate the effectiveness of BCGN in dealing with instructions of different complexity levels. |
| Keywords | Robotic grasp ; Language-conditioned grasping ; Grasping dataset ; Cross-modal fusion |
| DOI | 10.1007/s00530-025-02005-y |
| Indexed By | SCI |
| Language | English |
| Funding Project | National Natural Science Foundation of China[2021ZD0111902] ; National Key R&D Program of China[62376014] ; National Key R&D Program of China[62172022] ; National Key R&D Program of China[U21B2038] ; National Natural Science Foundation of China[2021JQR023] ; Foundation for China University Industry-University-Research Innovation[KM202411232017] ; R&D Program of Beijing Municipal Education Commission |
| WOS Research Area | Computer Science |
| WOS Subject | Computer Science, Information Systems ; Computer Science, Theory & Methods |
| WOS ID | WOS:001592913400013 |
| Publisher | SPRINGER |
| Document Type | Journal article |
| Identifier | http://119.78.100.204/handle/2XEOYT63/41657 |
| Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Papers (English) |
| Corresponding Author | Xu, Kai; Wang, Lichun |
| Affiliation | 1.Beijing Univ Technol, Sch Informat Sci & Technol, Beijing 100124, Peoples R China ; 2.Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China ; 3.Beijing Informat Sci & Technol Univ, Sch Automat, Beijing 100192, Peoples R China ; 4.INSPUR Grp CO LTD, Jinan 250101, Peoples R China |
| Recommended Citation GB/T 7714 | Xu, Kai,Wang, Lichun,Li, Shuang,et al. Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping[J]. MULTIMEDIA SYSTEMS,2025,31(6):12. |
| APA | Xu, Kai,Wang, Lichun,Li, Shuang,Xin, Jianjia,&Yin, Baocai.(2025).Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping.MULTIMEDIA SYSTEMS,31(6),12. |
| MLA | Xu, Kai,et al."Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping".MULTIMEDIA SYSTEMS 31.6(2025):12. |
| Files in This Item | There are no files associated with this item. |