CSpace
VPA: Multi-Modal Virtual Point Augmentation for 3D Object Detection
Zhong, Jianping1; Qi, Zhaobo1; Duan, Kaiwen2,3; Xu, Yuanrong1; Zhang, Weigang1; Huang, Qingming2,3,4
2025-12-01
Journal: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
ISSN: 1051-8215
Volume: 35; Issue: 12; Pages: 12410-12425
Abstract: Integrating LiDAR and camera data is crucial for precise 3D object detection. Existing methods augment virtual points from 2D image space in a random manner to complete the appearance of 3D objects that have only sparse points. However, these augmented virtual points have unreasonable 3D positions and representations, which seriously harms detection accuracy. To this end, we introduce a general 3D object detection framework called Virtual Point Augmenting (VPA), which enriches the 3D point cloud by controllably generating virtual points with accurate depth and position information as well as domain-gap-eliminated multi-modal representations from the image and point cloud spaces. VPA contains two core designs, namely the Hybrid Sampling Method (HSM) and Fine-Grained Cross-modal Fusion (FGCF). HSM constructs a seed point distribution map from the edge score map and the mask score map and uses it to sample high-quality seed points; it then employs a feature similarity function over the depths of the k nearest neighbors to obtain more accurate depth for the seed points, thereby improving the quality of the virtual points' 3D positions. FGCF adaptively fuses the multi-modal features, i.e., the semantic feature and the geometric feature from the image space and the 3D position feature, using a self-attention mechanism, thereby further improving the representation of the virtual points. We apply VPA to the LiDAR-based method CenterPoint and the fusion-based method Cross-modal transformer. Experimental results on the nuScenes, KITTI, and Waymo benchmarks validate the efficiency of our VPA, which achieves promising performance with 72.9% mAP and 74.8% NDS on the nuScenes test set without using test-time augmentation or model ensemble techniques. Code is available at https://github.com/jianpingZhonggit/vpa.git
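The two HSM steps named in the abstract (sampling seed points from a score-based distribution map, then estimating each seed's depth from its k neighbors by feature similarity) can be illustrated with a minimal sketch. This is not the authors' implementation, which lives in the repository linked above. The function names, the linear blend weight `alpha`, the choice of cosine similarity, and the softmax `temperature` are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def sample_seed_pixels(edge_score, mask_score, num_seeds, alpha=0.5, rng=None):
    """Sample pixel indices with probability proportional to a blended score.

    ASSUMPTION: the distribution map is a linear blend of the two score maps
    (flattened to 1D lists here); the abstract does not give the formula.
    Sampling is done with replacement.
    """
    rng = rng or random.Random(0)
    weights = [alpha * e + (1.0 - alpha) * m
               for e, m in zip(edge_score, mask_score)]
    if sum(weights) <= 0:
        raise ValueError("score maps must contain some positive mass")
    return rng.choices(range(len(weights)), weights=weights, k=num_seeds)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def similarity_weighted_depth(seed_feat, neighbor_feats, neighbor_depths,
                              temperature=0.1):
    """Estimate a seed's depth as a softmax-weighted mean of neighbor depths.

    ASSUMPTION: weights are a softmax over cosine similarity between the seed
    feature and each of the k neighbor features.
    """
    sims = [cosine(seed_feat, f) / temperature for f in neighbor_feats]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return sum(w * d for w, d in zip(exps, neighbor_depths)) / total
```

In this sketch, a neighbor whose feature closely matches the seed dominates the depth estimate, while pixels with zero edge and mask score are never chosen as seeds.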
Keywords: Three-dimensional displays; Point cloud compression; Object detection; Semantics; Laser radar; Feature extraction; Detectors; Accuracy; Bicycles; Solids; 3D object detection; multimodal fusion; virtual point augmenting
DOI: 10.1109/TCSVT.2025.3578474
Indexed by: SCI
Language: English
WOS research area: Engineering
WOS category: Engineering, Electrical & Electronic
WOS accession number: WOS:001631874000019
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/42986
Collection: Institute of Computing Technology, Chinese Academy of Sciences
Corresponding authors: Zhang, Weigang; Huang, Qingming
Author affiliations:
1. Harbin Inst Technol, Sch Comp Sci & Technol, Weihai 264209, Peoples R China
2. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China
3. Univ Chinese Acad Sci, Key Lab Big Data Min & Knowledge Management, Beijing 100190, Peoples R China
4. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714
Zhong, Jianping,Qi, Zhaobo,Duan, Kaiwen,et al. VPA: Multi-Modal Virtual Point Augmentation for 3D Object Detection[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY,2025,35(12):12410-12425.
APA Zhong, Jianping,Qi, Zhaobo,Duan, Kaiwen,Xu, Yuanrong,Zhang, Weigang,&Huang, Qingming.(2025).VPA: Multi-Modal Virtual Point Augmentation for 3D Object Detection.IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY,35(12),12410-12425.
MLA Zhong, Jianping,et al."VPA: Multi-Modal Virtual Point Augmentation for 3D Object Detection".IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 35.12(2025):12410-12425.
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.