Paper title
SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
Paper authors
Paper abstract
Estimating the 3D shapes and poses of static objects from a single image has important applications in robotics, augmented reality and digital content creation. Often this is done either through direct mesh prediction, which produces unrealistic, overly tessellated shapes, or by formulating shape prediction as a retrieval task followed by CAD model alignment. Directly predicting CAD model poses from 2D image features is difficult and inaccurate. Some works, such as ROCA, regress normalised object coordinates and use them to compute poses. While this can produce more accurate pose estimates, predicting normalised object coordinates is susceptible to systematic failure. Leveraging efficient transformer architectures, we demonstrate that a sparse, iterative, render-and-compare approach is more accurate and robust than relying on normalised object coordinates. To this end, we combine, in early fusion, 2D image information, including sparse depth and surface normal values that we estimate directly from the image, with 3D CAD model information. In particular, we reproject points sampled from the CAD model in an initial, random pose and compute their depth and surface normal values. This combined information is the input to a pose prediction network, SPARC-Net, which we train to predict a 9-DoF CAD model pose update. The CAD model is then reprojected with the updated pose, and the next pose update is predicted. Our alignment procedure converges after just 3 iterations, improving the state-of-the-art instance alignment accuracy on the challenging real-world dataset ScanNet from 25.0% to 31.8%. Code will be released at https://github.com/florianlanger/SPARC.
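The abstract describes the iterative loop only in words. The following Python sketch is a minimal, hypothetical illustration of its geometric skeleton, not the authors' implementation: CAD sample points and normals are placed in the camera frame under a 9-DoF pose (translation, rotation, per-axis scale), reprojected to pixel coordinates with their depth and surface normal, and a pose update is predicted and applied for 3 iterations. Everything here is an assumption: the names `Pose9`, `reproject`, `apply_update`, `refine` and `toy_demo`, the Euler-angle rotation parameterisation, the pinhole intrinsics (`f=500`, principal point `(320, 240)`), and the update rule (additive translation and angles, multiplicative scale). The fusion with image-estimated depth and normals is omitted, and a toy oracle that halves the remaining error stands in for SPARC-Net.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Update = Tuple[Vec3, Vec3, Vec3]  # hypothetical (dt, dr, ds) pose update


@dataclass(frozen=True)
class Pose9:
    """Hypothetical 9-DoF pose: translation, Euler angles in radians, per-axis scale."""
    t: Vec3
    r: Vec3
    s: Vec3


def rot(r: Vec3) -> List[List[float]]:
    """Rotation matrix R = Rz(r[2]) @ Ry(r[1]) @ Rx(r[0])."""
    ca, sa = math.cos(r[0]), math.sin(r[0])
    cb, sb = math.cos(r[1]), math.sin(r[1])
    cc, sc = math.cos(r[2]), math.sin(r[2])
    return [
        [cc * cb, cc * sb * sa - sc * ca, cc * sb * ca + sc * sa],
        [sc * cb, sc * sb * sa + cc * ca, sc * sb * ca - cc * sa],
        [-sb, cb * sa, cb * ca],
    ]


def matvec(m: List[List[float]], v: Vec3) -> Vec3:
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def reproject(points: Sequence[Vec3], normals: Sequence[Vec3], pose: Pose9,
              f: float = 500.0, pp: Tuple[float, float] = (320.0, 240.0)):
    """Pose CAD samples in the camera frame; return (u, v, depth, normal) per point."""
    R = rot(pose.r)
    feats = []
    for p, n in zip(points, normals):
        # Camera-frame point: x = R (S p) + t.
        x = matvec(R, tuple(si * pi for si, pi in zip(pose.s, p)))
        x = tuple(a + b for a, b in zip(x, pose.t))
        # Normals transform with (R S)^-T = R S^-1, then are renormalised.
        nc = matvec(R, tuple(ni / si for ni, si in zip(n, pose.s)))
        length = math.sqrt(sum(c * c for c in nc))
        nc = tuple(c / length for c in nc)
        # Pinhole projection with assumed focal length f and principal point pp.
        u = f * x[0] / x[2] + pp[0]
        v = f * x[1] / x[2] + pp[1]
        feats.append((u, v, x[2], nc))
    return feats


def apply_update(pose: Pose9, upd: Update) -> Pose9:
    """Assumed update rule: additive translation and angles, multiplicative scale."""
    dt, dr, ds = upd
    return Pose9(
        t=tuple(a + b for a, b in zip(pose.t, dt)),
        r=tuple(a + b for a, b in zip(pose.r, dr)),
        s=tuple(a * b for a, b in zip(pose.s, ds)),
    )


def refine(points, normals, pose: Pose9,
           predict_update: Callable[[list, Pose9], Update], n_iters: int = 3) -> Pose9:
    """Reproject, predict an update, apply it; repeat n_iters times."""
    for _ in range(n_iters):
        feats = reproject(points, normals, pose)
        pose = apply_update(pose, predict_update(feats, pose))
    return pose


def toy_demo(n_iters: int = 3) -> List[float]:
    """Run the loop with a hypothetical oracle; return translation error per step."""
    pts = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]
    nrm = [tuple(2 * c for c in p) for p in pts]  # toy per-point normals
    target = Pose9(t=(0.2, -0.1, 3.0), r=(0.1, 0.6, 0.0), s=(1.2, 0.8, 1.0))
    pose = Pose9(t=(0.0, 0.0, 2.5), r=(0.0, 0.0, 0.0), s=(1.0, 1.0, 1.0))

    def oracle(feats, cur):
        # Stand-in for SPARC-Net: ignores feats and closes half of the remaining gap.
        return (tuple(0.5 * (g - c) for g, c in zip(target.t, cur.t)),
                tuple(0.5 * (g - c) for g, c in zip(target.r, cur.r)),
                tuple(math.sqrt(g / c) for g, c in zip(target.s, cur.s)))

    errs = [math.dist(pose.t, target.t)]
    for _ in range(n_iters):
        pose = refine(pts, nrm, pose, oracle, n_iters=1)
        errs.append(math.dist(pose.t, target.t))
    return errs


print(toy_demo())
```

Because the toy oracle closes half of the remaining translation gap at each step, the printed errors shrink by a factor of two per iteration. A trained network would instead have to infer the update from the fused rendered and image features, which is why the `feats` argument is passed to the predictor even though the oracle ignores it.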