Paper Title

6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference

Authors

Mai Bui, Tolga Birdal, Haowen Deng, Shadi Albarqouni, Leonidas Guibas, Slobodan Ilic, Nassir Navab

Abstract

We present a multimodal camera relocalization framework that captures ambiguities and uncertainties with continuous mixture models defined on the manifold of camera poses. In highly ambiguous environments, which can easily arise due to symmetries and repetitive structures in the scene, computing one plausible solution (what most state-of-the-art methods currently regress) may not be sufficient. Instead, we predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction. Towards this aim, we use Bingham distributions to model the orientation of the camera pose, and a multivariate Gaussian to model the position, with an end-to-end deep neural network. By incorporating a Winner-Takes-All training scheme, we finally obtain a mixture model that is well suited for explaining ambiguities in the scene, yet does not suffer from mode collapse, a common problem with mixture density networks. We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments and exhaustively evaluate our method on synthetic as well as real data, on both ambiguous scenes and non-ambiguous benchmark datasets. We plan to release our code and dataset under $\href{https://multimodal3dvision.github.io}{multimodal3dvision.github.io}$.
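
As a reading aid for the abstract above (notation is ours, not lifted from the paper): the Bingham distribution is an antipodally symmetric density on the unit hypersphere, here over unit quaternions $q \in S^3 \subset \mathbb{R}^4$, which suits rotations because $q$ and $-q$ describe the same orientation. In one common parameterization,

$$
\mathcal{B}(q;\, V, \Lambda) \;=\; \frac{1}{F(\Lambda)}\, \exp\!\left( q^{\top} V \Lambda V^{\top} q \right),
$$

where $V \in \mathbb{R}^{4 \times 4}$ is orthogonal, $\Lambda = \mathrm{diag}(0, \lambda_1, \lambda_2, \lambda_3)$ with $\lambda_i \le 0$ carries the concentration parameters, and $F(\Lambda)$ is the normalizing constant. A Winner-Takes-All training scheme, in its generic form, evaluates the loss of every predicted hypothesis but backpropagates only through the best one per training sample, e.g. $\mathcal{L}_{\mathrm{WTA}} = \min_k \mathcal{L}_k$; this lets the individual pose hypotheses specialize to different modes instead of collapsing onto a single averaged pose.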
