Paper Title
A Broad Dataset is All You Need for One-Shot Object Detection
Paper Authors
Paper Abstract
Is it possible to detect arbitrary objects from a single example? A central problem of all existing attempts at one-shot object detection is the generalization gap: Object categories used during training are detected much more reliably than novel ones. We here show that this generalization gap can be nearly closed by increasing the number of object categories used during training. Doing so allows us to improve generalization from seen to unseen classes from 45% to 89% and improve the state-of-the-art on COCO by 5.4% AP50 (from 22.0 to 27.5). We verify that the effect is caused by the number of categories and not the number of training samples, and that it holds for different models, backbones, and datasets. This result suggests that the key to strong few-shot detection models may not lie in sophisticated metric learning approaches, but instead simply in scaling the number of categories. We hope that our findings will help to better understand the challenges of few-shot learning and encourage future data annotation efforts to focus on wider datasets with a broader set of categories rather than gathering more samples per category.
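The "generalization from seen to unseen classes" figure quoted above can be read as a ratio of unseen-class to seen-class detection performance. As a minimal illustrative sketch (the helper function and the example AP50 numbers below are assumptions for illustration, not values from the paper):

```python
def generalization_ratio(ap50_unseen: float, ap50_seen: float) -> float:
    """Hypothetical metric: unseen-class AP50 as a percentage of
    seen-class AP50. 100% would mean the generalization gap is closed."""
    return 100.0 * ap50_unseen / ap50_seen

# Illustrative (assumed) numbers: a model scoring 40 AP50 on seen
# classes and 18 AP50 on unseen classes generalizes at 45%.
print(f"{generalization_ratio(18.0, 40.0):.0f}%")  # 45%
```

Under this reading, the paper's result is that broadening the training categories raises this ratio from roughly 45% to 89%, i.e. unseen-class performance approaches seen-class performance.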