Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations

Paper Title

Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations

Authors

Huaiyi Huang, Yuqi Zhang, Qingqiu Huang, Zhengkui Guo, Ziwei Liu, Dahua Lin

Abstract

Place is an important element in visual understanding. Given a photo of a building, people can often tell its functionality (e.g., a restaurant or a shop), its cultural style (e.g., Asian or European), and its economic type (e.g., industry-oriented or tourism-oriented). While place recognition has been widely studied in previous work, there remains a long way to go towards comprehensive place understanding, which goes far beyond categorizing a place from an image and requires information on multiple aspects. In this work, we contribute Placepedia, a large-scale place dataset with more than 35M photos from 240K unique places. Besides the photos, each place also comes with massive multi-faceted information (e.g., GDP and population) and labels at multiple levels, including function, city, and country. This dataset, with its large amount of data and rich annotations, allows various studies to be conducted. In particular, in our studies we develop 1) PlaceNet, a unified framework for multi-level place recognition, and 2) a method for city embedding, which can produce a vector representation for a city that captures both visual and multi-faceted side information. Such studies not only reveal key challenges in place understanding, but also establish connections between visual observations and underlying socioeconomic/cultural implications.

