Paper title
Explainability of Deep Learning Models for Urban Space Perception
Authors
Abstract
Urban planners increasingly use deep learning based computer vision models to support decisions that shape urban environments. Such models predict how people perceive the quality of the urban environment, for example in terms of its safety or beauty. However, the black-box nature of deep learning models makes it hard for urban planners to understand which landscape objects contribute to a particularly high-quality or low-quality perception of urban space. This study investigates how computer vision models can be used to extract policy-relevant information about people's perception of urban space. To do so, we train two widely used computer vision architectures, a Convolutional Neural Network and a transformer, and apply GradCAM, a well-known ex-post explainable AI technique, to highlight the image regions that are important for the model's prediction. Using these GradCAM visualizations, we manually annotate the objects relevant to the models' perception predictions. As a result, we discover new objects that are not represented in the object detection models used for annotation in previous studies. Moreover, our methodological results suggest that transformer architectures are better suited for use in combination with GradCAM techniques. Code is available on GitHub.
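To make the GradCAM step mentioned in the abstract concrete, the following is a minimal sketch of the standard Grad-CAM computation, not the authors' implementation. It assumes the feature maps of one chosen layer and the gradients of the model's perception score with respect to them have already been extracted. The function name `grad_cam` and the toy inputs are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Combine feature maps and their gradients into a Grad-CAM heatmap.

    activations[k][i][j]: feature map k of the chosen layer at position (i, j).
    gradients[k][i][j]: gradient of the model's score w.r.t. that activation.
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight = gradient averaged over all spatial positions.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [
        [max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
         for j in range(width)]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy input: two 2x2 feature maps with hand-picked gradients (illustrative only).
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the street-level image. For a transformer, the token activations would typically first be reshaped into a 2D grid before this computation; how the paper handles that step is not stated in the abstract.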