Paper Title
PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification
Authors
Abstract
Automatic pavement distress classification improves the efficiency of pavement maintenance and reduces the cost of labor and resources. A recently influential branch of this task divides the pavement image into patches and treats classification from the perspective of multi-instance learning. However, these methods neglect the correlation between patches and suffer from low efficiency in model optimization and inference. Swin Transformer is able to address both of these issues. Built upon Swin Transformer, we present a vision Transformer named \textbf{P}avement \textbf{I}mage \textbf{C}lassification \textbf{T}ransformer (\textbf{PicT}) for pavement distress classification. To better exploit the discriminative information of pavement images at the patch level, we propose the \textit{Patch Labeling Teacher}, which uses a teacher model to dynamically generate pseudo labels for patches from image labels during each iteration and guides the model to learn the discriminative features of patches. Because distressed regions occupy only a small fraction of a pavement image, the broad classification head of Swin Transformer may dilute the discriminative features of distressed patches in the feature aggregation step. To overcome this drawback, we present a \textit{Patch Refiner} that clusters patches into different groups and selects only the highest distress-risk group, yielding a slim head for the final image classification. We evaluate our method on CQU-BPDD. Extensive results show that \textbf{PicT} outperforms the second-best model by a large margin of $+2.4\%$ in P@R on the detection task and $+3.9\%$ in $F1$ on the recognition task, with 1.8x throughput, while training 7x faster using the same computing resources. Our code and models have been released at \href{https://github.com/DearCaat/PicT}{https://github.com/DearCaat/PicT}.
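To make the dilution argument concrete, the following is a minimal sketch of the slim-head idea and is not the released PicT implementation. It assumes each patch already has a vector of class probabilities, uses one minus the probability of a hypothetical "normal" class as the distress-risk score, and replaces the clustering step with a simple equal-size split by risk rank. The names `slim_head`, `broad_head`, `normal_class`, and `num_groups` are illustrative assumptions.

```python
from typing import List, Sequence


def broad_head(patch_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities over all patches (the 'broad' aggregation)."""
    n = len(patch_probs)
    num_classes = len(patch_probs[0])
    return [sum(p[c] for p in patch_probs) / n for c in range(num_classes)]


def slim_head(patch_probs: Sequence[Sequence[float]],
              normal_class: int = 0,
              num_groups: int = 4) -> List[float]:
    """Average class probabilities over the highest-risk group of patches only.

    Assumption: risk = 1 - P(normal), and groups are equal-size slices of the
    risk ranking rather than the output of a learned clustering.
    """
    if not patch_probs:
        raise ValueError("need at least one patch")
    risks = [1.0 - p[normal_class] for p in patch_probs]
    # Rank patch indices from most to least risky.
    order = sorted(range(len(patch_probs)), key=lambda i: risks[i], reverse=True)
    # Keep only the first (riskiest) group.
    group_size = max(1, len(order) // num_groups)
    top = order[:group_size]
    num_classes = len(patch_probs[0])
    return [sum(patch_probs[i][c] for i in top) / len(top)
            for c in range(num_classes)]


# One distressed patch among eight: columns are [normal, distressed].
patches = [[0.1, 0.9]] + [[0.9, 0.1]] * 7
broad = broad_head(patches)
slim = slim_head(patches)
# The slim head assigns a higher distress probability than the broad head,
# because the seven normal patches no longer dominate the average.
print(broad[1] < slim[1])
```

With a small distressed area, averaging over every patch pulls the image-level score toward "normal", whereas restricting the average to the riskiest group keeps the distressed patch's evidence visible. The group count here is an arbitrary choice for illustration.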