Paper Title

Generative Partial Visual-Tactile Fused Object Clustering

Paper Authors

Tao Zhang, Yang Cong, Gan Sun, Jiahua Dong, Yuyang Liu, Zhengming Ding

Paper Abstract

Visual-tactile fused sensing for object clustering has achieved significant progress recently, since the involvement of the tactile modality can effectively improve clustering performance. However, missing data (i.e., partial data) issues often arise due to occlusion and noise during the data collection process. Most existing partial multi-view clustering methods do not handle this issue well, owing to the heterogeneity between the two modalities; naively applying them would inevitably introduce negative effects and further hurt performance. To address these challenges, we propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for object clustering. More specifically, we first extract features from the partial visual and tactile data, respectively, and encode the extracted features into modality-specific feature subspaces. A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioned on the other, which compensates for the missing samples and naturally aligns the visual and tactile modalities through adversarial learning. Finally, two pseudo-label-based KL-divergence losses are employed to update the corresponding modality-specific encoders. Extensive comparative experiments on three public visual-tactile datasets demonstrate the effectiveness of our method.
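To make the pipeline described in the abstract more concrete, below is a minimal PyTorch-style sketch of its three ingredients: modality-specific encoders, a conditional cross-modal generator/discriminator pair, and a pseudo-label-based KL-divergence clustering loss. All network sizes and names are illustrative assumptions, and the DEC-style Student's-t soft assignment used here to form pseudo-labels is only one plausible reading of the abstract, not the authors' released implementation.

```python
# Minimal sketch of the GPVTF idea, assuming PyTorch and illustrative dimensions.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, LATENT_DIM, NOISE_DIM = 512, 128, 64  # assumed sizes, not from the paper

class ModalityEncoder(nn.Module):
    """Maps one modality's extracted features into its modality-specific subspace."""
    def __init__(self, in_dim=FEAT_DIM, out_dim=LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))
    def forward(self, x):
        return self.net(x)

class CrossModalGenerator(nn.Module):
    """Synthesizes the missing modality's latent code conditioned on the observed one."""
    def __init__(self, dim=LATENT_DIM, noise_dim=NOISE_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + noise_dim, 256), nn.ReLU(),
                                 nn.Linear(256, dim))
    def forward(self, observed_code):
        z = torch.randn(observed_code.size(0), NOISE_DIM, device=observed_code.device)
        return self.net(torch.cat([observed_code, z], dim=1))

class Discriminator(nn.Module):
    """Distinguishes real latent codes of a modality from synthesized ones
    (adversarial alignment of the visual and tactile subspaces)."""
    def __init__(self, dim=LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.LeakyReLU(0.2),
                                 nn.Linear(128, 1))
    def forward(self, code):
        return self.net(code)

def soft_assignment(codes, centroids, alpha=1.0):
    """Student's t soft cluster assignment (assumed DEC-style pseudo-labels)."""
    dist = torch.cdist(codes, centroids) ** 2
    q = (1.0 + dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpened pseudo-label targets used in the KL-divergence loss."""
    p = q ** 2 / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

def clustering_kl_loss(codes, centroids):
    """Pseudo-label-based KL-divergence loss for one modality-specific encoder."""
    q = soft_assignment(codes, centroids)
    p = target_distribution(q).detach()
    return F.kl_div(q.log(), p, reduction="batchmean")
```

In the full framework, the generator's output for a sample with a missing modality would stand in for that sample's latent code before the clustering loss is computed, and the discriminators' adversarial losses would be trained jointly with the two KL-divergence losses; those training loops are omitted from this sketch.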
