Paper Title

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

Authors

Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno

Abstract

While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems. In this paper, we demonstrate that a multi-stage clustering strategy that uses different clustering algorithms for input of different lengths can address multi-faceted challenges of on-device speaker diarization applications. Specifically, a fallback clusterer is used to handle short-form inputs; a main clusterer is used to handle medium-length inputs; and a pre-clusterer is used to compress long-form inputs before they are processed by the main clusterer. Both the main clusterer and the pre-clusterer can be configured with an upper bound of the computational complexity to adapt to devices with different resource constraints. This multi-stage clustering strategy is critical for streaming on-device speaker diarization systems, where the budgets of CPU, memory and battery are tight.
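The dispatch logic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithms, so `greedy_clusterer` and `pre_compress` are hypothetical placeholders, and the `lower_bound` and `upper_bound` thresholds (counted in number of speaker embeddings) are assumed names and values. The sketch only shows how input length could select the stage, and how a cap on the main clusterer's input size bounds its cost.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Sequence[float]
Clusterer = Callable[[List[Vec]], List[int]]


def greedy_clusterer(threshold: float) -> Clusterer:
    """Hypothetical placeholder clusterer (not from the paper): each point
    joins the first cluster whose seed lies within `threshold`."""
    def cluster(points: List[Vec]) -> List[int]:
        seeds: List[Vec] = []
        labels: List[int] = []
        for p in points:
            for i, s in enumerate(seeds):
                if math.dist(p, s) <= threshold:
                    labels.append(i)
                    break
            else:
                seeds.append(p)
                labels.append(len(seeds) - 1)
        return labels
    return cluster


def pre_compress(points: List[Vec], budget: int) -> Tuple[List[Vec], List[int]]:
    """Hypothetical pre-clusterer: merge the closest pair of centroids until
    at most `budget` (assumed >= 1) groups remain. Returns the centroids and,
    for every input point, the index of the centroid it was merged into."""
    groups = [[i] for i in range(len(points))]
    cents = [list(p) for p in points]
    while len(groups) > budget:
        best = None
        for i in range(len(cents)):
            for j in range(i + 1, len(cents)):
                d = math.dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        ni, nj = len(groups[i]), len(groups[j])
        # Size-weighted mean keeps the centroid equal to the mean of its members.
        cents[i] = [(a * ni + b * nj) / (ni + nj) for a, b in zip(cents[i], cents[j])]
        groups[i].extend(groups[j])
        del groups[j], cents[j]
    assignment = [0] * len(points)
    for g, members in enumerate(groups):
        for m in members:
            assignment[m] = g
    return cents, assignment


def multi_stage_cluster(embeddings: List[Vec], fallback: Clusterer, main: Clusterer,
                        lower_bound: int = 4, upper_bound: int = 8) -> List[int]:
    """Pick a stage by input length; thresholds are illustrative assumptions."""
    n = len(embeddings)
    if n == 0:
        return []
    if n < lower_bound:            # short-form input: fallback clusterer
        return fallback(embeddings)
    if n <= upper_bound:           # medium-length input: main clusterer
        return main(embeddings)
    # Long-form input: compress first, so the main clusterer never sees
    # more than `upper_bound` items, then map labels back to the inputs.
    centroids, assignment = pre_compress(embeddings, upper_bound)
    centroid_labels = main(centroids)
    return [centroid_labels[a] for a in assignment]
```

In this sketch, `upper_bound` is the single knob that caps the main clusterer's input size, which is one plausible reading of the configurable upper bound on computational complexity that the abstract mentions. A device with a tighter CPU or memory budget would use a smaller value.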
