The Third DIHARD Diarization Challenge

Paper Title

The Third DIHARD Diarization Challenge

Authors

Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman

Abstract

DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain. Speaker diarization was evaluated under two speech activity conditions (diarization from a reference speech activity vs. diarization from scratch) and 11 diverse domains. The domains span a range of recording conditions and interaction types, including read audiobooks, meeting speech, clinical interviews, web videos, and, for the first time, conversational telephone speech. A total of 30 organizations (forming 21 teams) from industry and academia submitted 499 valid system outputs. The evaluation results indicate that speaker diarization has improved markedly since DIHARD I, particularly for two-party interactions, but that for many domains (e.g., web video) the problem remains far from solved.
