Paper Title

Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation

Authors

Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristia, Emmanuel Dupoux, Hervé Bredin

Abstract

Most automatic speech processing systems register degraded performance when applied to noisy or reverberant speech. But how can one tell whether speech is noisy or reverberant? We propose Brouhaha, a neural network jointly trained to extract speech/non-speech segments, speech-to-noise ratios, and C50 room acoustics from single-channel recordings. Brouhaha is trained using a data-driven approach in which noisy and reverberant audio segments are synthesized. We first evaluate its performance and demonstrate that the proposed multi-task regime is beneficial. We then present two scenarios illustrating how Brouhaha can be used on naturally noisy and reverberant data: 1) to investigate the errors made by a speaker diarization model (pyannote.audio); and 2) to assess the reliability of an automatic speech recognition model (Whisper from OpenAI). Both our pipeline and a pretrained model are open source and shared with the speech community.
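
For readers unfamiliar with the two acoustic quantities named in the abstract, the sketch below shows their standard textbook definitions. It is not the authors' code and says nothing about how Brouhaha itself estimates these values. The function names `snr_db` and `c50_db` are hypothetical, and the sketch assumes that the clean speech, the noise, and the room impulse response are available separately (as they can be when audio is synthesized) and that the impulse response starts at the arrival of the direct sound.

```python
import math


def snr_db(speech, noise):
    """Speech-to-noise ratio in dB: ratio of mean speech power to mean noise power."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)


def c50_db(impulse_response, sample_rate):
    """C50 clarity in dB: energy in the first 50 ms of a room impulse
    response divided by the energy arriving after 50 ms.

    Assumes index 0 of impulse_response is the direct-sound arrival.
    """
    split = int(round(0.050 * sample_rate))  # number of samples in 50 ms
    early = sum(h * h for h in impulse_response[:split])
    late = sum(h * h for h in impulse_response[split:])
    return 10.0 * math.log10(early / late)


# Toy signals: speech amplitude 1.0, noise amplitude 0.1.
speech = [1.0, -1.0] * 100
noise = [0.1, -0.1] * 100

# Toy impulse response at 16 kHz: a direct path plus one weaker late reflection.
sample_rate = 16000
impulse_response = [0.0] * 1600
impulse_response[0] = 1.0     # direct sound
impulse_response[800] = 0.1   # reflection arriving exactly 50 ms later

snr_value = snr_db(speech, noise)
c50_value = c50_db(impulse_response, sample_rate)
```

Higher values of either quantity indicate cleaner conditions: a high speech-to-noise ratio means the speech dominates the background, and a high C50 means most of the energy arrives early, i.e. the room is less reverberant.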