Paper Title

Streaming Target-Speaker ASR with Neural Transducer

Authors

Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki

Abstract

Although recent advances in deep learning technology have boosted automatic speech recognition (ASR) performance in the single-talker case, it remains difficult to recognize multi-talker speech in which many voices overlap. One conventional approach to this problem is to cascade a speech separation or target speech extraction front-end with an ASR back-end. However, the extra computation cost of the front-end module is a critical barrier to quick response, especially for streaming ASR. In this paper, we propose a target-speaker ASR (TS-ASR) system that implicitly integrates the target speech extraction functionality within a streaming end-to-end (E2E) ASR system, i.e., a recurrent neural network transducer (RNNT). Our system uses an idea similar to that adopted for target speech extraction, but implements it directly at the level of the RNNT encoder. This allows TS-ASR to be realized without the extra computation cost of a front-end. Note that this study differs from prior studies on E2E TS-ASR in two major ways: we investigate streaming models and base our study on Conformer models, whereas prior studies used RNN-based systems and considered only offline processing. We confirm in experiments that our TS-ASR achieves recognition performance comparable to conventional cascade systems in the offline setting, while reducing computation costs and realizing streaming TS-ASR.
