Low Latency Time Domain Multichannel Speech and Music Source Separation

Paper Title

Low Latency Time Domain Multichannel Speech and Music Source Separation

Author

Schuller, Gerald

Abstract

The goal is to obtain a simple multichannel source separation with very low latency. Applications include teleconferencing, hearing aids, augmented reality, and selective active noise cancellation. These real-time applications need very low latency, usually less than about 6 ms, and low complexity, because they usually run on small portable devices. For this purpose we do not need the best possible separation, only a "useful" one, and not just for speech but also for music and noise. The usual frequency-domain approaches have higher latency and complexity. Hence we introduce a novel probabilistic optimization method, which we call "Random Directions", that can overcome local minima; it is applied to a simple time-domain unmixing structure and is scalable for low complexity. It is then compared with frequency-domain approaches on separating speech and music sources, using 3D microphone setups.
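The abstract does not spell out the update rule of "Random Directions", so the following is only a generic illustration of the idea of a probabilistic search that can step over local minima, not the author's algorithm. It perturbs the current parameters in a random direction and keeps the candidate only if the cost decreases. The function names, the toy cost function, the step scale, and the iteration count are all assumptions made for this sketch.

```python
import math
import random


def random_directions_minimize(objective, x0, iterations=2000, scale=0.5, seed=0):
    """Generic random-direction search (illustrative; not the paper's exact method).

    At each step a Gaussian random offset is added to the best point so far,
    and the candidate is accepted only if it lowers the objective.
    """
    rng = random.Random(seed)
    best_x = list(x0)
    best_val = objective(best_x)
    for _ in range(iterations):
        # Draw a random direction and step size in one go.
        candidate = [xi + scale * rng.gauss(0.0, 1.0) for xi in best_x]
        val = objective(candidate)
        if val < best_val:
            best_x, best_val = candidate, val
    return best_x, best_val


def toy_objective(x):
    # Hypothetical non-convex cost with several local minima; it stands in
    # for a separation cost, which the abstract does not specify.
    return sum(xi * xi + 2.0 * (1.0 - math.cos(3.0 * xi)) for xi in x)


start = [2.0, -2.0]
x_opt, f_opt = random_directions_minimize(toy_objective, start)
print("start cost:", toy_objective(start))
print("final cost:", f_opt)
```

Because the search needs no gradients and a single cost evaluation per step, the number of iterations can be reduced to trade separation quality for lower complexity, which matches the scalability the abstract mentions in general terms.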
