Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays

Paper Title

Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays

Authors

Liu, Shupei; Feng, Linfeng; Gong, Yijun; Liang, Chengdong; Zhang, Chen; Zhang, Xiao-Lei; Li, Xuelong

Abstract

While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose a novel deep-learning-based 2D speaker localization method that leverages ad-hoc microphone arrays, where an ad-hoc microphone array is composed of randomly distributed microphone nodes, each equipped with a traditional array. Specifically, we first employ convolutional neural networks at each node to estimate speaker directions. Then, we integrate these DOA estimates using triangulation and clustering techniques to obtain 2D speaker locations. To further improve estimation accuracy, we introduce a node selection algorithm that strategically filters the nodes to retain the most reliable ones. Extensive experiments on both simulated and real-world data demonstrate that our approach significantly outperforms conventional methods, and the proposed node selection further improves performance. The real-world dataset used in the experiments, named Libri-adhoc-node10, is newly recorded and described for the first time in this paper; it is available online at https://github.com/Liu-sp/Libri-adhoc-nodes10.
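The abstract only outlines the stage that turns per-node DOA estimates into a 2D position. As a rough illustration of that idea, and not the authors' implementation, the following Python sketch triangulates every pair of node DOA rays and clusters the resulting intersection points. It assumes node positions are known, each DOA is already given in radians in a shared room frame (the CNN estimator is not modelled), and distances are in metres. All names (NodeEstimate, intersect_rays, select_nodes, densest_cluster_center, localize) are hypothetical. Keeping the k nodes with the highest confidence score is only a placeholder for the paper's node selection algorithm, and the radius-based density rule is a simple stand-in for its clustering technique.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class NodeEstimate:
    """Hypothetical container for one ad-hoc node.

    Assumes the node position (x, y) is known, `doa` is the node's DOA
    estimate in radians in a shared room frame, and `confidence` is some
    per-node reliability score (illustrative only).
    """
    x: float
    y: float
    doa: float
    confidence: float = 1.0


def intersect_rays(a: NodeEstimate, b: NodeEstimate,
                   eps: float = 1e-9) -> Optional[Point]:
    """Triangulate one node pair: intersect the two DOA rays, if possible."""
    d1x, d1y = math.cos(a.doa), math.sin(a.doa)
    d2x, d2y = math.cos(b.doa), math.sin(b.doa)
    rx, ry = b.x - a.x, b.y - a.y
    # Solve a + t1 * d1 = b + t2 * d2 for (t1, t2) with Cramer's rule.
    det = d2x * d1y - d1x * d2y
    if abs(det) < eps:  # (nearly) parallel rays: no stable intersection
        return None
    t1 = (d2x * ry - rx * d2y) / det
    t2 = (d1x * ry - rx * d1y) / det
    if t1 <= 0 or t2 <= 0:  # intersection lies behind one of the nodes
        return None
    return (a.x + t1 * d1x, a.y + t1 * d1y)


def select_nodes(nodes: List[NodeEstimate], k: int) -> List[NodeEstimate]:
    """Placeholder node selection (not the paper's algorithm):
    keep the k nodes with the highest confidence."""
    return sorted(nodes, key=lambda n: n.confidence, reverse=True)[:k]


def densest_cluster_center(points: List[Point], radius: float) -> Point:
    """Simple density rule: find the point with the most points within
    `radius` of it, then return the mean of that neighbourhood."""
    def neighbours(p: Point) -> List[Point]:
        return [q for q in points if math.dist(p, q) <= radius]

    members = neighbours(max(points, key=lambda p: len(neighbours(p))))
    return (sum(q[0] for q in members) / len(members),
            sum(q[1] for q in members) / len(members))


def localize(nodes: List[NodeEstimate], k: int,
             radius: float = 0.5) -> Optional[Point]:
    """Select nodes, triangulate every selected pair, cluster the points."""
    chosen = select_nodes(nodes, k)
    points = []
    for a, b in combinations(chosen, 2):
        p = intersect_rays(a, b)
        if p is not None:
            points.append(p)
    if not points:
        return None
    return densest_cluster_center(points, radius)


# Usage: five reliable nodes pointing at a source at (3, 2), plus one
# low-confidence node with a wrong DOA that the selection step drops.
source = (3.0, 2.0)
positions = [(0.0, 0.0), (6.0, 0.0), (6.0, 4.0), (0.0, 4.0), (3.0, -1.0)]
nodes = [NodeEstimate(x, y, math.atan2(source[1] - y, source[0] - x), 0.9)
         for x, y in positions]
nodes.append(NodeEstimate(1.0, 1.0, 0.0, 0.2))
print(localize(nodes, k=5))  # a point very close to (3.0, 2.0)
```

Pairs of nearly parallel rays, such as two nodes collinear with the source, have no stable intersection and are skipped, so pairwise triangulation works best when nodes view the speaker from different directions. Averaging only the densest neighbourhood, rather than all intersection points, keeps a few spurious intersections caused by a bad DOA estimate from pulling the result away.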
