Unmanned Aerial Vehicle Control Through Domain-based Automatic Speech Recognition

Paper Title

Unmanned Aerial Vehicle Control Through Domain-based Automatic Speech Recognition

Authors

Contreras, Ruben; Ayala, Angel; Cruz, Francisco

Abstract

Unmanned aerial vehicles (UAVs), such as drones, are becoming part of everyday life and are reaching many areas of society, including industry. Drones are commonly controlled through wireless tactile interfaces, for which a range of remote control devices is available. However, such devices do not offer a natural, human-like communication interface, and some users find them difficult to master. In this work, we present a domain-based speech recognition architecture for controlling a UAV such as a drone, so that instructions are given in a more natural, human-like way. We implement an algorithm that interprets commands in both Spanish and English and uses them to control the drone's movements in a simulated domestic environment. In the experiments, participants gave voice commands to the drone in both languages so that the effectiveness of each could be compared, taking the participants' mother tongue into account. Different levels of distortion were also applied to the voice commands to test the approach on noisy input signals. The results show that the UAV is able to interpret user voice instructions, and that phoneme matching improves speech-to-action recognition in both languages compared with using the cloud-based algorithm alone, without domain-based instructions. With raw audio input, the cloud-based approach achieves 74.81% and 97.04% accuracy for English and Spanish instructions, respectively, whereas our phoneme-matching approach achieves 93.33% and 100.00%.
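The abstract does not specify how the phoneme matching is implemented, so the following is only a rough sketch of the general idea: map a possibly misrecognized transcript from a speech recognizer onto the closest instruction in a small, fixed command domain. As a loudly stated simplification, it compares normalized character strings with Levenshtein distance rather than true phoneme sequences. The command list, the action labels, the `match_command` function, and the `max_ratio` threshold are all hypothetical and do not come from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical command domain: spoken phrase -> drone action label.
# These phrases are illustrative assumptions, not the paper's vocabulary.
DOMAIN_COMMANDS = {
    "go up": "UP",
    "go down": "DOWN",
    "turn left": "LEFT",
    "turn right": "RIGHT",
    "go forward": "FORWARD",
    "go back": "BACK",
    "stop": "STOP",
}


def normalize(text: str) -> str:
    """Lowercase the text and keep only letters and spaces."""
    return "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ").strip()


def match_command(transcript: str, max_ratio: float = 0.5):
    """Return the action whose phrase is closest to the transcript, or None.

    A transcript is rejected when the edit distance, divided by the length
    of the longer string, exceeds max_ratio (an assumed threshold).
    """
    heard = normalize(transcript)
    best_phrase = min(DOMAIN_COMMANDS, key=lambda p: edit_distance(heard, p))
    dist = edit_distance(heard, best_phrase)
    if dist / max(len(heard), len(best_phrase), 1) > max_ratio:
        return None
    return DOMAIN_COMMANDS[best_phrase]
```

Under these assumptions, a misheard transcript such as `match_command("go app")` still resolves to the `"UP"` action, while an unrelated word such as `"banana"` is rejected and returns `None`. A closer approximation of the approach described in the abstract would first convert both the transcript and the domain phrases to phoneme sequences and compute the distance over those.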
