Paper Title

A Multi-tasking Model of Speaker-Keyword Classification for Keeping Human in the Loop of Drone-assisted Inspection

Authors

Li, Yu; Parsan, Anisha; Wang, Bill; Dong, Penghao; Yao, Shanshan; Qin, Ruwen

Abstract

Audio commands are a preferred communication medium for keeping inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a heterogeneous and dynamic group of inspectors, a model must be developed cost-effectively for the group and adapted easily when the group changes. This paper builds a multi-tasking deep learning model with a Share-Split-Collaborate architecture. The architecture lets the two classification tasks (speaker and keyword) share a feature extractor and then, through feature projection and collaborative training, split the subject-specific and keyword-specific features that are intertwined in the extracted features. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected in this study. The model achieves a mean accuracy of 95.3% or higher in classifying the keywords of any authorized inspector, and a mean accuracy of 99.2% in speaker classification. Because the model learns richer keyword representations from the pooled training data, adapting the base model to a new inspector requires only a small amount of training data from that inspector, such as five utterances per keyword. Using the speaker classification scores for inspector verification achieves a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger groups on a public dataset. The paper offers a solution to challenges facing AI-assisted human-robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity.
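
The abstract says that speaker classification scores are used for inspector verification but does not state the decision rule. The following is a minimal sketch of one plausible rule, assuming the scores are logits over the five authorized subjects and that a command is accepted only when the top softmax probability clears a threshold. The function names, the threshold value of 0.9, and the example logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verify_speaker(logits, threshold=0.9):
    """Hypothetical verification rule, not the paper's stated method.

    Accept the utterance as coming from an authorized inspector only if
    the highest speaker probability is at least `threshold` (an assumed
    value). Returns (accepted, speaker_index, confidence).
    """
    probs = softmax(logits)
    speaker = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[speaker]
    return confidence >= threshold, speaker, confidence


# Made-up scores over five authorized subjects.
confident_logits = [6.0, 0.5, 0.2, 0.1, 0.0]    # one speaker dominates
ambiguous_logits = [1.0, 1.1, 0.9, 1.0, 1.05]   # no clear match

print(verify_speaker(confident_logits))
print(verify_speaker(ambiguous_logits))
```

Under this rule, a flat score distribution is treated as a sign of an unauthorized speaker. Raising the threshold would reject more unauthorized speakers at the cost of rejecting more authorized ones, which is the kind of trade-off reflected in the two success rates reported above.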
