On-Device Voice Authentication with Paralinguistic Privacy

Paper Title

On-Device Voice Authentication with Paralinguistic Privacy

Authors

Ranya Aloufi, Hamed Haddadi, David Boyle

Abstract

Using our voices to access, and interact with, online services raises concerns about the trade-offs between convenience, privacy, and security. Balancing privacy against input authenticity has often been hindered by the need to share raw data, which contains all the paralinguistic information required to infer a variety of sensitive characteristics. Users of voice assistants put their trust in service providers; however, this trust is potentially misplaced considering the emergence of first-party 'honest-but-curious' or 'semi-honest' threats. A further security risk comes from imposters who gain access to systems by impersonating the user through replay or 'deepfake' attacks. Our objective is to design and develop a new voice input-based system that offers the following specifications: local authentication, to reduce the need to share raw voice data; local privacy preservation based on user preferences, which allows more flexibility in integrating such a system under a target application's privacy constraints; and good performance in those target applications. The key idea is to locally derive token-based credentials from uniquely identifying attributes of the user's voice and to offer selective filtering of sensitive information before raw data is transmitted. Our system consists of (i) 'VoiceID', boosted with a liveness detection technology to thwart replay attacks; and (ii) a flexible privacy filter that allows users to select the level of privacy protection they prefer for their data. The system yields 98.68% accuracy in verifying legitimate users with cross-validation and runs in tens of milliseconds on a CPU and a single-core ARM processor without specialized hardware. Our system demonstrates the feasibility of filtering raw voice input closer to users, in accordance with their privacy preferences, while preserving the authenticity of that input.
