Paper Title
Personalized Chatbot Trustworthiness Ratings
Paper Authors
Paper Abstract
Conversation agents, commonly referred to as chatbots, are increasingly deployed in many domains to let people interact naturally while trying to solve a specific problem. Given their widespread use, it is important to provide users with methods and tools that increase their awareness of various properties of chatbots, including non-functional properties that users may consider important in order to trust a specific chatbot. For example, users may want to use chatbots that are not biased, that do not use abusive language, that do not leak information to other users, and that respond in a style that is appropriate for the user's cognitive level. In this paper, we address the setting where a chatbot cannot be modified and its training data cannot be accessed, yet a neutral party wants to assess its trustworthiness and communicate it to a user, tailored to the user's priorities over the various trust issues. Such a rating can help users choose among alternative chatbots, developers test their systems, business leaders price their offerings, and regulators set policies. We envision a personalized rating methodology for chatbots that relies on separate rating modules for each issue, and on users' detected priority orderings among the relevant trust issues, to generate an aggregate personalized rating for the trustworthiness of a chatbot. The method is independent of the specific trust issues and is parametric to the aggregation procedure, thereby allowing for seamless generalization. We illustrate its general use, integrate it with a live chatbot, and evaluate it on four dialog datasets and representative user profiles, validated with user surveys.
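The aggregation idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the issue names, the [0, 1] rating scale, the example scores, and the linear rank-weighting scheme are all hypothetical. The weighting function is passed as a parameter to mirror the claim that the method is parametric to the aggregation procedure.

```python
from typing import Callable, Dict, List


def rank_weights(priority_order: List[str]) -> Dict[str, float]:
    """Map a priority ordering (most important first) to weights that sum to 1.

    Assumption: linear rank weights. The abstract does not specify a scheme.
    """
    n = len(priority_order)
    total = n * (n + 1) / 2
    return {issue: (n - i) / total for i, issue in enumerate(priority_order)}


def aggregate_rating(
    issue_ratings: Dict[str, float],
    priority_order: List[str],
    weight_fn: Callable[[List[str]], Dict[str, float]] = rank_weights,
) -> float:
    """Combine per-issue ratings into one personalized score.

    issue_ratings holds the output of separate (hypothetical) rating modules,
    one score per trust issue, each assumed to lie in [0, 1].
    """
    weights = weight_fn(priority_order)
    return sum(weights[issue] * issue_ratings[issue] for issue in priority_order)


# Hypothetical module outputs for one chatbot (higher means more trustworthy).
issue_ratings = {
    "bias": 0.9,
    "abusive_language": 0.6,
    "information_leakage": 0.8,
    "conversation_complexity": 0.5,
}

# Hypothetical user profile: issues ordered from most to least important.
profile = [
    "abusive_language",
    "bias",
    "information_leakage",
    "conversation_complexity",
]

personalized_score = aggregate_rating(issue_ratings, profile)
print(round(personalized_score, 2))
```

Under these assumptions, two users with different priority orderings would receive different aggregate scores for the same chatbot, and a different `weight_fn` could be swapped in without touching the per-issue modules.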