Paper Title

TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

Authors

Lorenz Stangier, Ji-Ung Lee, Yuxi Wang, Marvin Müller, Nicholas Frick, Joachim Metternich, Iryna Gurevych

Abstract

Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly being used at work. As a result, a lot of work-relevant information is disseminated through those channels and needs to be post-processed manually by the employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate problems, causes, and solutions that occur in work-related chats. TexPrax uses a chatbot to directly engage the employees to provide lightweight annotations on their conversations and ease their documentation work. To comply with data privacy and security regulations, we use end-to-end message encryption and give our users full control over their data, which has various advantages over conventional annotation tools. We evaluate TexPrax in a user study with German factory employees who ask their colleagues for solutions to problems that arise during their daily work. Overall, we collect 202 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain instances of code-switching, varying abbreviations for the same entity, and dialects, all of which NLP systems should be able to handle.
