Paper Title

Quda: Natural Language Queries for Visual Data Analytics

Authors

Siwei Fu, Kai Xiong, Xiaodong Ge, Siliang Tang, Wei Chen, Yingcai Wu

Abstract

Identifying analytic tasks from free text is critical for visualization-oriented natural language interfaces (V-NLIs) to suggest effective visualizations. However, it is challenging because of the ambiguous and complex nature of human language. To address this challenge, we present a new dataset, called Quda, that aims to help V-NLIs recognize analytic tasks from free-form natural language by training and evaluating cutting-edge multi-label classification models. Our dataset contains 14,035 diverse user queries, each annotated with one or more analytic tasks. We built it by first gathering seed queries with data analysts and then employing a large crowd workforce for paraphrase generation and validation. We demonstrate the usefulness of Quda through three applications. This work is the first attempt to construct a large-scale corpus for recognizing analytic tasks. With the release of Quda, we hope to boost the research and development of V-NLIs in data analysis and visualization.
