Paper Title

SEAL : Interactive Tool for Systematic Error Analysis and Labeling

Paper Authors

Nazneen Rajani, Weixin Liang, Lingjiao Chen, Meg Mitchell, James Zou

Paper Abstract

With the advent of Transformers, large language models (LLMs) have saturated well-known NLP benchmarks and leaderboards with high aggregate performance. However, these models often fail systematically on tail data or rare groups that are not obvious in aggregate evaluation. Identifying such problematic data groups is even more challenging when there are no explicit labels (e.g., ethnicity, gender, etc.), and the problem is further compounded for NLP datasets by the lack of visual features to characterize failure modes (e.g., Asian males, animals indoors, waterbirds on land, etc.). This paper introduces an interactive Systematic Error Analysis and Labeling (SEAL) tool that uses a two-step approach: it first identifies high-error slices of data and then introduces methods to give human-understandable semantics to those underperforming slices. We explore a variety of methods for producing coherent semantics for the error groups, using language models for semantic labeling and a text-to-image model for generating visual features. The SEAL toolkit and a demo screencast are available at https://huggingface.co/spaces/nazneen/seal.
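
The abstract describes the two-step workflow only at a high level. As a rough illustration of the idea (not SEAL's actual implementation), the sketch below clusters evaluation examples in embedding space, ranks clusters by mean loss to surface candidate high-error slices, and then builds a prompt that a language model could use to name the common theme of a slice. All function names, parameters, and the specific clustering and prompting choices here are assumptions.

```python
# A minimal sketch of the two-step idea, assuming you already have, for each
# evaluation example, a sentence embedding, the model's per-example loss, and
# the raw text. Function and parameter names are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

def find_high_error_slices(embeddings, losses, n_clusters=20, top_k=3):
    """Step 1: cluster the evaluation data in embedding space and rank
    clusters by mean loss to surface candidate high-error slices."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    mean_loss = np.array([losses[labels == c].mean() for c in range(n_clusters)])
    worst = np.argsort(mean_loss)[::-1][:top_k]  # highest-loss clusters first
    return [(int(c), float(mean_loss[c]), np.where(labels == c)[0]) for c in worst]

def describe_slice(texts, indices, n_examples=10):
    """Step 2 (placeholder): build a prompt asking a language model to name
    the common theme of a slice; the LM call itself is left abstract here."""
    examples = "\n".join(f"- {texts[i]}" for i in indices[:n_examples])
    return (
        "The following examples were all misclassified by a model.\n"
        f"{examples}\n"
        "In a few words, what do these examples have in common?"
    )
```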
