Paper Title

Data+Shift: Supporting visual investigation of data distribution shifts by data scientists

Authors

João Palmeiro, Beatriz Malveiro, Rita Costa, David Polido, Ricardo Moreira, Pedro Bizarro

Abstract

Machine learning on data streams is increasingly present in multiple domains. However, data distribution shift often occurs and can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool to support data scientists in investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment in which a data scientist used the tool for a fraud detection use case.
