Paper Title
Introducing explainable supervised machine learning into interactive feedback loops for statistical production system
Paper Authors
Paper Abstract
Statistical production systems cover multiple steps from the collection, aggregation, and integration of data to tasks like data quality assurance and dissemination. While data quality assurance is one of the most promising fields for applying machine learning, the lack of curated and labeled training data is often a limiting factor. The statistical production system for the Centralised Securities Database features an interactive feedback loop between data collected by the European Central Bank and data quality assurance performed by data quality managers at National Central Banks (NCBs). The quality assurance feedback loop is based on a set of rule-based checks for raising exceptions, upon which the user either confirms the data or corrects an actual error. In this paper, we use the information received from this feedback loop to optimize the exceptions presented to the NCBs, thereby improving the quality of the exceptions generated and reducing the time users spend on the system reviewing those exceptions. For this approach we make use of explainable supervised machine learning to (a) identify the types of exceptions and (b) prioritize which exceptions are more likely to require an intervention or correction by the NCBs. Furthermore, we provide an explainable AI taxonomy aiming to identify the different explainable AI needs that arose during the project.
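As a rough illustration of the prioritization idea described in the abstract (a minimal sketch, not the authors' actual implementation), the snippet below trains a supervised classifier on invented exception features, using past confirm/correct decisions from the feedback loop as labels, and ranks new exceptions by the predicted probability that an NCB correction is required. Feature importances stand in for a simple global explanation; all feature names, the model choice, and the synthetic data are assumptions made for illustration only.

```python
# Minimal sketch (hypothetical setup): prioritize rule-based exceptions by the
# likelihood that they require an NCB correction, learned from past
# confirm/correct feedback. Feature names and data are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical exception features: which rule fired, size of the deviation,
# age of the instrument, number of past corrections for the same issuer.
n = 5000
X = np.column_stack([
    rng.integers(0, 20, n),      # rule_id that raised the exception
    rng.exponential(1.0, n),     # relative deviation from a reference value
    rng.integers(0, 3650, n),    # days since issuance
    rng.poisson(0.5, n),         # past corrections for the same issuer
])
# Label taken from the feedback loop: 1 = data was corrected, 0 = confirmed.
y = (0.3 * X[:, 1] + 0.4 * (X[:, 3] > 0) + rng.normal(0, 0.3, n) > 0.6).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Rank open exceptions so those most likely to need intervention come first.
priority = model.predict_proba(X_test)[:, 1]
ranking = np.argsort(-priority)

# A simple global explanation: which features drive the prioritization.
feature_names = ["rule_id", "rel_deviation", "days_since_issuance", "past_corrections"]
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

In practice, richer local explanation methods (e.g. per-exception attributions) would be needed to meet the explainability requirements the paper's taxonomy addresses; the feature-importance printout above is only a placeholder for that idea.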