Paper title
To Explore What Isn't There -- Glyph-based Visualization for Analysis of Missing Values
Paper authors
Abstract
This paper contributes a novel visualization method, Missingness Glyph, for analysis and exploration of missing values in data. Missing values are a common challenge in most data generating domains and may cause a range of analysis issues. Missingness in data may indicate potential problems in data collection and pre-processing, or highlight important data characteristics. While the development and improvement of statistical methods for dealing with missing data is a research area in its own right, mainly focussing on replacing missing values with estimated values, considerably less attention has been paid to the visualization of missing values. Nonetheless, visualization and explorative analysis have great potential to support understanding of missingness in data, and to yield novel insights into patterns of missingness in ways that statistical methods cannot. The Missingness Glyph supports identification of relevant missingness patterns in data, and is evaluated and compared to two other visualization methods in the context of these missingness patterns. The results are promising and confirm that the Missingness Glyph in several cases performs better than the alternative visualization methods.