Title
Regression-based imputation of explanatory discrete missing data
Authors
Abstract
Imputation of missing values is a strategy for handling non-response in surveys or data loss in measurement processes, and it may be more effective than simply ignoring them. When the variable with missing values is a count, the literature dealing with this problem is scarce. Moreover, when over- or under-dispersion is observed, generalisations of the Poisson distribution are recommended for carrying out the imputation. To assess the performance of various regression models in the imputation of a discrete variable compared with classical count models, this work presents a comprehensive simulation study covering a variety of scenarios, as well as real data. To this end, we compared estimates obtained using only the complete data with estimates obtained after imputation based on the Poisson, negative binomial, Hermite, and COMPoisson distributions, and on the ZIP and ZINB models for excess zeros. The results of this work reveal that the COMPoisson distribution generally provides better results in every dispersion scenario, especially when the amount of missing information is large. When the variable with missing values is a count, the most widely used approach is to assume that a classical Poisson model is the best option for imputing the missing counts. In real-life research, however, this assumption does not always hold: count variables commonly exhibit overdispersion or underdispersion, and in those cases the Poisson model is no longer the best choice for imputation. The performance of the methods analysed differs across several of the scenarios considered, which indicates that it is important to analyse dispersion and the possible presence of excess zeros before choosing an imputation method. The COMPoisson model performs well because it flexibly handles counts that are overdispersed, underdispersed, or equidispersed.
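The abstract's practical advice is to examine dispersion and excess zeros before choosing an imputation model. The Python sketch below is only a rough illustration under stated assumptions, not the authors' procedure. It computes a marginal dispersion index (variance divided by mean, about 1 for Poisson data) and the gap between the observed share of zeros and the share exp(-mean) that a Poisson model with the same mean would give. To show why the COMPoisson model is flexible, it also computes the mean and variance of a COM-Poisson distribution with pmf proportional to λ^y / (y!)^ν. Here ν = 1 gives the Poisson, ν < 1 gives overdispersion and ν > 1 gives underdispersion. The function names, the truncation of the normalising constant at 200 terms and the example values are illustrative assumptions. These checks ignore covariates. In the regression setting the paper studies, the corresponding diagnostic would be applied to a fitted model (for example, through Pearson residuals), which this sketch does not do.

```python
import math
from statistics import mean, pvariance


def dispersion_index(counts):
    """Sample variance / sample mean (marginal, ignores covariates).

    About 1 under a Poisson model; > 1 suggests overdispersion,
    < 1 suggests underdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is 0")
    return pvariance(counts, mu=m) / m


def zero_excess(counts):
    """Observed share of zeros minus the Poisson share exp(-mean).

    Clearly positive values hint at excess zeros (ZIP/ZINB territory).
    """
    m = mean(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    return observed - math.exp(-m)


def com_poisson_moments(lam, nu, max_terms=200):
    """Mean and variance of COM-Poisson(lam, nu), lam > 0, nu > 0.

    P(Y = k) is proportional to lam**k / (k!)**nu; the normalising constant
    is approximated by truncating the series at max_terms (an assumption
    that is adequate for moderate lam).
    """
    log_w = [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(max_terms)]
    top = max(log_w)                      # log-sum-exp shift for stability
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    p = [wi / z for wi in w]
    mu = sum(k * pk for k, pk in enumerate(p))
    var = sum((k - mu) ** 2 * pk for k, pk in enumerate(p))
    return mu, var


if __name__ == "__main__":
    # Hypothetical sample of counts with many zeros and a long right tail.
    sample = [0, 0, 0, 0, 0, 1, 2, 5, 8, 10]
    print("dispersion index:", dispersion_index(sample))
    print("zero excess:", zero_excess(sample))
    # Variance-to-mean ratio of COM-Poisson for over-, equi- and under-dispersion.
    for nu in (0.5, 1.0, 2.0):
        mu, var = com_poisson_moments(3.0, nu)
        print(f"nu={nu}: mean={mu:.3f}, var/mean={var / mu:.3f}")
```

With λ fixed, decreasing ν below 1 pushes the variance-to-mean ratio above 1 and increasing ν above 1 pulls it below 1. That single extra parameter is what lets one model cover all three dispersion regimes.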