Paper title
Asymptotic Normality of Gini Correlation in High Dimension with Applications to the K-sample Problem
Paper authors
Paper abstract
The categorical Gini correlation proposed by Dang et al. is a dependence measure that characterizes independence between a categorical variable and a numerical variable. The asymptotic distributions of the sample correlation under dependence and under independence have been established for the case where the dimension of the numerical variable is fixed. However, its asymptotic behavior for high-dimensional data has not been explored. In this paper, we develop a central limit theorem for the Gini correlation in the more realistic setting where the dimension of the numerical variable diverges. Based on this asymptotic normality, we then construct a powerful and consistent test for the $K$-sample problem. The proposed test not only avoids the computational burden of the permutation procedure but also gains power over it. Simulation studies and real data illustrations show that the proposed test is more competitive than existing methods across a broad range of realistic situations, especially in unbalanced cases.
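For readers unfamiliar with the quantity, the sketch below computes a sample version of the categorical Gini correlation. The abstract does not state the formula, so the code assumes the commonly cited form $\rho_g = 1 - \sum_k \hat{p}_k \hat{\Delta}_k / \hat{\Delta}$, where $\hat{\Delta}$ is the mean pairwise Euclidean distance over all observations, $\hat{\Delta}_k$ is the same quantity within group $k$, and $\hat{p}_k = n_k / n$. The function names `pairwise_mean_distance` and `gini_correlation` are illustrative only, and the code does not implement the paper's high-dimensional test.

```python
import numpy as np


def pairwise_mean_distance(x):
    """Mean Euclidean distance over all unordered pairs of rows of x."""
    n = x.shape[0]
    if n < 2:
        raise ValueError("each group needs at least two observations")
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # The full matrix counts every unordered pair twice; the diagonal is zero.
    return dists.sum() / (n * (n - 1))


def gini_correlation(x, labels):
    """Sample categorical Gini correlation (assumed form, see lead-in)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n observations of dimension 1
    labels = np.asarray(labels)
    n = x.shape[0]

    overall = pairwise_mean_distance(x)
    if overall == 0.0:
        raise ValueError("all observations are identical")

    # Weighted average of within-group mean distances, weights n_k / n.
    within = 0.0
    for k in np.unique(labels):
        xk = x[labels == k]
        within += (xk.shape[0] / n) * pairwise_mean_distance(xk)

    return 1.0 - within / overall
```

With perfectly separated groups that have no within-group spread, the within-group term vanishes and the statistic equals 1. When the groups share the same values, this sample version can be negative in small samples.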