Paper Title

Functional Linear Regression of Cumulative Distribution Functions

Authors

Qian Zhang, Anuran Makur, Kamyar Azizzadenesheli

Abstract

The estimation of cumulative distribution functions (CDFs) is an important learning task with a wide variety of downstream applications, such as risk assessment in prediction and decision making. In this paper, we study functional regression of contextual CDFs, where each data point is sampled from a linear combination of context-dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde O(\sqrt{d/n})$ for the fixed design, random design, and adversarial context cases. We also derive matching information-theoretic lower bounds, establishing minimax optimality for CDF functional regression. Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatch error, and show that the estimators are well-behaved under model mismatch. Moreover, to complete our study, we formalize infinite-dimensional models where the parameter space is an infinite-dimensional Hilbert space, and establish a self-normalized estimation error upper bound for this setting. Notably, the upper bound reduces to the $\widetilde O(\sqrt{d/n})$ bound when the parameter space is constrained to be $d$-dimensional. Our comprehensive numerical experiments validate the efficacy of our estimation methods in both synthetic and practical settings.
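
To make the setup in the abstract concrete, the following is a minimal sketch of a ridge-type least-squares fit for a CDF that is a linear combination of context-dependent basis CDFs. It is an illustration under stated assumptions, not the paper's implementation: the logistic basis CDFs, the uniformly drawn centers used as "contexts", the integration grid, the regularization value `lam`, and all function names (`basis_cdfs`, `simulate`, `ridge_cdf_fit`) are hypothetical choices made for this example. The fit regresses the step function 1{y_j <= t} on the basis CDFs over a grid of t values, with an L2 penalty on the coefficients.

```python
import numpy as np

def basis_cdfs(centers, grid, scale=1.0):
    """Logistic CDFs (an assumed basis) on the grid; shape (d, len(grid))."""
    z = (grid[None, :] - centers[:, None]) / scale
    return 1.0 / (1.0 + np.exp(-z))

def simulate(theta, n, rng):
    """Draw n (context, sample) pairs with y_j ~ sum_i theta_i * F_{j,i}."""
    d = theta.size
    centers = rng.uniform(-3.0, 3.0, size=(n, d))  # hypothetical contexts
    comp = rng.choice(d, size=n, p=theta)          # basis CDF that generates y_j
    y = rng.logistic(loc=centers[np.arange(n), comp], scale=1.0)
    return centers, y

def ridge_cdf_fit(centers, y, grid, lam=1.0):
    """Minimize sum_j int (1{y_j <= t} - theta^T Phi_j(t))^2 dt + lam * ||theta||^2.

    The integral is approximated by a Riemann sum on a uniform grid.
    """
    d = centers.shape[1]
    dt = grid[1] - grid[0]
    A = lam * np.eye(d)   # regularized Gram matrix
    b = np.zeros(d)
    for c, yj in zip(centers, y):
        Phi = basis_cdfs(c, grid)             # basis CDFs for this context
        target = (grid >= yj).astype(float)   # one-sample empirical CDF 1{y_j <= t}
        A += Phi @ Phi.T * dt
        b += Phi @ target * dt
    return np.linalg.solve(A, b)              # closed-form ridge solution

rng = np.random.default_rng(0)
grid = np.linspace(-10.0, 10.0, 401)
theta_true = np.array([0.5, 0.3, 0.2])
centers, y = simulate(theta_true, 2000, rng)
theta_hat = ridge_cdf_fit(centers, y, grid, lam=1.0)
err = np.linalg.norm(theta_hat - theta_true)
print(theta_hat, err)
```

Because the expectation of 1{y_j <= t} equals the true CDF at every t, the least-squares target is unbiased for the mixture weights, and the penalty only adds a small shrinkage when `lam` is small relative to n. The estimated CDF for a new context is then the fitted coefficients applied to that context's basis CDFs.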
