Title
Robust Linear Regression for General Feature Distribution
Authors
Abstract
We investigate robust linear regression where the data may be contaminated by an oblivious adversary, i.e., an adversary that may know the data distribution but is otherwise oblivious to the realizations of the data samples. This model has previously been analyzed under strong assumptions. Concretely, $\textbf{(i)}$ all previous works assume that the covariance matrix of the features is positive definite; and $\textbf{(ii)}$ most of them assume that the features are centered (i.e., zero mean). Additionally, all previous works make further restrictive assumptions, e.g., that the features are Gaussian or that the corruptions are symmetrically distributed. In this work we go beyond these assumptions and investigate robust regression under a more general set of assumptions: $\textbf{(i)}$ we allow the covariance matrix to be either positive definite or positive semi-definite, $\textbf{(ii)}$ we do not necessarily assume that the features are centered, and $\textbf{(iii)}$ we make no further assumptions beyond boundedness (sub-Gaussianity) of the features and the measurement noise. Under these assumptions we analyze a natural SGD variant for this problem and show that it enjoys a fast convergence rate when the covariance matrix is positive definite. In the positive semi-definite case we show that there are two regimes: if the features are centered we can obtain a standard convergence rate; otherwise, the adversary can cause any learner to fail arbitrarily.
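To make the contamination model concrete, the following is a minimal sketch of linear regression with oblivious corruptions: a fraction of the responses is shifted by noise drawn independently of the realized samples, and a plain SGD baseline is run on the squared loss. The data-generating parameters, corruption fraction, step size, and epoch count are all illustrative assumptions; this is a generic SGD baseline, not the paper's specific variant or tuned constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: sub-Gaussian (here Gaussian) features X, true
# weights w_star, and responses y with small measurement noise.
n, d = 5000, 10
X = rng.normal(size=(n, d))          # covariance matrix is the identity (positive definite)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

# Oblivious corruption: a 10% random subset of responses is shifted
# by large noise drawn without looking at the realized (X, y) values.
corrupt = rng.random(n) < 0.1
y[corrupt] += rng.normal(scale=10.0, size=corrupt.sum())

# Plain single-sample SGD on the squared loss.
w = np.zeros(d)
eta = 0.01                           # illustrative constant step size
for epoch in range(20):
    for i in rng.permutation(n):
        grad = (X[i] @ w - y[i]) * X[i]
        w -= eta * grad

# With positive definite covariance, the iterate lands near w_star.
print(np.linalg.norm(w - w_star))
```

Because the corruptions here happen to be zero-mean, even vanilla SGD remains roughly consistent in this toy run; the abstract's hardness result concerns the positive semi-definite, uncentered regime, where no learner can succeed.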