论文标题
将分层作为随机的下二次优化问题
Tiering as a Stochastic Submodular Optimization Problem
论文作者
论文摘要
分层是构建大规模信息检索系统的必不可少的技术。虽然针对高优先级层次的文档选择对分层的效率产生了严格的影响,但过去的工作着重于对历史上一系列查询的优化优化,并且对未来的流量概括了。取而代之的是,我们将最佳分层作为随机优化问题提出,并遵循正规化经验风险最小化的方法,以最大化系统的\ emph {概括性能}。我们还表明,优化问题可以用作随机的下二次优化问题,并具有子模具knapsack约束,并且我们通过利用此连接来开发有效的优化算法。
Tiering is an essential technique for building large-scale information retrieval systems. While the selection of documents for high priority tiers critically impacts the efficiency of tiering, past work focuses on optimizing it with respect to a static set of queries in the history, and generalizes poorly to the future traffic. Instead, we formulate the optimal tiering as a stochastic optimization problem, and follow the methodology of regularized empirical risk minimization to maximize the \emph{generalization performance} of the system. We also show that the optimization problem can be cast as a stochastic submodular optimization problem with a submodular knapsack constraint, and we develop efficient optimization algorithms by leveraging this connection.