Paper Title
Background Modeling for Double Higgs Boson Production: Density Ratios and Optimal Transport
Paper Authors
Paper Abstract
We study the problem of data-driven background estimation, which arises in searches for physics signals predicted by the Standard Model at the Large Hadron Collider. Our work is motivated by the search for the production of pairs of Higgs bosons decaying into four bottom quarks. A number of other physical processes, known as background, share the same final state. The data arising in this problem are therefore a mixture of unlabeled background and signal events, and the primary aim of the analysis is to determine whether the proportion of unlabeled signal events is nonzero. A challenging but necessary first step is to estimate the distribution of background events. Past work in this area has determined regions of the space of collider events where signal is unlikely to appear, and where the background distribution is therefore identifiable. The background distribution can be estimated in these regions and extrapolated into the region of primary interest using transfer learning with a multivariate classifier. We build upon this existing approach in two ways. First, we revisit this method by developing a customized residual neural network tailored to the structure and symmetries of collider data. Second, we develop a new method for background estimation, based on the optimal transport problem, which relies on modeling assumptions distinct from those of earlier work. Because their underlying assumptions are complementary, the two methods can serve as cross-checks for each other in particle physics analyses. We compare their performance on simulated double Higgs boson data.
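To make the classifier-based idea concrete, the following is a minimal toy sketch of density ratio estimation with a probabilistic classifier. It is not the residual network described in the abstract: the one-dimensional Gaussian samples, the plain logistic regression, and the names `fit_logistic`, `density_ratio`, `source`, and `target` are all assumptions introduced here for illustration. A classifier trained to separate two samples yields odds that, after correcting for the sample sizes, estimate the ratio of their densities; those ratios can then be used as weights to make one sample resemble the other.

```python
import numpy as np

def fit_logistic(x, y, n_iter=50):
    """Fit a 1D logistic regression (intercept + slope) with Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # predicted P(y = 1 | x)
        grad = X.T @ (y - p)                           # gradient of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X    # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)      # Newton update
    return beta

def density_ratio(x, beta, n_num, n_den):
    """Classifier odds, corrected for sample sizes, estimate p_num(x) / p_den(x)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.exp(X @ beta) * (n_den / n_num)

# Toy data (hypothetical): two overlapping Gaussians stand in for the two event samples.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, 5000)   # sample to be reweighted
target = rng.normal(0.5, 1.0, 5000)   # sample whose distribution we want to mimic

x = np.concatenate([target, source])
y = np.concatenate([np.ones(target.size), np.zeros(source.size)])
beta = fit_logistic(x, y)

# Weights that morph the source sample toward the target distribution.
weights = density_ratio(source, beta, target.size, source.size)
reweighted_mean = np.average(source, weights=weights)
print(f"slope = {beta[1]:.3f}, reweighted source mean = {reweighted_mean:.3f}")
```

In this toy setting the true log density ratio is linear in `x` with slope 0.5, so a linear logistic model is well specified and the reweighted source mean should land near the target mean of 0.5. With real collider events the ratio is a complicated function of many variables, which is why a flexible classifier is used in place of the linear model assumed here.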