14.3 Event Selection
The primary physics objects considered in this analysis are large-radius AK8 jets representing the two Higgs bosons. AK4 jets are also used in the online triggers and to identify nonresonant VBF production. As we do not expect any isolated leptons in our signal, events containing any isolated electrons or muons are vetoed. The online trigger selections are described in Section 14.3.1, and the offline selections for the nonresonant and resonant searches in Sections 14.3.2 and 14.3.3, respectively.
14.3.1 Triggers
No dedicated online trigger algorithms were available in Run 2 for boosted Higgs classification. Instead, a combination of high-level triggers (HLTs) is used, requiring high hadronic activity and/or AK8 jets with high transverse momentum, as well as jet mass and/or b-tagging requirements. The efficiencies of these triggers as a function of AK8 jet , soft-drop mass [394], and -tagging score are measured in data in an unbiased semi-leptonic region, defined using single-muon triggers and offline selections on the muon and an AK8 jet. This measurement is shown in Figure 14.2 for the 2018 dataset. The triggers are generally fully efficient for jet , while for the efficiency is . This is a significant limitation of this analysis, and of boosted Higgs searches in Run 2 generally, which is addressed in Run 3 by the introduction of dedicated triggers for boosted Higgs searches [397].
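The efficiency measurement described above can be illustrated with a minimal sketch. It assumes a reference sample selected by an independent trigger, and computes in each bin the fraction of events that also fire the trigger under study. The function name, the one-dimensional binning, and the simple binomial uncertainty are illustrative assumptions, not the analysis implementation; the measurement in the text is performed in several variables simultaneously.

```python
from bisect import bisect_right
from math import sqrt


def binned_trigger_efficiency(values, passed, bin_edges):
    """Per-bin efficiency k/n in a reference sample (hypothetical helper).

    values    : offline variable for each reference event
    passed    : True if the event also fired the trigger under study
    bin_edges : increasing bin edges; bins are half-open [low, high)
    Returns (efficiencies, uncertainties); empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    total = [0] * n_bins
    n_pass = [0] * n_bins
    for value, fired in zip(values, passed):
        if value < bin_edges[0] or value >= bin_edges[-1]:
            continue  # outside the measured range
        i = bisect_right(bin_edges, value) - 1
        total[i] += 1
        if fired:
            n_pass[i] += 1

    eff, err = [], []
    for n, k in zip(total, n_pass):
        if n == 0:
            eff.append(None)
            err.append(None)
        else:
            e = k / n
            eff.append(e)
            # Simple binomial approximation (an assumption of this sketch);
            # it is unreliable near efficiencies of 0 or 1.
            err.append(sqrt(e * (1.0 - e) / n))
    return eff, err
```

Because the reference sample is selected independently of the hadronic triggers, the ratio in each bin is an unbiased estimate of the trigger efficiency in that bin.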
14.3.2 Nonresonant offline selection
In the nonresonant analysis, both the and decays are targeted through an offline selection for two highly boosted AK8 jets with a minimum of and . ParticleNet is used to isolate the signal jets against background QCD jets, using the discriminant derived from its outputs (Eq. 13.1.1), while our new GloParT model is leveraged to identify the jet. Both networks have been decorrelated from the mass of the jets by enforcing a uniform distribution in jet mass and in the training samples [164], to aid with their calibration. Additionally, as the jet mass resolution is crucial to the sensitivity of the search, we optimize the mass reconstruction for all AK8 jets using the ParticleNet-based regression algorithm, the output of which we refer to as . The jet with the higher (lower) score is considered the - (-) candidate jet.
The VBF process produces two jets, typically forward, with a large invariant mass and pseudorapidity separation. To identify this mode, we select up to two AK4 jets per event, required to have , , and a separation of 1.2 and 0.8, respectively, from the - and -candidate AK8 jets. The pseudorapidity separation between and invariant mass of the two highest jets passing these requirements are used as input variables in a boosted decision tree (BDT) to discriminate against QCD and other backgrounds. Other input variables include outputs from the GloParT tagger and the kinematics of the two selected AK8 jets. The variables are optimized to provide the highest BDT performance while remaining decorrelated from the -candidate jet’s mass.
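As an illustration of the VBF jet selection and the two dijet input variables, a minimal sketch follows. Jets are represented as plain dictionaries with keys pt, eta, phi, and mass, which is an assumption of this example. The kinematic thresholds pt_min and abs_eta_max are left as required arguments because their values are not reproduced here, and treating the pseudorapidity requirement as an upper bound and ordering the jets by transverse momentum are likewise assumptions. The two AK8 candidates are named a and b in the order they appear in the text.

```python
from math import cos, cosh, hypot, pi, sin, sinh, sqrt


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + pi) % (2.0 * pi) - pi
    return hypot(eta1 - eta2, dphi)


def four_vector(jet):
    """(E, px, py, pz) from (pt, eta, phi, mass)."""
    px = jet["pt"] * cos(jet["phi"])
    py = jet["pt"] * sin(jet["phi"])
    pz = jet["pt"] * sinh(jet["eta"])
    energy = sqrt(px * px + py * py + pz * pz + jet["mass"] ** 2)
    return energy, px, py, pz


def select_vbf_jets(ak4_jets, fatjet_a, fatjet_b, pt_min, abs_eta_max,
                    dr_a=1.2, dr_b=0.8):
    """Return up to two AK4 jets passing the kinematic and separation cuts.

    pt_min and abs_eta_max are hypothetical parameters; dr_a and dr_b are the
    separations from the first and second AK8 candidate quoted in the text.
    """
    selected = [
        j for j in ak4_jets
        if j["pt"] > pt_min
        and abs(j["eta"]) < abs_eta_max
        and delta_r(j["eta"], j["phi"], fatjet_a["eta"], fatjet_a["phi"]) > dr_a
        and delta_r(j["eta"], j["phi"], fatjet_b["eta"], fatjet_b["phi"]) > dr_b
    ]
    # Assumption: "highest" refers to transverse momentum.
    selected.sort(key=lambda j: j["pt"], reverse=True)
    return selected[:2]


def vbf_pair_features(jets):
    """Pseudorapidity separation and invariant mass of a jet pair, or None."""
    if len(jets) < 2:
        return None
    j1, j2 = jets[0], jets[1]
    e1, px1, py1, pz1 = four_vector(j1)
    e2, px2, py2, pz2 = four_vector(j2)
    m2 = ((e1 + e2) ** 2 - (px1 + px2) ** 2
          - (py1 + py2) ** 2 - (pz1 + pz2) ** 2)
    return {
        "delta_eta": abs(j1["eta"] - j2["eta"]),
        "mjj": sqrt(max(m2, 0.0)),  # guard against tiny negative rounding
    }
```

Events with fewer than two selected AK4 jets return None here; how such events are represented in the BDT inputs is not specified in the text and is left to the caller.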
The BDT is optimized simultaneously for both the SM ggF and BSM VBF signals, and separate “ggF” and “VBF” signal regions are defined using the BDT probabilities for the respective processes, referred to as and . Concretely, the VBF region is defined by selections on the and discriminants, corresponding to VBF signal (background) efficiencies of 40% () and 20% (), respectively, chosen to optimize the expected exclusion limit on the VBF signal. The ggF region is defined by a veto on events passing the VBF selections plus selections on the and discriminants, corresponding to ggF signal (background) efficiencies of 60% () and 7% (), respectively, similarly chosen to optimize the limit on the ggF signal. These selections are henceforth referred to as the ggF and VBF and BDT working points (WPs). The discriminant’s signal efficiencies are calibrated using boosted gluon splitting to bottom quark () jets in data and simulation [164], with -dependent scale factors and uncertainties applied to the signals. The uncertainty on the BDT signal efficiency is dominated by that of the GloParT tagger and is calibrated based on a new technique using the ratio of the primary Lund jet plane [61] densities of each individual quark-subjet, described in Section 13.3.
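The ordering of the two signal regions, with the VBF selection applied first and the ggF region vetoing events that pass it, can be sketched as follows. The argument names, the dictionary of working points, and the use of lower thresholds on each discriminant are assumptions for illustration; the numerical thresholds in the usage example are placeholders and not the working points of this analysis.

```python
def assign_signal_region(tagger_score, p_vbf, p_ggf, wps):
    """Assign an event to the "VBF" or "ggF" signal region, or to neither.

    tagger_score : jet tagger discriminant of the event (hypothetical name)
    p_vbf, p_ggf : BDT probabilities for the VBF and ggF processes
    wps          : dict of thresholds with keys "vbf_tagger", "vbf_bdt",
                   "ggf_tagger", "ggf_bdt" (hypothetical layout)
    """
    passes_vbf = (tagger_score >= wps["vbf_tagger"]
                  and p_vbf >= wps["vbf_bdt"])
    if passes_vbf:
        return "VBF"

    # The ggF region is only considered for events failing the VBF
    # selections, which keeps the two regions mutually exclusive.
    passes_ggf = (tagger_score >= wps["ggf_tagger"]
                  and p_ggf >= wps["ggf_bdt"])
    if passes_ggf:
        return "ggF"
    return None


# Usage with placeholder thresholds (not the analysis working points):
example_wps = {"vbf_tagger": 0.90, "vbf_bdt": 0.95,
               "ggf_tagger": 0.80, "ggf_bdt": 0.90}
region = assign_signal_region(0.97, 0.99, 0.99, example_wps)  # "VBF"
```

Checking the VBF selection first means an event satisfying both sets of requirements is counted only once, in the VBF region, so the two regions can be combined in a single likelihood without double counting.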
The search is performed by constructing a likelihood in the pass region as a function of the -candidate jet’s regressed mass (). The QCD multijet background contribution in the pass region is estimated from data in a “fail” region, defined using the same baseline selections on the two AK8 jets, but with the selection inverted, as described in Section 14.4 below. A summary of all offline selections is provided in Table 14.1, and the signal and fail region selections in terms of the and BDT scores are illustrated in Figure 14.3.
14.3.3 Resonant offline selection
The resonant analysis similarly selects for two wide-radius jets representing the two and processes. Specifically, we select for two boosted AK8 jets with , with at least one of , and pseudorapidity . Out of all AK8 jets in the event passing these requirements, the one with the highest discriminant score is considered our candidate jet, and is required to pass the high purity WP and have a jet mass close to the SM Higgs mass: . As in the nonresonant case, the jet mass resolution is crucial to the sensitivity of the search and hence we use the ParticleNet-based regression algorithm to reconstruct the jet mass, , here as well.
The mass-decorrelated GloParT tagger is again used to identify the jet, using the discriminant targeting the final state derived from its outputs (Eq. 13.2.1). The AK8 jet passing the above and kinematic selections with the highest score is considered the candidate jet,1 and is required to have a score , corresponding to a () signal (background) efficiency. The signal efficiency is calibrated based on the Lund jet plane, as described in Section 13.3. All the and tagger selections were jointly optimized for the lowest expected exclusion limits for a range of points.
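The candidate assignment, including the tie-break rule given in the footnote, can be sketched as below. The sketch covers only the ordering by tagger score, taking as input the jets that already pass the kinematic requirements. Which of the two discriminants takes priority when the same jet scores highest in both is not reproduced here, so it is exposed through the argument order; the names priority_scores and other_scores are hypothetical.

```python
def assign_candidates(priority_scores, other_scores):
    """Return (priority_index, other_index) for the two candidate jets.

    priority_scores : per-jet scores of the discriminant that wins a tie
    other_scores    : per-jet scores of the second discriminant
    Both lists refer to the same jets, in the same order.
    """
    if len(priority_scores) != len(other_scores):
        raise ValueError("score lists must describe the same jets")
    if len(priority_scores) < 2:
        raise ValueError("at least two jets are required")

    indices = range(len(priority_scores))
    priority_idx = max(indices, key=lambda i: priority_scores[i])

    # Excluding the priority candidate implements the tie-break: if that jet
    # also has the highest second score, the next-highest jet is chosen;
    # otherwise the overall highest-scoring jet is unaffected.
    other_idx = max((i for i in indices if i != priority_idx),
                    key=lambda i: other_scores[i])
    return priority_idx, other_idx
```

Removing the first candidate before taking the second maximum handles both the common case and the rare overlap in a single step, so the two candidates are always distinct jets.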
The search is performed in events passing these selections, referred to as the signal or “pass” region, in the 2D plane of the -candidate jet regressed mass () and the invariant mass of the - and -candidate jets (), representing the potential Y and X boson masses, respectively. An orthogonal control, or “fail”, region is defined by inverting the two tagger selections for both jets to estimate the QCD background in the pass region, as detailed in Section 14.4. Finally, separate “validation” pass and fail regions using the candidate jet’s mass sidebands are used to validate the background estimation technique before unblinding the analysis. A summary of the offline selections is provided in Table 14.2.
1 In the rare ( of signal events) case where the same jet has the highest and score, that jet is considered the candidate, and the second-highest scoring jet is the candidate.