Event Selection

14.3 Event Selection

The primary physics objects considered in this analysis are large-radius AK8 jets representing the two Higgs bosons. AK4 jets are also used in the online triggers and to identify nonresonant VBF $H H$ production. As we do not expect any isolated leptons in our signal, events containing any isolated electrons or muons are vetoed. The online trigger selections are described in Section 14.3.1, and the offline selections for the nonresonant and resonant searches in Sections 14.3.2 and 14.3.3, respectively.

14.3.1 Triggers

No dedicated online trigger algorithms were available in Run 2 for boosted Higgs classification. Instead, a combination of high-level triggers (HLTs) is used, which require high hadronic activity and/or AK8 jets with high transverse momentum, as well as jet mass and/or b-tagging requirements. The efficiencies of these triggers as a function of AK8 jet $p_{T}$, soft-drop mass [394], and $b \bar{b}$-tagging score are measured in data in an unbiased semi-leptonic $t \bar{t}$ region, defined using single-muon triggers and offline selections on the muon and an AK8 jet. This measurement is shown in Figure 14.2 for the 2018 dataset. The triggers are generally fully efficient for jet $p_{T} > 500 GeV$, while for $p_{T} < 400 GeV$ the efficiency is $≲ 10 %$. This is a significant limitation of this analysis, and of boosted Higgs searches in Run 2 generally, which is addressed in Run 3 by the introduction of dedicated triggers for boosted Higgs searches [397].

Figure 14.2. Trigger efficiencies for the 2018 dataset, measured in bins of the AK8 jet $p_{T}$, soft-drop mass (MassSD), and $T_{Xbb}$ score.
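As an illustration of the kind of measurement described above, the following minimal sketch computes a per-bin trigger efficiency as the fraction of reference-region events that also fire the signal triggers. It is one-dimensional for brevity, whereas the measurement in the text is binned in $p_{T}$, soft-drop mass, and $T_{Xbb}$ score. The function name, the toy inputs, and the simple binomial uncertainty are all assumptions made for this example, not the analysis implementation.

```python
import numpy as np

def binned_efficiency(values, passed, edges):
    """Fraction of reference events that fire the trigger, per bin of `values`.

    Returns (efficiency, uncertainty); empty bins are reported as NaN.
    The uncertainty is a simple normal-approximation binomial error, which
    is an illustrative choice and not necessarily the one used in the analysis.
    """
    values = np.asarray(values, dtype=float)
    passed = np.asarray(passed, dtype=bool)
    n_total, _ = np.histogram(values, bins=edges)
    n_pass, _ = np.histogram(values[passed], bins=edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        eff = np.where(n_total > 0, n_pass / n_total, np.nan)
        err = np.where(n_total > 0, np.sqrt(eff * (1.0 - eff) / n_total), np.nan)
    return eff, err

# Toy inputs (made up for illustration): AK8 jet pT in GeV and trigger decisions.
jet_pt = [320, 350, 420, 450, 520, 560, 610, 650]
fired = [False, False, True, False, True, True, True, True]
pt_edges = [300, 400, 500, 700]
eff, err = binned_efficiency(jet_pt, fired, pt_edges)
```

In practice, an interval with better coverage near efficiencies of 0 and 1 (for example Clopper-Pearson) would likely be preferred over the normal approximation used here.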

14.3.2 Nonresonant offline selection

In the nonresonant analysis, both the $H \to b \bar{b}$ and $H \to V V$ decays are targeted through an offline selection for two highly boosted AK8 jets with a minimum $p_{T}$ of $300 GeV$ and $| η | < 2.4$ . ParticleNet is used to isolate the signal $H \to b \bar{b}$ jets against background QCD jets, using the $T_{Xbb}$ discriminant derived from its outputs (Eq. 13.1.1), while our new GloParT model is leveraged to identify the $H \to V V \to 4 q$ jet. Both networks have been decorrelated from the mass of the jets by enforcing a uniform distribution in jet mass and $p_{T}$ in the training samples [164], to aid with their calibration. Additionally, as the jet mass resolution is crucial to the sensitivity of the search, we optimize the mass reconstruction for all AK8 jets using the ParticleNet-based regression algorithm, the output of which we refer to as $m_{reg}$ . The jet with the higher (lower) $T_{Xbb}$ score is considered the $b \bar{b}$ - ( $V V$ -) candidate jet.
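The candidate assignment described above can be sketched as follows. This is a simplified illustration that assumes the two leading-$p_{T}$ AK8 jets passing the kinematic selection are the ones considered; the function name and the toy inputs are hypothetical, and only the $p_{T}$ and $| η |$ thresholds are taken from the text.

```python
import numpy as np

PT_MIN = 300.0   # GeV, minimum AK8 jet pT quoted in the text
ETA_MAX = 2.4    # maximum |eta| quoted in the text

def assign_candidates(pt, eta, txbb):
    """Return (bb_index, vv_index) into the input arrays, or None.

    Assumption for this sketch: only the two leading-pT jets passing the
    kinematic selection are considered.
    """
    pt, eta, txbb = (np.asarray(a, dtype=float) for a in (pt, eta, txbb))
    selected = np.flatnonzero((pt > PT_MIN) & (np.abs(eta) < ETA_MAX))
    if selected.size < 2:
        return None
    # Order the selected jets by descending pT and keep the two leading ones.
    leading = selected[np.argsort(-pt[selected])][:2]
    # Higher T_Xbb score -> bb candidate; lower score -> VV candidate.
    bb, vv = sorted(leading, key=lambda i: txbb[i], reverse=True)
    return int(bb), int(vv)

# Toy event: jet 1 fails the pT cut and jet 3 fails the |eta| cut.
candidates = assign_candidates(
    pt=[450.0, 280.0, 380.0, 320.0],
    eta=[0.5, 1.0, -1.2, 2.6],
    txbb=[0.30, 0.99, 0.95, 0.90],
)
```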

The VBF process produces two, likely forward, jets with a large invariant mass and pseudorapidity separation. To identify this mode, we select up to two AK4 jets per event, required to have $p_{T} > 25 GeV$, $| η | < 4.7$, and a minimum $Δ R$ separation of 1.2 and 0.8 from the $b \bar{b}$- and $V V$-candidate AK8 jets, respectively. The pseudorapidity separation and invariant mass of the two highest-$p_{T}$ jets passing these requirements are used as input variables to a boosted decision tree (BDT) to discriminate against QCD and other backgrounds. Other input variables include outputs from the GloParT tagger and the kinematics of the two selected AK8 jets. The variables are optimized to provide the highest BDT performance while remaining decorrelated from the $b \bar{b}$-candidate jet’s mass.
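A minimal sketch of the VBF-jet selection and the two dijet features follows. It interprets the quoted $Δ R$ values as minimum separations from the AK8 candidates (an assumption), treats the AK4 jets as massless when computing the invariant mass, and uses hypothetical dictionary-based jet records and function names.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(j1, j2):
    return math.hypot(j1["eta"] - j2["eta"], delta_phi(j1["phi"], j2["phi"]))

def select_vbf_jets(ak4_jets, bb_jet, vv_jet):
    """Up to two highest-pT AK4 jets passing the kinematic and separation cuts."""
    passing = [
        j for j in ak4_jets
        if j["pt"] > 25.0 and abs(j["eta"]) < 4.7
        and delta_r(j, bb_jet) > 1.2   # assumed minimum separation from bb jet
        and delta_r(j, vv_jet) > 0.8   # assumed minimum separation from VV jet
    ]
    return sorted(passing, key=lambda j: j["pt"], reverse=True)[:2]

def vbf_features(j1, j2):
    """(|delta eta|, m_jj) for two jets, using the massless-jet approximation."""
    deta = j1["eta"] - j2["eta"]
    dphi = delta_phi(j1["phi"], j2["phi"])
    mjj = math.sqrt(2.0 * j1["pt"] * j2["pt"] * (math.cosh(deta) - math.cos(dphi)))
    return abs(deta), mjj

# Toy event (values made up for illustration).
bb_jet = {"eta": 0.2, "phi": 0.0}
vv_jet = {"eta": -0.3, "phi": 3.0}
ak4_jets = [
    {"pt": 80.0, "eta": 3.0, "phi": 1.5},
    {"pt": 60.0, "eta": -3.0, "phi": -1.5},
    {"pt": 100.0, "eta": 0.3, "phi": 0.1},   # overlaps with the bb candidate
    {"pt": 20.0, "eta": 4.0, "phi": 2.0},    # fails the pT cut
]
vbf_jets = select_vbf_jets(ak4_jets, bb_jet, vv_jet)
deta_jj, m_jj = vbf_features(*vbf_jets)
```

Events with fewer than two selected AK4 jets would need default feature values before being passed to the BDT; how the analysis handles that case is not specified in this section.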

The BDT is optimized simultaneously for both the SM ggF and BSM VBF $κ_{2 V} = 0$ signals, and separate “ggF” and “VBF” signal regions are defined using the BDT probabilities for the respective processes, referred to as ${BDT}_{ggF}$ and ${BDT}_{VBF}$. Concretely, the VBF region is defined by selections on the $T_{Xbb}$ and ${BDT}_{VBF}$ discriminants, corresponding to VBF signal (background) efficiencies of 40% ($\approx 0.1 %$) and 20% ($\approx 0.003 %$), respectively, chosen to optimize the expected exclusion limit on the VBF signal. The ggF region is defined by a veto on events passing the VBF selections plus selections on the $T_{Xbb}$ and ${BDT}_{ggF}$ discriminants, corresponding to ggF signal (background) efficiencies of 60% ($\approx 0.3 %$) and 7% ($\approx 0.01 %$), respectively, similarly chosen to optimize the limit on the ggF signal. These selections are henceforth referred to as the ggF and VBF $T_{Xbb}$ and BDT working points (WPs). The $T_{Xbb}$ discriminant’s signal efficiencies are calibrated using boosted gluon splitting to bottom quark ($g \to b \bar{b}$) jets in data and simulation [164], with $p_{T}$-dependent scale factors and uncertainties applied to the $H H$ signals. The uncertainty on the BDT signal efficiency is dominated by that of the GloParT tagger; the efficiency is calibrated using a new technique based on the ratio of the primary Lund jet plane [61] densities of each individual quark subjet, described in Section 13.3.

The search is performed by constructing a likelihood in the signal, or “pass”, regions as a function of the $H \to b \bar{b}$-candidate jet’s regressed mass ($m_{reg}^{bb}$). The QCD multijet background contribution in each pass region is estimated from data in a “fail” region, defined using the same baseline selections on the two AK8 jets but with the $T_{Xbb}$ selection inverted, as described in Section 14.4 below. A summary of all offline selections is provided in Table 14.1, and the signal and fail region selections in terms of the $T_{Xbb}^{bb}$ and BDT scores are illustrated in Figure 14.3.

--------------------------------------|--------------------------------------|--------------------------------------
VBF Region                            | ggF Region                           | Fail Region
--------------------------------------|--------------------------------------|--------------------------------------
All regions:
  No electrons or muons
  ≥ 2 AK8 jets
  $p_{T} > 300 GeV$ (all jets)
  $| η | < 2.4$ (all jets)
  $50 < m_{reg} < 250 GeV$ (all jets)
  $T_{Xbb} > 0.8$ (at least one jet)
  Jet assignment:
    $H \to b \bar{b}$: highest $T_{Xbb}$ score
    $H \to V V$: out of remaining jets, highest GloParT score
--------------------------------------|--------------------------------------|--------------------------------------
                                      | Not passing VBF selections           |
$T_{Xbb}^{bb} \geq$ VBF $T_{Xbb}$ WP  | $T_{Xbb}^{bb} \geq$ ggF $T_{Xbb}$ WP | $T_{Xbb}^{bb} <$ ggF $T_{Xbb}$ WP
${BDT}_{VBF} \geq$ VBF BDT WP         | ${BDT}_{ggF} \geq$ ggF BDT WP        |
--------------------------------------|--------------------------------------|--------------------------------------

Table 14.1. Offline selection criteria for the signal and fail nonresonant analysis regions.

Figure 14.3. Illustration of the signal and fail nonresonant analysis region selections in terms of the $T_{Xbb}^{bb}$ and BDT scores.
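The region logic for the nonresonant analysis can be sketched as below, for events that already pass the baseline selections. The numerical thresholds are placeholders invented for this example, since the text quotes the working points only through their efficiencies; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkingPoints:
    # Placeholder thresholds for illustration only; the actual cut values
    # of the working points are not given in this section.
    vbf_txbb: float = 0.99
    vbf_bdt: float = 0.99
    ggf_txbb: float = 0.98
    ggf_bdt: float = 0.95

def categorize(txbb_bb, bdt_vbf, bdt_ggf, wps=WorkingPoints()):
    """Assign a baseline-selected event to 'VBF', 'ggF', 'fail', or None."""
    # The VBF region takes priority: the ggF region vetoes VBF-selected events.
    if txbb_bb >= wps.vbf_txbb and bdt_vbf >= wps.vbf_bdt:
        return "VBF"
    if txbb_bb >= wps.ggf_txbb and bdt_ggf >= wps.ggf_bdt:
        return "ggF"
    # The fail region inverts the (ggF) T_Xbb selection.
    if txbb_bb < wps.ggf_txbb:
        return "fail"
    # High T_Xbb score but failing both BDT selections: unused in this sketch.
    return None

regions = [
    categorize(0.995, 0.995, 0.20),
    categorize(0.995, 0.50, 0.97),
    categorize(0.50, 0.999, 0.999),
    categorize(0.985, 0.50, 0.50),
]
```

Checking the VBF selection first makes the three regions mutually exclusive by construction, mirroring the "Not passing VBF selections" requirement in Table 14.1.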

14.3.3 Resonant offline selection

The resonant analysis similarly selects two large-radius jets representing the $H \to b \bar{b}$ and $Y \to V V$ decays. Specifically, we select two boosted AK8 jets with $p_{T} \geq 350 GeV$, at least one of which has $p_{T} \geq 400 GeV$, and pseudorapidity $| η | \leq 2.4$. Out of all AK8 jets in the event passing these requirements, the one with the highest $T_{Xbb}$ discriminant score is considered our $H \to b \bar{b}$ candidate jet, and is required to pass the high-purity (HP) WP and have a jet mass close to the SM Higgs mass: $110 \leq m_{reg}^{bb} < 145 GeV$. As in the nonresonant case, the jet mass resolution is crucial to the sensitivity of the search, and hence the jet mass is again reconstructed using the ParticleNet-based regression algorithm, yielding $m_{reg}$.

The mass-decorrelated GloParT tagger is again used to identify the $Y \to V V \to 4 q$ jet, using the discriminant $T_{HVV}$ targeting the $V V \to 4 q$ final state derived from its outputs (Eq. 13.2.1). The AK8 jet passing the above $p_{T}$ and $η$ kinematic selections with the highest $T_{HVV}$ score is considered the $Y \to V V$ candidate jet,¹ and is required to have a $T_{HVV}$ score $> 0.6$, corresponding to a $\approx 60 %$ ($\approx 1 %$) signal (background) efficiency. The signal efficiency is calibrated based on the Lund jet plane, as described in Section 13.3. All the $p_{T}$ and tagger selections were jointly optimized for the lowest expected exclusion limits for a range of ($m_{X}$, $m_{Y}$) points.
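The jet assignment for the resonant analysis, including the tie-breaking rule of footnote 1, can be sketched as follows. The kinematic thresholds are taken from the text; the function name and the dictionary-based jet records are hypothetical.

```python
def assign_resonant_candidates(jets):
    """Return (bb_index, vv_index) into `jets`, or None if the event fails.

    Each jet is a dict with keys 'pt' (GeV), 'eta', 'txbb', and 'thvv'.
    """
    selected = [
        i for i, j in enumerate(jets)
        if j["pt"] >= 350.0 and abs(j["eta"]) <= 2.4
    ]
    # Require two selected jets, at least one of which has pT >= 400 GeV.
    if len(selected) < 2 or max(jets[i]["pt"] for i in selected) < 400.0:
        return None
    bb = max(selected, key=lambda i: jets[i]["txbb"])
    # Picking the VV candidate among the remaining jets reproduces footnote 1:
    # if one jet has both the highest T_Xbb and T_HVV scores, it is kept as
    # the bb candidate and the next-highest T_HVV jet becomes the VV candidate.
    vv = max((i for i in selected if i != bb), key=lambda i: jets[i]["thvv"])
    return bb, vv

# Toy event: jet 0 has the highest score for both taggers among selected jets,
# and jet 3 fails the pT requirement despite its high scores.
toy_jets = [
    {"pt": 500.0, "eta": 0.1, "txbb": 0.99, "thvv": 0.90},
    {"pt": 420.0, "eta": -1.0, "txbb": 0.20, "thvv": 0.70},
    {"pt": 360.0, "eta": 2.0, "txbb": 0.10, "thvv": 0.30},
    {"pt": 300.0, "eta": 0.0, "txbb": 0.999, "thvv": 0.999},
]
resonant_candidates = assign_resonant_candidates(toy_jets)
```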

The search is performed in events passing these selections, referred to as the signal or “pass” region, in the 2D plane of the $V V$ -candidate jet regressed mass ( $m_{reg}^{VV}$ ) and the invariant mass of the $b \bar{b}$ - and $V V$ -candidate jets ( $m^{jj}$ ), representing the potential Y and X boson masses, respectively. An orthogonal control, or “fail”, region is defined by inverting the two tagger selections for both jets to estimate the QCD background in the pass region, as detailed in Section 14.4. Finally, separate “validation” pass and fail regions using the $H \to b \bar{b}$ candidate jet’s mass sidebands are used to validate the background estimation technique before unblinding the analysis. A summary of the offline selections is provided in Table 14.2.

---------------------------------------------|---------------------------------------------
Signal Region                                | Validation Region
---------------------------------------------|---------------------------------------------
All regions:
  ≥ 2 AK8 jets
  $p_{T} > 350 GeV$ (all jets)
  $| η | < 2.4$ (all jets)
  $p_{T} > 400 GeV$ (jet leading in $p_{T}$)
  Jet assignment:
    $H \to b \bar{b}$: highest $T_{Xbb}$ score
    $Y \to V V$: out of remaining jets, highest $T_{HVV}$ score
---------------------------------------------|---------------------------------------------
$110 \leq m_{reg}^{bb} < 145 GeV$            | $92.5 \leq m_{reg}^{bb} < 110 GeV$ or
                                             | $145 \leq m_{reg}^{bb} < 162.5 GeV$
----------------------|----------------------|----------------------|----------------------
Pass                  | Fail                 | Pass                 | Fail
----------------------|----------------------|----------------------|----------------------
$T_{Xbb} \geq$ HP WP  | $T_{Xbb} <$ HP WP    | $T_{Xbb} \geq$ HP WP | $T_{Xbb} <$ HP WP
$T_{HVV} \geq 0.6$    | $T_{HVV} < 0.6$      | $T_{HVV} \geq 0.6$   | $T_{HVV} < 0.6$
----------------------|----------------------|----------------------|----------------------

Table 14.2. Offline selection criteria for analysis regions for the fully-merged Y topology.
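The region definitions of Table 14.2 can be sketched as a simple classifier for events that already pass the kinematic selections and jet assignment. The function name is hypothetical, and the high-purity $T_{Xbb}$ threshold is passed in as an argument because its numerical value is not given in this section; the value used in the example calls is a placeholder.

```python
def resonant_region(m_bb, txbb_bb, thvv_vv, txbb_hp_wp, thvv_cut=0.6):
    """Return (mass_region, 'pass' | 'fail') or None if the event is unused.

    m_bb is the regressed mass of the bb-candidate jet in GeV.
    """
    if 110.0 <= m_bb < 145.0:
        mass_region = "signal"
    elif 92.5 <= m_bb < 110.0 or 145.0 <= m_bb < 162.5:
        mass_region = "validation"   # Higgs-mass sidebands
    else:
        return None
    if txbb_bb >= txbb_hp_wp and thvv_vv >= thvv_cut:
        return mass_region, "pass"
    # The fail region inverts both tagger selections.
    if txbb_bb < txbb_hp_wp and thvv_vv < thvv_cut:
        return mass_region, "fail"
    return None

HP_WP = 0.98  # placeholder value for illustration only
examples = [
    resonant_region(125.0, 0.99, 0.8, HP_WP),
    resonant_region(100.0, 0.50, 0.3, HP_WP),
    resonant_region(125.0, 0.99, 0.3, HP_WP),
    resonant_region(200.0, 0.99, 0.8, HP_WP),
]
```

Events passing only one of the two tagger selections fall into neither the pass nor the fail region in this sketch, following the table as written.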

¹In the rare ($< 0.1 %$ of signal events) case where the same jet has the highest $T_{Xbb}$ and $T_{HVV}$ scores, that jet is considered the $H \to b \bar{b}$ candidate, and the jet with the second-highest $T_{HVV}$ score is the $Y \to V V$ candidate.