
8.3 Asymptotic formulae

8.3.1 Asymptotic form of the MLE

So far, we have discussed how to extract meaningful statistical results from HEP experiments by making extensive use of pseudodata / toy experiments to estimate the sampling distributions of profile-likelihood-ratio-based test statistics. While this worked nicely for our simple counting experiment, generating a sufficiently large number of toys can quickly become computationally intractable for the more complex searches (and statistical combinations of searches) that are increasingly prevalent at the LHC, containing at times up to thousands of bins and nuisance parameters. This and the following section discuss a way to approximate these sampling distributions without the need for pseudodata. This was introduced in the famous “CCGV” paper [301] in 2011 and has since become the de-facto procedure at the LHC.

As hinted at previously, such as in Figures 8.6 and 8.12, the distributions $p(\tilde{t}_\mu|\mu')$ and $p(\tilde{q}_\mu|\mu')$ (where, in general, $\mu' \neq \mu$) have similar forms regardless of the nuisance parameters (or sometimes even the POIs). This is not a coincidence: we will now derive their "asymptotic" (i.e., large-sample-limit) forms, starting first with the asymptotic form of the maximum likelihood estimator (MLE).

It is important to remember that the MLE μ̂ of μ is a random variable with its own probability distribution. We can estimate it as always by sampling toys, shown in Figure 8.20 for our counting experiment (Eq. 8.2.3). One can observe that $p(\hat{\mu})$ approaches a Gaussian distribution as the number of events N increases, and indeed this becomes clear if we try to fit one to the histograms (Figure 8.21). We will now show this to be true generally, first reviewing the necessary statistics background, then deriving the analytic distribution, and finally discussing the results and the important concept of the Asimov dataset for numerical estimation.
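To make the toy procedure concrete, the following is a minimal sketch (assuming numpy; the s, b values are placeholders) that generates toy datasets for the counting model of Eq. 8.2.3 and the corresponding μ̂ values, using the closed-form MLEs b̂ = m and μ̂ = (n − m)/s that reappear in Eqs. 8.3.27–8.3.28; histogramming them reproduces the shapes in Figures 8.20–8.21.

```python
import numpy as np

# Counting model of Eq. 8.2.3: n ~ Pois(mu*s + b), m ~ Pois(b).
# Closed-form MLEs (cf. Eqs. 8.3.27-8.3.28): b_hat = m, mu_hat = (n - m) / s.
def sample_mu_hat(s, b, mu_true=1.0, n_toys=30_000, seed=0):
    rng = np.random.default_rng(seed)
    n = rng.poisson(mu_true * s + b, size=n_toys)  # counts in the signal region
    m = rng.poisson(b, size=n_toys)                # counts in the control region
    return (n - m) / s                             # MLE of mu for each toy

# Placeholder s, b values: the spread of mu_hat narrows and the histogram looks
# increasingly Gaussian as s and b grow, as in Figures 8.20-8.21.
for s, b in [(5, 5), (50, 50), (500, 500)]:
    mu_hat = sample_mu_hat(s, b)
    print(f"s={s:4d} b={b:4d}  mean={mu_hat.mean():.3f}  std={mu_hat.std():.3f}")
```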


Figure 8.20. Distribution of the MLE of μ for different s and b produced using 30,000 toy experiments each. (Note the x-axis range is becoming narrower from the left-most to the right-most plot.)


Figure 8.21. Gaussian fits to distributions of μ^ for different s and b from Figure 8.20.
Statistics background

We first provide a lightning review of some necessary statistics concepts and results.

Definition 8.3.1. Let the negative log-likelihood (NLL) be $l(\mu) \equiv -\ln L(\mu)$. The derivative of the NLL, $l'(\mu)$, is called the score $s(\mu)$. It has a number of useful properties:5

1.
Its expectation value at the true value $\mu'$ vanishes: $\mathbb{E}_{\mu=\mu'}[s(\mu')] = 0$.
2.
Its variance is $\mathrm{Var}[s(\mu)] = \mathbb{E}[l''(\mu)]$.

Note that the expectation value here means an average over observations which are distributed according to a particular value of μ, which here we're calling the "true" value $\mu'$.

Definition 8.3.2. $\mathbb{E}[l''(\mu)] \equiv I(\mu)$ is called the Fisher information. It quantifies the information our data contains about μ and, importantly, as we'll see, it (approximately) represents the inverse of the variance of $\hat{\mu}$. More generally, for multiple parameters,

$$I_{ij}(\boldsymbol{\mu}) = \mathbb{E}\left[\frac{\partial^2 l}{\partial \mu_i\, \partial \mu_j}\right]$$
(8.3.1)

is the Fisher information matrix. Its inverse is commonly referred to as the covariance matrix.

Theorem 8.3.1. Putting this together, by the central limit theorem [306], $p(s(\mu'))$ follows a normal distribution with mean 0 and variance $I(\mu')$, up to terms of order $O(1/\sqrt{N})$:

$$s(\mu') \;\xrightarrow{N \gg 1}\; \mathcal{N}\big(0,\, I(\mu')\big),$$
(8.3.2)

where N represents the data sample size.

The Fisher information

For our simple counting experiment, the Fisher information matrix I(μ,b) can be found by taking second derivatives of the NLL (Eq. 8.2.5). The Iμμ term, for example, is:

$$I_{\mu\mu}(\mu, b) = \mathbb{E}\left[\partial_\mu \partial_\mu\, l(\mu, b)\right] = \mathbb{E}\left[\frac{n\, s^2}{(\mu s + b)^2}\right] = \mathbb{E}[n]\, \frac{s^2}{(\mu s + b)^2} = \frac{(\mu' s + b')\, s^2}{(\mu s + b)^2}.$$
(8.3.3)

In the last step we use the fact that $\mathbb{E}[n]$ under the true values $\mu = \mu'$, $b = b'$ is $\mu' s + b'$. For the remainder of this section, $I(\mu, b)$ will always be evaluated at the true values of the parameters,6 so this can be simplified to $I_{\mu\mu}(\mu', b') = \frac{s^2}{\mu' s + b'}$. This is plotted in Figure 8.22, where we can see that the Fisher information captures the fact that as b increases, we lose sensitivity to, or information about, μ.


Figure 8.22. The Fisher information Iμμ(μ,b) for different μ and s, as a function of the expected background b.

For completeness (and since we’ll need it below), the full Fisher information matrix for our problem, repeating the steps in Eq. 8.3.3, is:

$$I(\mu', b') = \begin{pmatrix} I_{\mu\mu} & I_{\mu b} \\ I_{b\mu} & I_{bb} \end{pmatrix}(\mu', b') = \begin{pmatrix} \dfrac{s^2}{\mu' s + b'} & \dfrac{s}{\mu' s + b'} \\[1.5ex] \dfrac{s}{\mu' s + b'} & \dfrac{1}{\mu' s + b'} + \dfrac{1}{b'} \end{pmatrix}$$
(8.3.4)
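As a quick cross-check of this matrix, here is a minimal numpy sketch (with placeholder values for s, b′, μ′) that builds it and inverts it; the resulting σμ̂ should match the closed form derived in Eq. 8.3.8 below.

```python
import numpy as np

def fisher_matrix(mu_p, b_p, s):
    """Fisher information matrix of Eq. 8.3.4, evaluated at the true values (mu', b')."""
    n_exp = mu_p * s + b_p
    return np.array([
        [s**2 / n_exp, s / n_exp],
        [s / n_exp,    1 / n_exp + 1 / b_p],
    ])

mu_p, b_p, s = 1.0, 50.0, 50.0               # placeholder "true" parameter values
I = fisher_matrix(mu_p, b_p, s)
cov = np.linalg.inv(I)                        # (approximate) covariance of (mu_hat, b_hat)
sigma_mu = np.sqrt(cov[0, 0])
print(sigma_mu, np.sqrt(mu_p * s + 2 * b_p) / s)   # the two should agree (Eq. 8.3.8)
```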

Derivation

We now have enough background to derive the asymptotic form of the MLE. We do this for the 1D case by Taylor-expanding the score at $\hat{\mu}$, $l'(\hat{\mu})$ (which we know to equal 0), around the true value $\mu'$:

\begin{align}
l'(\hat{\mu}) &= l'(\mu') + l''(\mu')(\hat{\mu} - \mu') + O\big((\hat{\mu} - \mu')^2\big) = 0 \tag{8.3.5} \\
\Rightarrow\; \hat{\mu} - \mu' &\approx -\frac{l'(\mu')}{l''(\mu')} \;\xrightarrow{N \gg 1}\; -\frac{1}{I(\mu')}\,\mathcal{N}\big(0,\, I(\mu')\big) = \mathcal{N}\left(0,\; \frac{1}{I(\mu')}\right), \tag{8.3.6}
\end{align}

where we plugged in the distribution of $l'(\mu')$ from Eq. 8.3.2, claimed that $l''(\mu')$ asymptotically equals its expectation value $\mathbb{E}[l''(\mu')] = I(\mu')$ by the law of large numbers [307], and are ignoring the $O((\hat{\mu} - \mu')^2)$ term.7

For multiple parameters, I is a matrix, so the variance generalizes to the corresponding element of the matrix inverse:

$$\hat{\mu} - \mu' \sim \mathcal{N}\left(0,\; I^{-1}_{\mu\mu}(\mu', b')\right).$$
(8.3.7)

Result

Thus, we see that $\hat{\mu}$ asymptotically follows a normal distribution around the true value $\mu'$, with a variance $\sigma_{\hat{\mu}}^2 = I^{-1}_{\mu\mu}(\mu', b')$, up to $O(1/\sqrt{N})$ terms. Intuitively, from the definition of the Fisher information I, we can interpret this as saying that the more information we have about μ from the data, the lower the variance should be on $\hat{\mu}$.

Continuing with our counting experiment from Section 8.2.1., inverting I from Eq. 8.3.4 gives us

$$\sigma_{\hat{\mu}} = \sqrt{I^{-1}_{\mu\mu}(\mu', b')} = \frac{\sqrt{\mu' s + 2 b'}}{s}.$$
(8.3.8)

Note that, as we might expect, this scales as $\sqrt{b'}$, which is the uncertainty of our Poisson nuisance parameter b, showing mathematically why we want to keep uncertainties on nuisance parameters as low as possible. This is compared to the toy-based distributions from Section 8.3.1 in Figure 8.23, this time varying the true signal strength $\mu'$ as well, where we can observe that the asymptotic form matches very well for large s, b, while for small values there are some visible differences.
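A small numerical check of this claim, comparing the spread of toy μ̂ values against $\sqrt{\mu' s + 2b'}/s$ for a few placeholder (s, b) values at μ′ = 1 (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
for s, b in [(10, 10), (100, 100), (1000, 1000)]:   # placeholder values, mu' = 1
    # toy MLEs mu_hat = (n - m) / s for the counting model of Eq. 8.2.3
    mu_hat = (rng.poisson(s + b, 30_000) - rng.poisson(b, 30_000)) / s
    print(f"s=b={s:5d}  toy std = {mu_hat.std():.4f}   "
          f"asymptotic sigma = {np.sqrt(s + 2 * b) / s:.4f}")
```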


Figure 8.23. Asymptotic (dotted lines) and toy-based (solid lines) distributions, using 30,000 toys each, of the MLE of μ for different s, b, and true signal strengths $\mu'$.

We can also check the total per-bin errors between the asymptotic form and the toy-based distributions directly, as shown in Figure 8.24 (for $\mu' = 1$ only). Indeed, this confirms that the error scales as $1/\sqrt{s}$ and $1/\sqrt{b}$, as claimed above.


Figure 8.24. Error between the sampled toy distributions, using 50,000 toys each, and the asymptotic distributions of the MLE of μ for different s and b (blue), with $1/\sqrt{N}$ fits in red.
Numerical estimation and the Asimov dataset

In this section, because of the simplicity of our data model, we were able to derive the Fisher information I and, hence, the asymptotic form of μ^ analytically. In general, this is not possible and we typically have to minimize l, find its second derivatives, and solve Eq. 8.3.3 etc. numerically instead.

However, when calculating the Fisher information, how do we deal with the expectation value over the observed data (n,m in our case)? Naively, this would require averaging over a bunch of generated toy n,m values again, which defeats the purpose of using the asymptotic form of μ^!

Instead, we can switch the order of operations in Eq. 8.3.3,8 rewriting it as:

$$I_{ij}(\mu', b') = \mathbb{E}\left[\partial_i \partial_j\, l(\mu', b';\, n, m)\right] = \partial_i \partial_j\, \mathbb{E}\left[l(\mu', b';\, n, m)\right] = \partial_i \partial_j\, l\big(\mu', b';\, \mathbb{E}[n], \mathbb{E}[m]\big).$$
(8.3.9)

Importantly, this says we can find I by simply evaluating the likelihood for a dataset of observations set equal to their expectation values under $\mu', b'$ and then taking its second derivatives, instead of averaging over the distribution of observations.

Definition 8.3.3. Such a dataset is called the Asimov dataset, and $L(\mu;\, \mathbb{E}[n], \mathbb{E}[m]) \equiv L_A$ is referred to as the "Asimov likelihood".9
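In code, this amounts to evaluating (numerical) second derivatives of the NLL on the dataset $(n, m) = (\mathbb{E}[n], \mathbb{E}[m])$. Below is a minimal sketch for the counting model using central finite differences; the step size and parameter values are illustrative, and the result can be compared against the analytic matrix in Eq. 8.3.4.

```python
import numpy as np

def nll(theta, n, m, s):
    """NLL of the counting model (Eq. 8.2.5), up to constants; theta = (mu, b)."""
    mu, b = theta
    return (mu * s + b) - n * np.log(mu * s + b) + b - m * np.log(b)

def fisher_asimov(mu_p, b_p, s, eps=1e-4):
    """Fisher information via Eq. 8.3.9: Hessian of the NLL on the Asimov dataset."""
    n_A, m_A = mu_p * s + b_p, b_p                 # Asimov observations (Eqs. 8.3.25-8.3.26)
    theta0 = np.array([mu_p, b_p])
    I = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            ei, ej = np.eye(2)[i] * eps, np.eye(2)[j] * eps
            # mixed central finite difference for d^2 l / d theta_i d theta_j
            I[i, j] = (nll(theta0 + ei + ej, n_A, m_A, s)
                       - nll(theta0 + ei - ej, n_A, m_A, s)
                       - nll(theta0 - ei + ej, n_A, m_A, s)
                       + nll(theta0 - ei - ej, n_A, m_A, s)) / (4 * eps**2)
    return I

# Placeholder values; should reproduce the analytic matrix of Eq. 8.3.4.
print(fisher_asimov(mu_p=1.0, b_p=50.0, s=50.0))
```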

8.3.2 Asymptotic form of the profile likelihood ratio

We can now proceed to derive the asymptotic form of the sampling distribution $p(t_\mu|\mu')$ of the profile likelihood ratio test statistic $t_\mu$, under a "true" signal strength of $\mu'$. This asymptotic form is extremely useful for simplifying the computation of (expected) significances, limits, and intervals; indeed, standard procedure at the LHC is to use it in lieu of toy-based, empirical distributions for $p(t_\mu|\mu')$.

Asymptotic form of the profile likelihood ratio

We start by deriving the asymptotic form of the profile likelihood ratio test statistic $t_\mu$ (Eq. 8.2.7), following a similar procedure to Section 8.3.1 (and using the results therein) of Taylor expanding around its minimum at $\hat{\mu}$:10

\begin{align}
t_\mu &= -2 \ln \lambda(\mu) \tag{8.3.10} \\
&= 2\, l(\mu, \hat{\hat{b}}(\mu)) - 2\, l(\hat{\mu}, \hat{b}) \tag{8.3.11} \\
&\approx \underbrace{2\, l(\hat{\mu}, \hat{\hat{b}}(\hat{\mu})) - 2\, l(\hat{\mu}, \hat{b})}_{\hat{\hat{b}}(\hat{\mu}) = \hat{b}\text{, so this is }0} + \underbrace{2\, l'(\hat{\mu}, \hat{\hat{b}}(\hat{\mu}))\,(\mu - \hat{\mu})}_{l'(\hat{\mu}, \hat{b}) = 0} + 2\, l''(\hat{\mu}, \hat{\hat{b}}(\hat{\mu}))\, \frac{(\mu - \hat{\mu})^2}{2} \tag{8.3.12} \\
&= l''(\hat{\mu}, \hat{b})\,(\mu - \hat{\mu})^2 \tag{8.3.13} \\
&\approx \underbrace{\mathbb{E}[l''(\hat{\mu}, \hat{b})]}_{\text{by law of large numbers}}\,(\mu - \hat{\mu})^2 \tag{8.3.14} \\
&\approx \underbrace{\mathbb{E}[l''(\mu', b')]}_{\text{since bias of MLEs}\,\to\,0}\,(\mu - \hat{\mu})^2 \tag{8.3.15} \\
&= \underbrace{I_{\mu\mu}(\mu', b')}_{\text{from definition of Fisher information}}\,(\mu - \hat{\mu})^2 \tag{8.3.16} \\
\Rightarrow\; t_\mu &\approx \underbrace{\frac{(\mu - \hat{\mu})^2}{\sigma_{\hat{\mu}}^2}}_{\text{using }\sigma_{\hat{\mu}}\,\approx\,\sqrt{I^{-1}_{\mu\mu}(\mu', b')}} +\; O\big((\mu - \hat{\mu})^3\big) + O\big(1/\sqrt{N}\big). \tag{8.3.17}
\end{align}

Here, just like in Eq. 8.3.6, we use the law of large numbers in Line 8.3.14 and take $l''(\hat{\mu}, \hat{b})$ to asymptotically equal its expectation value under the true parameter values $\mu', b'$: $l''(\hat{\mu}, \hat{b}) \xrightarrow{N \gg 1} \mathbb{E}[l''(\hat{\mu}, \hat{b})]$. We then in Line 8.3.15 also use the fact that MLEs are generally unbiased estimators of the true parameter values in the large sample limit to say $\mathbb{E}[l''(\hat{\mu}, \hat{b})] \xrightarrow{N \gg 1} \mathbb{E}[l''(\mu', b')]$. Finally, in the last step, we use the asymptotic form of the MLE (Eq. 8.3.7).

Asymptotic form of p(tμ|μ)

Now that we have an expression for $t_\mu$, we can consider its sampling distribution. With a simple change of variables, the form of $p(t_\mu|\mu')$ should hopefully be evident: recognizing that μ and $\sigma_{\hat{\mu}}^2$ are simply constants, while $\hat{\mu}$ we know is distributed as a Gaussian centered around $\mu'$ with variance $\sigma_{\hat{\mu}}^2$, let's define $\gamma \equiv \frac{\mu - \hat{\mu}}{\sigma_{\hat{\mu}}}$, so that

\begin{align}
t_\mu &\approx \frac{(\mu - \hat{\mu})^2}{\sigma_{\hat{\mu}}^2} = \gamma^2, \tag{8.3.18} \\
\gamma &\sim \mathcal{N}\left(\frac{\mu - \mu'}{\sigma_{\hat{\mu}}},\; 1\right). \tag{8.3.19}
\end{align}

For the special case of $\mu' = \mu$, we can see that $t_\mu$ is simply the square of a standard normal random variable, which is the definition of the well-known $\chi^2_k$ distribution with k = 1 degree of freedom (DoF):

$$p(t_\mu\,|\,\mu) \sim \chi^2_1.$$
(8.3.20)

In the general case where $\mu'$ may differ from μ, $t_\mu$ is the square of a random variable with unit variance but non-zero mean. This is distributed as the similar, but perhaps less well-known, non-central chi-squared distribution $\chi^2_k(\Lambda)$, again with 1 DoF, and with a "non-centrality parameter"

\begin{align}
\Lambda &= \bar{\gamma}^2 = \left(\frac{\mu - \mu'}{\sigma_{\hat{\mu}}}\right)^2, \tag{8.3.21} \\
p(t_\mu\,|\,\mu') &\sim \chi^2_1(\Lambda). \tag{8.3.22}
\end{align}

The "central" vs. non-central chi-squared distributions are visualized in Figure 8.25 for k = 1. We can see that $\chi^2_k(\Lambda)$ simply shifts towards the right as Λ increases (at Λ = 0 it is a regular central $\chi^2$). As $\Lambda \to \infty$, $\chi^2_k(\Lambda)$ becomes more and more like a normal distribution with mean Λ.11
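These distributions are available directly in scipy as ncx2. Here is a minimal sketch (with illustrative Λ values) that prints their means and standard deviations, showing the shift of the distribution to larger values, centered near k + Λ ≈ Λ, as Λ grows:

```python
import numpy as np
from scipy.stats import ncx2

# Non-central chi-squared with k = 1 DoF: as Lambda grows, the distribution
# shifts right (mean k + Lambda, variance 2(k + 2*Lambda)) and looks increasingly Gaussian.
for Lam in (1, 10, 100):                     # illustrative non-centrality parameters
    mean, var = ncx2.stats(df=1, nc=Lam, moments="mv")
    print(f"Lambda = {Lam:3d}:  mean = {float(mean):6.1f},  std = {float(np.sqrt(var)):5.1f}")
```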


Figure 8.25. Central $\chi^2_k$ and non-central $\chi^2_k(\Lambda)$ distributions for Λ between 1 and 30 (left) and 30 and 300 (right).

By extending the derivation in Eq. 8.3.17 to multiple POIs $\boldsymbol{\mu}$, one can find the simple generalization:

$$p(t_{\boldsymbol{\mu}}\,|\,\boldsymbol{\mu}') \sim \chi^2_k(\Lambda),$$
(8.3.23)

where the number of DoF k equals the number of POIs, $\dim \boldsymbol{\mu}$, and

$$\Lambda = (\boldsymbol{\mu} - \boldsymbol{\mu}')^T\, \tilde{I}^{-1}(\boldsymbol{\mu}')\, (\boldsymbol{\mu} - \boldsymbol{\mu}'),$$
(8.3.24)

where $\tilde{I}^{-1}$ is the inverse of the covariance matrix of the POIs, i.e., the inverse of $I^{-1}$ restricted only to the components corresponding to the POIs.

Estimating σμ^2

The critical remaining step to understanding the asymptotic distribution of tμ is estimating σμ^2 to find the non-centrality parameter Λ in Eq. 8.3.21. We now discuss two methods to do this.

Method 1: Inverting the Fisher information / covariance matrix The first method is simply using $\sigma_{\hat{\mu}} \approx \sqrt{I^{-1}_{\mu\mu}(\mu', b')}$ as in Section 8.3.1.12 This is shown in Figure 8.26 for our counting experiment, using the analytic form for $\sigma_{\hat{\mu}}$ from Eq. 8.3.8. We can see that this asymptotic approximation agrees well with the true distribution for some range of parameters, but can deviate significantly for others, as highlighted especially in the right plot.


Figure 8.26. Comparing the distribution $p(t_\mu|\mu')$ (solid) with non-central $\chi^2_1(\Lambda)$ distributions (dotted) for a range of s, b, μ, $\mu'$ values, with $\sigma_{\hat{\mu}}^2$ estimated using the inverse of the Fisher information matrix.

Interlude on Asimov dataset While we are able to find the analytic form for $I^{-1}_{\mu\mu}(\mu', b')$ easily for our simple counting experiment, in general it has to be calculated numerically. As introduced in Section 8.3.1, to handle the expectation value under $\mu', b'$ in Eq. 8.3.1, we can make use of the Asimov dataset, where the observations $n_A$, $m_A$ are taken to be their expectation values under $\mu', b'$, simplifying the calculation of I to Eq. 8.3.9.

Explicitly, for our counting experiment (Eq. 8.2.3), the Asimov observations are simply

\begin{align}
n_A &= \mathbb{E}[n] = \mu' s + b', \tag{8.3.25} \\
m_A &= \mathbb{E}[m] = b'. \tag{8.3.26}
\end{align}

We’ll now consider a second powerful use of the Asimov dataset to estimate σμ^2.

Method 2: The "Asimov sigma" estimate Putting together Eqs. 8.2.10 and 8.3.26, we can derive a nice property of the Asimov dataset: the MLEs $\hat{\mu}$, $\hat{b}$ equal the true values $\mu'$, $b'$:

\begin{align}
\hat{b} &= m_A = b' \tag{8.3.27} \\
\hat{\mu} &= \frac{n_A - m_A}{s} = \frac{\mu' s + b' - b'}{s} = \mu'. \tag{8.3.28}
\end{align}

Thus, tμ evaluated for the Asimov dataset is exactly the non-centrality parameter Λ that we are after!

$$t_{\mu,A} \approx \left(\frac{\mu - \hat{\mu}}{\sigma_{\hat{\mu}}}\right)^2 = \left(\frac{\mu - \mu'}{\sigma_{\hat{\mu}}}\right)^2 = \Lambda.$$
(8.3.29)

While not strictly necessary to obtain the asymptotic form for $p(t_\mu|\mu')$, we can also invert this to estimate $\sigma_{\hat{\mu}}$, as

$$\sigma_A \equiv \sqrt{\frac{(\mu - \mu')^2}{t_{\mu,A}}},$$
(8.3.30)

where σA is known as the “Asimov sigma”.
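A minimal sketch of this Asimov procedure for the counting model (assuming numpy/scipy; the tested μ and the true μ′, b′, s values are placeholders): build the Asimov observations, profile b for the tested μ with a bounded scalar minimizer, and read off Λ = t_{μ,A} and σ_A.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(mu, b, n, m, s):
    """NLL of the counting model (Eq. 8.2.5), up to constants."""
    return (mu * s + b) - n * np.log(mu * s + b) + b - m * np.log(b)

def t_mu_asimov(mu, mu_p, b_p, s):
    """t_mu evaluated on the Asimov dataset (n_A, m_A) = (mu'*s + b', b')."""
    n_A, m_A = mu_p * s + b_p, b_p
    # profile b for the tested mu; on Asimov data the global minimum sits at (mu', b')
    prof = minimize_scalar(lambda b: nll(mu, b, n_A, m_A, s),
                           bounds=(1e-6, 10 * (n_A + m_A)), method="bounded")
    return 2 * (prof.fun - nll(mu_p, b_p, n_A, m_A, s))

mu, mu_p, b_p, s = 2.0, 1.0, 20.0, 20.0      # placeholder values
Lam = t_mu_asimov(mu, mu_p, b_p, s)          # non-centrality parameter (Eq. 8.3.29)
sigma_A = np.sqrt((mu - mu_p) ** 2 / Lam)    # "Asimov sigma" (Eq. 8.3.30)
print(Lam, sigma_A)
```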

The asymptotic distributions using $\Lambda = t_{\mu,A}$ are plotted in Figure 8.27. We see that this estimate matches the sampling distributions very well, even for cases where the covariance-matrix estimate failed! Indeed, this is why estimating $\sigma_{\hat{\mu}} \approx \sigma_A$ is the standard in LHC analyses, and it is the method we'll employ going forward.

Reference [301] conjectures that this is because the Fisher-information approach is restricted to estimating only the second-order term of Eq. 8.3.17, while with $t_{\mu,A}$ we're matching the shape of the likelihood at the minimum, which may be able to capture some of the higher-order terms as well.


Figure 8.27. Comparing the sampling distribution $p(t_\mu|\mu')$ with non-central $\chi^2_1(\Lambda)$ distributions for a range of s, b, μ, $\mu'$ values, with the Asimov sigma estimation for $\sigma_{\hat{\mu}}^2$.

Despite the pervasive use of the asymptotic formulae at the LHC, it's important to remember that they are an approximation, only valid for large statistics. Figure 8.28 below shows the approximation breaking down for $s, b \lesssim 10$.


Figure 8.28. Comparing the sampling distribution $p(t_\mu|\mu')$ with non-central $\chi^2_1(\Lambda)$ distributions for different $s, b \lesssim 10$, showing the breakdown of the $\sigma_A$ approximation for $\sigma_{\hat{\mu}}^2$ at low statistics.
The PDF and CDF

The probability density function (PDF) of a $\chi^2_k(\Lambda)$ distribution can be found in e.g. Ref. [309]; for k = 1:

$$p(t_\mu\,|\,\mu') \sim \chi^2_1(\Lambda) = \frac{1}{2\sqrt{t_\mu}}\left(\varphi\!\left(\sqrt{t_\mu} - \sqrt{\Lambda}\right) + \varphi\!\left(\sqrt{t_\mu} + \sqrt{\Lambda}\right)\right),$$
(8.3.31)

where φ is the PDF of a standard normal distribution. For $\mu' = \mu$ (i.e., Λ = 0), this simplifies to:

$$p(t_\mu\,|\,\mu) \sim \chi^2_1 = \frac{1}{\sqrt{t_\mu}}\,\varphi\!\left(\sqrt{t_\mu}\right).$$
(8.3.32)

The cumulative distribution function (CDF) for k = 1 is:

$$F(t_\mu\,|\,\mu') = \Phi\!\left(\sqrt{t_\mu} - \sqrt{\Lambda}\right) + \Phi\!\left(\sqrt{t_\mu} + \sqrt{\Lambda}\right) - 1,$$
(8.3.33)

where Φ is the CDF of the standard normal distribution. For $\mu' = \mu$ (i.e., Λ = 0), again this simplifies to:

$$F(t_\mu\,|\,\mu) = 2\,\Phi\!\left(\sqrt{t_\mu}\right) - 1.$$
(8.3.34)
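As a sanity check, these closed forms can be compared against scipy's non-central chi-squared; a minimal sketch with an illustrative Λ value:

```python
import numpy as np
from scipy.stats import ncx2, norm

# Closed-form k = 1 PDF (Eq. 8.3.31) and CDF (Eq. 8.3.33) vs. scipy's ncx2;
# the Lambda value is illustrative.
t, Lam = np.linspace(0.05, 30, 500), 4.0
pdf_eq = (norm.pdf(np.sqrt(t) - np.sqrt(Lam)) + norm.pdf(np.sqrt(t) + np.sqrt(Lam))) / (2 * np.sqrt(t))
cdf_eq = norm.cdf(np.sqrt(t) - np.sqrt(Lam)) + norm.cdf(np.sqrt(t) + np.sqrt(Lam)) - 1
print(np.allclose(pdf_eq, ncx2.pdf(t, df=1, nc=Lam)),
      np.allclose(cdf_eq, ncx2.cdf(t, df=1, nc=Lam)))   # both comparisons should agree
```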

From Eq. 8.2.14, we know the p-value pμ of the observed tμobs under a signal hypothesis of Hμ is

$$p_\mu = 1 - F(t_\mu^{\mathrm{obs}}\,|\,\mu) = 2\left(1 - \Phi\!\left(\sqrt{t_\mu^{\mathrm{obs}}}\right)\right),$$
(8.3.35)

with an associated significance

$$Z = \Phi^{-1}(1 - p_\mu) = \Phi^{-1}\!\left(2\,\Phi\!\left(\sqrt{t_\mu^{\mathrm{obs}}}\right) - 1\right).$$
(8.3.36)

Application to hypothesis testing

Let's see how well this approximation agrees with the toy-based p-value we found in Example 8.2.1. For the same counting experiment example, where we expect s = 10 and observe $n_{\mathrm{obs}}$ = 20, $m_{\mathrm{obs}}$ = 5, we found the p-value for testing the μ = 1 hypothesis to be $p_{\mu=1}$ = 0.3 (with an associated significance Z = 0.52). Calculating $t_\mu^{\mathrm{obs}}$ for this example and plugging it into the asymptotic approximation from Eq. 8.3.35 gives:13

\begin{align}
t_\mu^{\mathrm{obs}} &= 1.08 \tag{8.3.37} \\
p_{\mu=1} &= 2\left(1 - \Phi\!\left(\sqrt{1.08}\right)\right) = 0.3 \tag{8.3.38} \\
Z &= 0.52. \tag{8.3.39}
\end{align}

We see that it agrees exactly!
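For reference, here is a minimal sketch (assuming numpy/scipy) of the computation behind these numbers: profile b numerically for the tested μ = 1 with the observed counts, form $t_\mu^{\mathrm{obs}}$, and apply Eqs. 8.3.35–8.3.36; up to rounding it should reproduce $t_\mu^{\mathrm{obs}} \approx 1.1$, p ≈ 0.3, and Z ≈ 0.5.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def nll(mu, b, n, m, s):
    """NLL of the counting model (Eq. 8.2.5), up to constants."""
    return (mu * s + b) - n * np.log(mu * s + b) + b - m * np.log(b)

s, n_obs, m_obs, mu_test = 10, 20, 5, 1.0
mu_hat, b_hat = (n_obs - m_obs) / s, m_obs                      # unconditional MLEs
prof = minimize_scalar(lambda b: nll(mu_test, b, n_obs, m_obs, s),
                       bounds=(1e-6, 10 * (n_obs + m_obs)), method="bounded")
t_obs = 2 * (prof.fun - nll(mu_hat, b_hat, n_obs, m_obs, s))    # observed t_mu
p = 2 * (1 - norm.cdf(np.sqrt(t_obs)))                          # Eq. 8.3.35
Z = norm.ppf(1 - p)                                             # Eq. 8.3.36
print(f"t_mu = {t_obs:.2f}, p = {p:.2f}, Z = {Z:.2f}")
```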

The agreement more generally, with varying s,μ,nobs,mobs, is plotted in Figure 8.29. We observe generally strong agreement, except for low n,m where, as expected, the asymptotic approximation breaks down.

PIC

Figure 8.29. Comparing the significances, as a function of the signal strength μ of the hypothesis being tested, for simple counting experiments (Eq. 8.2.3) with different s, $n_{\mathrm{obs}}$, $m_{\mathrm{obs}}$, derived using 30,000 toys each (solid) to estimate the $p(t_\mu|\mu)$ distribution vs. the asymptotic approximation (dashed).
Summary

We have been able to find the asymptotic form for the profile-likelihood-ratio test statistic $t_\mu \approx (\mu - \hat{\mu})^2 / \sigma_{\hat{\mu}}^2$, which is distributed as a non-central chi-squared ($\chi^2_k(\Lambda)$) distribution. We discussed two methods for finding the non-centrality parameter Λ, of which the Asimov sigma $\sigma_A$ estimate generally performed better. Finally, the asymptotic formulae were applied to simple examples of hypothesis testing to check the agreement with toy-based significances. These asymptotic formulae can be extended to the alternative test statistic for positive signals $\tilde{t}_\mu$ and the upper-limit-setting $\tilde{q}_\mu$, as in Ref. [301], to simplify the calculation of both observed and expected significances, limits, and intervals.

With that, we conclude the overview of the statistical interpretation of LHC results. We will see practical applications of these concepts to searches in the high energy Higgs sector in Part V.

5See derivations in e.g. Ref. [305].

6The reason for this is discussed shortly below.

7For a more rigorous derivation, see e.g. Ref. [308].

8We are able to do this because, as we saw above, the score is linear in n for Poisson likelihoods.

9The Asimov dataset is named after Isaac Asimov, the popular science fiction author, whose book Franchise is about a supercomputer choosing a single person as the sole voter in the U.S. elections, because they can represent the entire population.

10Note: this is not a rigorous derivation; it’s just a way to motivate the final result, which is taken from Ref. [301]. (If you know of a better way, let me know!)

11More information can be found in e.g. Ref. [309].

12More generally, we'd need $\tilde{I}^{-1}$ for Eq. 8.3.24.

13Note that we're using $t_\mu$ here, not the alternative test statistic $\tilde{t}_\mu$; however, in this case, since $\hat{\mu} > 0$, they are equivalent.