Expected Significances and Limits

5. Expected Significances and Limits#

5.1. Recap#

In Chapter 2, we defined our probability model and likelihood function for a simple one bin counting experiment with $n$ events, $s$ expected signal (for a signal strength $\mu$ of 1), and $b$ expected background in our signal region, and $m$ events, $b$ expected background, in our control region:

(5.1)#\[ P(n, m; \mu, b) = \frac{(\mu s+b)^n e^{-(\mu s+b)}}{n!} \cdot \frac{b^m e^{-b}}{m!} = L(\mu, b), \]

and our test statistic $\tilde{t}_\mu$:

(5.2)#\[\begin{split} \tilde{\lambda}(\mu) = \left\{ \begin{array}{ll} \frac{L(\mu, \hat{\hat{b}}(\mu))}{L(\hat{\mu}, \hat{b})}, & \mbox{$\hat{\mu}\geq0$}.\\ \frac{L(\mu, \hat{\hat{b}}(\mu))}{L(0, \hat{\hat{b}}(0))}, & \mbox{$\hat{\mu}<0$}. \end{array} \right., \quad \tilde{t}_\mu=-2\ln \tilde{\lambda}(\mu), \end{split}\]

and solved for $\hat{\mu}, \hat{b}, \hat{\hat{b}}(\mu)$ for given $n, m$ to find an analytic form for $t_\mu$.

In Chapter 3, we translated this to a probability distribution of $\tilde{t}_\mu$ under a particular signal hypothesis $H_\mu$, and quantified the compatibility of our observation based on the $p$-value $p_\mu$ of the observed test statistic $\tilde{t}_\mu^\mathrm{obs}$:

(5.3)#\[ p_\mu = \int_{\tilde{t}_\mathrm{obs}}^{\infty}p(\tilde{t}_\mu|\mu)\mathrm d \tilde{t}_\mu, \]

and it’s associated significance.

In Chapter 4, we extended this to deriving confidence intervals for parameters of interest by solving for a particular $p$-value:

(5.4)#\[ p = 1 - \mathrm{CL} = \int_{\tilde{t}^\mathrm{obs}_\mu}^{\infty}p(\tilde{t}_\mu|\mu_\pm)\mathrm d \tilde{t}_\mu. \]

We also defined a new test statistic $\tilde{q}_\mu$ for upper limits:

(5.5)#\[\begin{split} \tilde{q}(\mu) = \left\{ \begin{array}{ll} \tilde{t}(\mu), & \mbox{$\hat{\mu} < \mu$}.\\ 0, & \mbox{$\hat{\mu} \geq \mu$}. \end{array} \right. = \left\{ \begin{array}{ll} -2\ln\frac{L(\mu, \hat{\hat{b}}(\mu))}{L(0, \hat{\hat{b}}(0))}, & \mbox{$\hat{\mu}<0$}.\\ -2\ln\frac{L(\mu, \hat{\hat{b}}(\mu))}{L(\hat{\mu}, \hat{b})}, & \mbox{0 $\leq \hat{\mu} < \mu$}.\\ 0, & \mbox{$\hat{\mu} \geq \mu$}. \end{array} \right. \end{split}\]

and introduced the CL$_s$ criterion used in HEP in lieu of the standard $p$-value to avoid excluding signals for which our experiment lacks sensitivity:

(5.6)#\[ p_\mu' = \cfrac{p_\mu}{1 - p_b}, \]

Code from previous chapters:

Show code cell source Hide code cell source

def get_toys_sb(s, b, num_toys):
    """Generate toy data for a given s and b"""
    # sample n, m according to our data model (Eq. 1)
    n, m = poisson.rvs(s + b, size=num_toys), poisson.rvs(b, size=num_toys)
    return n, m


def get_toys(s, n_obs, m_obs, num_toys):
    """Generate toy data for a given s and observed n and m"""
    #  use b^^ for p(t_s|s) as recommended by Ref. 2
    b = bhathat(s, n_obs, m_obs)
    return get_toys_sb(s, b, num_toys)


def get_p_ts(test_s, n_obs, m_obs, num_toys, toy_s=None):
    """
    Get the t_tilde_s test statistic distribution via toys.
    By default, the s we're testing is the same as the s we're using for toys,
    but this can be changed if necessary (as you will see later).
    """
    if toy_s is None:
        toy_s = test_s
    n, m = get_toys(toy_s, n_obs, m_obs, num_toys)
    return t_tilde_s(test_s, n, m)


def get_ps_val(test_s, n_obs, m_obs, num_toys, toy_s=None):
    """p value"""
    t_tilde_ss = get_p_ts(test_s, n_obs, m_obs, num_toys, toy_s)
    t_obs = t_tilde_s(test_s, n_obs, m_obs)
    p_val = np.mean(t_tilde_ss > t_obs)
    return p_val, t_tilde_ss, t_obs


def get_p_qs(test_s, n_obs, m_obs, num_toys, toy_s=None):
    """
    Get the q_tilde_s test statistic distribution via toys.
    By default, the s we're testing is the same as the s we're using for toys,
    but this can be changed if necessary (as you will see later).
    """
    if toy_s is None:
        toy_s = test_s
    n, m = get_toys(toy_s, n_obs, m_obs, num_toys)
    return q_tilde_s(test_s, n, m)


def get_pval_qs(test_s, n_obs, m_obs, num_toys, toy_s=None):
    """p value"""
    q_tilde_ss = get_p_qs(test_s, n_obs, m_obs, num_toys, toy_s)
    q_obs = q_tilde_s(test_s, n_obs, m_obs)
    p_val = np.mean(q_tilde_ss > q_obs)
    return p_val, q_tilde_ss, q_obs


def get_limits_CLs(s_scan: list, n_obs: int, m_obs: int, num_toys: int, CL: float = 0.95):
    p_cls_scan = []  # saving p-value for each s

    for s in s_scan:
        p_mu, t_tilde_sb, t_obs = get_ps_val(s, n_obs, m_obs, num_toys)
        p_b, t_tilde_sb, t_obs = get_ps_val(s, n_obs, m_obs, num_toys, toy_s=0)
        p_b = 1 - p_b
        p_cls_scan.append(p_mu / (1 - p_b))

    # find the values of s that give p_value ~= 1 - CL
    pv_cl_diff = np.abs(np.array(p_cls_scan) - (1 - CL))
    half_num_s = int(len(s_scan) / 2)
    s_low = s_scan[np.argsort(pv_cl_diff[:half_num_s])[0]]
    s_high = s_scan[half_num_s:][np.argsort(pv_cl_diff[half_num_s:])[0]]

    return s_low, s_high


def get_upper_limit_CLs(s_scan: list, n_obs: int, m_obs: int, num_toys: int, CL: float = 0.95):
    p_cls_scan = []  # saving p-value for each s

    for s in s_scan:
        p_mu, q_tilde_sb, q_obs = get_pval_qs(s, n_obs, m_obs, num_toys)
        p_b, q_tilde_sb, q_obs = get_pval_qs(s, n_obs, m_obs, num_toys, toy_s=0)
        p_b = 1 - p_b
        p_cls_scan.append(p_mu / (1 - p_b))

    # find the values of s that give p_value ~= 1 - CL
    pv_cl_diff = np.abs(np.array(p_cls_scan) - (1 - CL))
    s_upper = s_scan[np.argsort(pv_cl_diff)[0]]

    return s_upper

5.2. Expected Significance#

The focus so far has been only on evaluating the results of experiments. However, it is equally important to characterize the expected sensitivity of the experiment before running it (or before looking at the data).

Concretely, let’s continue with our simple example and say we expect $b = 10$ background events and, for a signal strength $\mu = 1$, $s = 10$ signal events. How do we tell if this experiment is at all useful for discovering this signal, i.e., does it have any sensitivity to the signal? One way is to calculate the significance with which we expect to exclude the background-only hypothesis if the signal were, in fact, to exist.

Practically, this means we are testing $H_0$ and, hence, need $p(\tilde{t}_0|\mu = 0)$ as before. However, now we also need the distribution of the test statistic $\tilde{t}_0$ under the background + signal hypothesis $p(\tilde{t}_0|\mu = 1)$. Then, by calculating the significance for each sampled $\tilde{t}_0$ under $H_{\mu = 1}$, we can estimate the distribution of expected significances.

As always, things should hopefully be clearer with code:

num_toys = 30000
# generate toys under signal + background hypothesis
s, b = 10, 10
n, m = get_toys_sb(s, b, num_toys)
# sample t_0 under s + b hypothesis
t_tilde_0s = t_tilde_s(0, n, m)
median_t0s = np.median(np.nan_to_num(t_tilde_0s, np.inf))

# generate toys under background-only hypothesis
n, m = get_toys_sb(0, b, num_toys)
# sample t_0 under b-only hypothesis
t_tilde_00 = t_tilde_s(0, n, m)

# convert sampled t_0s under s + b hypothesis into p-values wrt t_0s under b-only hypothesis
# by calculating the p-value for each t_0 under the s + b hypothesis
p_values = np.mean(t_tilde_00[:, np.newaxis] >= t_tilde_0s, axis=0)
# convert to significances
Zs = np.minimum(norm.ppf(1 - p_values), 5)
median_Z = round(np.median(Zs), 2)

_images/1529bee7249e4a981b6f4aeb399d6017ca778086e18cd1b1c9520e1c7d51c634.png — Fig. 5.1 Left: Distributions of $\tilde{t}_0$ under the background-only and signal hypotheses using 30,000 toys each. The median of the latter is marked in red. Right: Distribution of the significances (with respect to the background-only hypothesis) of each sampled $\tilde{t}_0$ under the signal hypothesis.#

Important

We usually quote the median of this distribution as the expected significance, since the median is “invariant” to monotonic transformations (i.e., the median $p$-value will always correspond to the median $Z$ as well, whereas the mean $p$-value will not correspond to the mean $Z$). Similarly, we quote the 16%/84% and 2%/98% quantiles as the $\pm 1\sigma$ and $\pm 2\sigma$, respectively, expected significances.

In this case, we find the median expected significance to be .

Note

Note that instead of converting each sampled $\tilde{t}$ under $H_{\mu=1}$ into a significance and finding the median of that distribution, as in Fig. 5.1 (right), we could have directly used the significance of the median $\tilde{t}$ under $H_{\mu=1}$ (Fig. 5.1, left). We will do this below for the expected limit.

5.3. Expected Limit#

The other figure of merit we care about in searches is the upper limit set on the signal strength. To derive the expected limit, we do the opposite of the above and ask, if the signal were not to exist, what value of $\mu$ would we expect to exclude at the 95% confidence level.

This means we need:

The distribution $p(\tilde{q}_\mu|\mu)$ as in Part 3 to solve for $\mu^+$ in Eq. (5.4) and be able to do the upper limit calculation;
$p(\tilde{q}_\mu|0)$ (or, as we’ll see, just its median / quantiles) to get the expected $\tilde{q}_\mu^\mathrm{obs}$ for different signal strengths ($\mu$) under the background-only hypothesis: and, furthermore,
To scan over the different signal strengths to find the $\mu$ that results in a median $p$-value of 0.05 (or rather $p_\mu'$ value, from Eq. (5.6), since we’re using the CL$_s$ method for upper limits).

First, let’s look at the first two steps for just the $\mu = 1$ signal strength, where we expect $s = 10, b = 10$ signal and background events like above. This is similar to, and essentially an inversion of, the procedure for the expected significance: we’re now finding the median expected $p_{\mu=1}'$-value with respect to the signal + background hypothesis, for test statistics $\tilde{q}_\mu$ sampled under the background-only hypothesis.

num_toys = 30000
exp_s, exp_b = 10, 10
# generate toys under signal + background hypothesis
n, m = get_toys_sb(exp_s, exp_b, num_toys)
# sample q under s + b hypothesis
q_tilde_ss = q_tilde_s(exp_s, n, m)

# generate toys under background-only hypothesis
n, m = get_toys_sb(0, exp_b, num_toys)
# sample t_0 under b-only hypothesis
q_tilde_s0 = q_tilde_s(exp_s, n, m)
median_qs0 = np.median(np.nan_to_num(q_tilde_s0, np.inf))

# CL_s
p_mu = np.mean(q_tilde_ss >= median_qs0)
p_b = 0.5  # we're looking at the median of p(q_mu | 0) so, by definition, this is 0.5
p_muprime = round(p_mu / (1 - p_b), 3)

_images/41ce8662d91746d10cda29e5bc99d71077519255d6d9cb534686fab14bab91d1.png — Fig. 5.3 Calculating the median expected $p_{\mu=1}'$-value with respect to the signal + background hypothesis, for test statistics $ ilde{q}_\mu$ sampled under the background-only hypothesis. p($ ilde{q}_\mu$|1) and p($ ilde{q}_\mu$|0) are estimated using 30,000 toys each. Then, the median p($ ilde{q}_\mu$|0) (red) is used to calculate the $p_{\mu}'$-value following the CL$_s$ criterion (from Part 3).#

The difference with respect to calculating the expected significance is step 3, where this procedure has to be repeated for a range of signal strengths to find the value which gives a median (and $\pm1\sigma,\pm2\sigma$ quantile) $p_\mu'$ of 0.05 - this is thus the minimum value of $\mu$ that we expect to be able to exclude at 95% CL. This is shown below:

num_toys = 50000
exp_s, exp_b = 10, 10

# saving p_muprime for each mu for each relevant quantile
quantiles = np.array([0.02, 0.16, 0.5, 0.84, 0.98])
p_muprimes = []

mus = np.arange(0.4, 2, 0.01)
for mu in mus:
    s = mu * exp_s
    # generate toys under signal + background hypothesis
    n, m = get_toys_sb(s, exp_b, num_toys)
    # sample q under s + b hypothesis
    q_tilde_ss = q_tilde_s(s, n, m)

    # generate toys under background-only hypothesis
    n, m = get_toys_sb(0, exp_b, num_toys)
    # sample t_0 under b-only hypothesis
    q_tilde_s0 = q_tilde_s(s, n, m)

    qs0_quantiles = np.quantile(np.nan_to_num(q_tilde_s0, np.inf), quantiles)

    # CL_s
    p_mu = np.mean(q_tilde_ss[:, np.newaxis] >= qs0_quantiles, axis=0)
    p_b = quantiles
    p_muprimes.append(p_mu / (1 - p_b))

p_muprimes = np.array(p_muprimes)
# find the mu which gives the closest value to 0.05 for each quantile
limits = mus[np.argmin(np.abs(p_muprimes - 0.05), axis=0)]

_images/5e1e8192ae705d04a7525b691e641b4c0388b4e67b008636783b6ea2c61661e1.png — Fig. 5.4 Left: The expected median and $\pm1\sigma,\pm2\sigma$ quantiles of $p_\mu'$ for different $\mu$’s. The intersection of these with $p_\mu' = 0.05$ (grey) corresponds to the expected exclusion limits. Right: The median and $\pm 1\sigma, \pm 2\sigma$ expected limits at 95% CL$_s$ on $\mu$.#

Thus, we have our expected limits. The plot on the right is colloquially known as a “Brazil-band plot”, and is the standard way of representing limits. For example, below is the corresponding plot by ATLAS for the Higgs discovery (scanning over the Higgs mass):

_images/atlas_limits.png — Fig. 5.5 Expected and observed 95% CL$_s$ upper limits for the SM Higgs by ATLAS in 2012, for different hypothetical Higgs masses [5].#

Note

One subtlety we skipped over is that we may not have expected values for the nuisance parameters, like $b$, beforehand. In practice, there are a lot of methods developed for estimating these using simulations, or observed data in control regions (e.g. from observing only $m$ in our case), etc., which generally depend on the specific analysis being performed.

5.4. Summary#

We derived the “expected significance” of a particular signal model by calculating the significance with which we expect to exclude the background-only hypothesis, assuming the signal exists. We then derived CL$_s$ upper limits we expect to set for a signal model, assuming the signal does not exist. In both cases, we generated toys to estimate the probability distributions of our test statistics; however, in Part II, we will go over asymptotic forms for these test statistics, which can often be simpler to use.