11 Validating and comparing fast simulations

In this Part, we have discussed the development of fast DL simulators to tackle the critical problem of producing efficient and high-quality simulations in the HL-LHC. In particular, we have introduced the MPGAN, GAPT, and iGAPT models, the first to effectively simulate point clouds in HEP, which have demonstrated promising results in both speed and quality. However, for an experimental collaboration to apply one of these techniques in real data analyses, we require methods to objectively compare the performance of different simulation techniques and extensively validate the produced simulations. This calls for the study and adoption of standard quantitative evaluation metrics for generative modeling in HEP.

This chapter presents the first, to our knowledge, systematic investigation of the sensitivity of generative evaluation metrics to expected failure modes of generative models, as well as their relevance to validation and feasibility for broad adoption in HEP. We study the performance of several proposed metrics from HEP and computer vision and, inspired by both domains, we develop two novel metrics we call the Fréchet and kernel physics distances (FPD and KPD, respectively). We find that, together, they are highly sensitive to all tested forms of data mismodeling and satisfy the practical requirements for evaluating and comparing generative models in HEP.

We conclude our experiments by recommending the adoption of FPD and KPD, along with quantifying differences in individual feature distributions using the Wasserstein 1-distance, and demonstrate their use in evaluating the MPGAN and GAPT-based models. Implementations for the new metrics are provided in the JetNet library [336].
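To make the per-feature comparison concrete, the following is a minimal sketch of the Wasserstein 1-distance between two one-dimensional samples. It is not the JetNet implementation: the function name `wasserstein_1d`, the toy Gaussian feature, and the sample sizes are all hypothetical choices made for illustration, and the sketch assumes the real and generated samples have equal size.

```python
import random


def wasserstein_1d(real, generated):
    """Wasserstein 1-distance between two equal-size 1D samples.

    For equal-size empirical distributions, W1 reduces to the mean absolute
    difference between the sorted samples (i.e. between matching quantiles).
    """
    if len(real) != len(generated):
        raise ValueError("this sketch assumes equal sample sizes")
    if not real:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(real), sorted(generated))
    return sum(abs(r - g) for r, g in pairs) / len(real)


# Hypothetical toy feature in arbitrary units. The "shifted" sample is
# deliberately displaced to mimic a mismodeled feature distribution.
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(5000)]
good = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(0.5, 1.0) for _ in range(5000)]

w1_good = wasserstein_1d(real, good)        # small: same underlying distribution
w1_shifted = wasserstein_1d(real, shifted)  # larger: mean shifted by 0.5
```

In this toy setting the shifted sample yields a visibly larger distance than the well-modeled one, which is the behavior one would want from a metric that quantifies differences in individual feature distributions.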

This chapter is structured as follows. In Section 11.1, we define our criteria for evaluation metrics in HEP and review existing metrics. We present results on the performance of these metrics on Gaussian-distributed synthetic toy data and simulated high energy jets in Sections 11.2 and 11.3, respectively. Based on these experiments, we provide our recommendations and concretely illustrate their application by evaluating and comparing the aforementioned models in Section 11.4. Finally, we conclude in Section 11.5.

11.1 Evaluation metrics for generative models
11.2 Experiments on Gaussian-distributed data
11.3 Experiments on jet data
11.4 Demonstration on particle cloud GANs
11.5 Summary
