Model details

E.1 Model details

E.1.1 LGAE

For both the encoder and decoder, we choose $N_{MP}^{E} = N_{MP}^{D} = 4$ LMP layers. The multiplicity per node in each LMP layer has been optimized to be

\left\{ \left( τ_{(m, n)}^{(t)} \right)^{E} \right\}_{t = 1}^{4} = (3, 3, 4, 4)

(E.1.1)

for the encoder and

\left\{ \left( τ_{(m, n)}^{(t)} \right)^{D} \right\}_{t = 1}^{4} = (4, 4, 3, 3)

(E.1.2)

for the decoder. The components of the vector on the right-hand side are the multiplicities in each of the four LMP layers of the network, and within each layer the multiplicity is the same for all representations. After each CG decomposition, we truncate irreps of dimension higher than $(1 / 2, 1 / 2)$ to keep the computation tractable; i.e., after each LMP operation, only scalar and vector representations remain per node. Empirically, we did not find this truncation to affect the performance of the model. In practice, the LMP layers in the LGAE are therefore similar to those of LorentzNet, which uses only scalar and vector representations throughout, but are more general because higher-dimensional representations are involved in the intermediate steps before truncation.
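The truncation step can be illustrated with a small bookkeeping sketch. It only tracks irrep labels $(m, n)$ and their dimensions $(2m + 1)(2n + 1)$, using the standard decomposition $(m_1, n_1) \otimes (m_2, n_2) = \bigoplus_{m, n} (m, n)$ with $m$ and $n$ running independently in integer steps; it does not compute CG coefficients and is not the LGAE implementation. The function names and the cutoff convention (keep irreps with $m, n \leq 1/2$) are assumptions made for this example.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
SCALAR = (Fraction(0), Fraction(0))  # (0, 0) irrep
VECTOR = (HALF, HALF)                # (1/2, 1/2) irrep

def dim(irrep):
    """Dimension (2m + 1)(2n + 1) of the (m, n) irrep."""
    m, n = irrep
    return int((2 * m + 1) * (2 * n + 1))

def su2_range(a, b):
    """Weights |a - b|, |a - b| + 1, ..., a + b."""
    lo, hi = abs(a - b), a + b
    return [lo + k for k in range(int(hi - lo) + 1)]

def cg_decompose(irrep1, irrep2):
    """Labels of the irreps appearing in irrep1 (x) irrep2."""
    (m1, n1), (m2, n2) = irrep1, irrep2
    return [(m, n) for m in su2_range(m1, m2) for n in su2_range(n1, n2)]

def truncate(irreps, cutoff=HALF):
    """Keep irreps up to the vector; assumed cutoff convention m, n <= 1/2."""
    return [(m, n) for (m, n) in irreps if m <= cutoff and n <= cutoff]

def components_per_node(tau):
    """Complex components per node for tau scalars plus tau vectors."""
    return tau * (dim(SCALAR) + dim(VECTOR))

product = cg_decompose(VECTOR, VECTOR)
print([dim(r) for r in product])   # dimensions before truncation
print(truncate(product))           # what survives the truncation
print([components_per_node(t) for t in (3, 3, 4, 4)])  # encoder layers
```

Since products of scalars and vectors never generate $(1/2, 0)$ or $(0, 1/2)$, the cutoff leaves exactly the scalar and vector representations described above.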

The differentiable mapping $f (d_{ij})$ in Eq. 16.2.1 is chosen to be the Lorentzian bell function, as in Ref. [54]. For all models, the latent space contains only $τ_{(0, 0)} = 1$ complex Lorentz scalar, as we found that increasing the number of scalars beyond one did not improve the performance in either reconstruction or anomaly detection. Empirically, the reconstruction performance increased with the number of latent vectors, as one might expect, while the anomaly detection performance generally worsened when more than two latent vectors were used.

E.1.2 GNNAE

The GNNAE is constructed from fully connected MPNNs. The update rule in the $(t + 1)$-th MPNN layer is based on that of MPGAN (Section 10.1) and is given by

\begin{align} m_{i}^{(t)} & = \sum_{j = 1}^{n} f_{e}^{(t + 1)} (x_{i}^{(t)} \oplus x_{j}^{(t)} \oplus d (x_{i}^{(t)}, x_{j}^{(t)})), & (E.1.3) \\ x_{i}^{(t + 1)} & = f_{n}^{(t + 1)} (x_{i}^{(t)} \oplus m_{i}^{(t)}), & (E.1.4) \end{align}

where $x_{i}^{(t)}$ is the embedding of node $i$ at the $t$-th iteration, $d$ is any distance function (the Euclidean norm in our case), $m_{i}^{(t)}$ is the message used to update the embedding of node $i$, and $f_{e}^{(t + 1)}$ and $f_{n}^{(t + 1)}$ are learnable mappings in the current MP layer. A diagram of an MPNN layer is shown in Figure E.1. The overall architecture is similar to that in Figure 16.1, with the LMP replaced by the MPNN. The code for the GNNAE model can be found in Ref. [456].

Figure E.1. An MPNN layer in the GNNAE. Here, $EdgeNet$ and $NodeNet$ are feed-forward neural networks.
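As a rough illustration of Eqs. E.1.3 and E.1.4, the following pure-Python sketch performs one fully connected message-passing step. It is not the code of Ref. [456]: the helper names, the random initialization, and the reading of $d$ as the Euclidean distance between the two node embeddings are assumptions made for this example, and the layer widths are borrowed from Eq. E.1.5. Plain lists are used for transparency rather than speed.

```python
import math
import random

def leaky_relu(v, slope=0.2):
    """LeakyReLU_0.2 applied elementwise: max(0.2 x, x)."""
    return [max(slope * a, a) for a in v]

def make_mlp(sizes, rng):
    """Stack of Linear -> LeakyReLU blocks (hypothetical random init)."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        s = 1.0 / math.sqrt(n_in)
        w = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.uniform(-s, s) for _ in range(n_out)]
        layers.append((w, b))

    def mlp(x):
        for w, b in layers:
            x = leaky_relu([sum(wi * xi for wi, xi in zip(row, x)) + bi
                            for row, bi in zip(w, b)])
        return x

    return mlp

def mpnn_layer(x, f_e, f_n):
    """One message-passing step over a fully connected graph."""
    new_x = []
    for xi in x:
        msg = None
        for xj in x:  # the sum in Eq. E.1.3 runs over all nodes j
            d = math.dist(xi, xj)       # assumed: Euclidean distance
            e = f_e(xi + xj + [d])      # list concatenation as direct sum
            msg = e if msg is None else [a + c for a, c in zip(msg, e)]
        new_x.append(f_n(xi + msg))     # Eq. E.1.4
    return new_x

rng = random.Random(0)
f_e = make_mlp([61, 50, 40, 30], rng)   # EdgeNet widths of Eq. E.1.5
f_n = make_mlp([60, 30, 15], rng)       # NodeNet widths of Eq. E.1.5
x = [[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(5)]
out = mpnn_layer(x, f_e, f_n)
print(len(out), len(out[0]))
```

Because the message is a sum over all nodes, reordering the input nodes only reorders the output embeddings, which is the permutation equivariance expected of an MPNN layer.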

For both the encoder and decoder, there are $3$ MPNN layers. The learnable functions in each layer are optimized to be

\begin{aligned} f_{n}^{(1)} & = ({LeakyReLU}_{0.2} \circ {Linear}_{30 \to 15}) \circ ({LeakyReLU}_{0.2} \circ {Linear}_{60 \to 30}), \\ f_{e}^{(1)} & = ({LeakyReLU}_{0.2} \circ {Linear}_{40 \to 30}) \circ ({LeakyReLU}_{0.2} \circ {Linear}_{50 \to 40}) \\ & \quad \circ ({LeakyReLU}_{0.2} \circ {Linear}_{61 \to 50}), \end{aligned}

(E.1.5)

\begin{aligned} f_{n}^{(2)} & = ({LeakyReLU}_{0.2} \circ {Linear}_{15 \to 8}) \circ ({LeakyReLU}_{0.2} \circ {Linear}_{45 \to 15}), \\ f_{e}^{(2)} & = ({LeakyReLU}_{0.2} \circ {Linear}_{30 \to 30}) \circ ({LeakyReLU}_{0.2} \circ {Linear}_{30 \to 30}) \\ & \quad \circ ({LeakyReLU}_{0.2} \circ {Linear}_{31 \to 30}), \end{aligned}

(E.1.6)

\begin{aligned} f_{n}^{(3)} & = ({LeakyReLU}_{0.2} \circ {Linear}_{8 \to δ}) \circ ({LeakyReLU}_{0.2} \circ {Linear}_{38 \to 8}), \\ f_{e}^{(3)} & = ({LeakyReLU}_{0.2} \circ {Linear}_{20 \to 30}) \circ ({LeakyReLU}_{0.2} \circ {Linear}_{16 \to 20}) \\ & \quad \circ ({LeakyReLU}_{0.2} \circ {Linear}_{17 \to 16}), \end{aligned}

(E.1.7)

where ${LeakyReLU}_{0.2} (x) = \max (0.2 x, x)$ is the LeakyReLU function with negative slope $0.2$. The value of $δ$ in $f_{n}^{(3)}$ and the form of the final aggregation layer depend on the encoder variant. For GNNAE-JL encoders, $δ = N \times \dim (L)$, where $L$ is the latent space and $N$ is the number of nodes in the graph; mean aggregation is then performed across the graph. For GNNAE-PL encoders, $δ = d$, where $d$ is the node dimension in the latent space. In the GNNAE-JL decoder, the input layer is a linear layer that recovers the particle cloud structure, similar to that in the LGAE.
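The widths in Eqs. E.1.5 to E.1.7 can be cross-checked against the update rule: each EdgeNet should take $2 d_t + 1$ inputs (two node embeddings of dimension $d_t$ plus one distance), and each NodeNet should take $d_t$ plus the message dimension. The sketch below performs this bookkeeping. The initial node dimension of 30 is inferred from $61 = 2 \times 30 + 1$, not stated in the text; the value of `DELTA` is a placeholder; and for $f_{e}^{(2)}$ the $31 \to 30$ layer is taken as the input layer, since $2 \times 15 + 1 = 31$.

```python
DELTA = 2  # hypothetical placeholder for the output width delta

# (in, out) widths, listed in the order the linear layers are applied
LAYERS = {
    1: {"f_e": [(61, 50), (50, 40), (40, 30)], "f_n": [(60, 30), (30, 15)]},
    2: {"f_e": [(31, 30), (30, 30), (30, 30)], "f_n": [(45, 15), (15, 8)]},
    3: {"f_e": [(17, 16), (16, 20), (20, 30)], "f_n": [(38, 8), (8, DELTA)]},
}

def chain_io(widths):
    """Return (input, output) width of a stack; fail if layers do not fit."""
    for (_, out_prev), (in_next, _) in zip(widths, widths[1:]):
        if out_prev != in_next:
            raise ValueError(f"width mismatch: {out_prev} -> {in_next}")
    return widths[0][0], widths[-1][1]

def check_encoder(node_dim, layers):
    """Check every layer against Eqs. E.1.3 and E.1.4; return a summary."""
    report = []
    d = node_dim
    for t in sorted(layers):
        e_in, e_out = chain_io(layers[t]["f_e"])
        n_in, n_out = chain_io(layers[t]["f_n"])
        if e_in != 2 * d + 1:
            raise ValueError(f"layer {t}: EdgeNet expects {2 * d + 1} inputs")
        if n_in != d + e_out:
            raise ValueError(f"layer {t}: NodeNet expects {d + e_out} inputs")
        report.append((t, d, e_out, n_out))
        d = n_out  # output embedding feeds the next layer
    return report

# Each entry: (layer, node dim in, message dim, node dim out)
report = check_encoder(30, LAYERS)
for row in report:
    print(row)
```

Under these assumptions the node dimension shrinks as $30 \to 15 \to 8 \to δ$, while every EdgeNet emits a 30-dimensional message.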

E.1.3 CNNAE

The encoder is composed of two convolutional layers with kernel size $(3, 3)$, stride $(2, 2)$, “same” padding, and $128$ output channels, each followed by a ReLU activation function. The aggregation layer into the latent space is a fully connected linear layer. The decoder is composed of transposed convolutional layers (also known as deconvolutional layers) with the same settings as the encoder. A softmax function is applied at the end so that the pixel values of each image sum to $1$, which is a property of the jet image representation. A 55-dimensional latent space is chosen so that the compression rate is $55 / 90 \approx 60\%$, allowing a fair comparison with the LGAE and GNNAE models.
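The numbers in this description can be checked with a few lines of arithmetic. The sketch below normalizes an image with a softmax over all pixels, computes the spatial size after each strided convolution, and evaluates the compression rate. The $40 \times 40$ image size is a hypothetical value chosen for illustration, and the output-size formula $\lceil n / s \rceil$ assumes the TensorFlow/Keras convention for “same” padding; neither is stated in the text.

```python
import math

def softmax_image(logits):
    """Softmax over all pixels of a 2D image so that the outputs sum to 1."""
    m = max(v for row in logits for v in row)  # subtract max for stability
    exps = [[math.exp(v - m) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def same_padding_output_size(n, stride):
    """Output length along one axis, assuming ceil(n / stride)."""
    return -(-n // stride)

# Two stride-2 convolutions on a hypothetical 40 x 40 jet image
size = 40
sizes = [size]
for _ in range(2):
    size = same_padding_output_size(size, 2)
    sizes.append(size)
print(sizes)

# Pixel values after the final softmax sum to 1
image = softmax_image([[0.0, 1.0], [2.0, 3.0]])
print(sum(sum(row) for row in image))

# Compression rate quoted in the text
compression = 55 / 90
print(round(compression, 3))
```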