CERN Accelerating science

First analysis of Run 3 data with a new slim data format

The ATLAS Collaboration has just released a new measurement of the production cross-section of two Z bosons. This result examines data collected during Run 3 of the LHC – with protons colliding at a record energy of 13.6 TeV – and pioneers the use of PHYSLITE – a new, reduced data format that requires significantly less storage and is expected to become the standard for analysing data from Run 4 of the LHC (HL-LHC), when computing resources will be even more limited because of the vast amount of data expected.

The integrated luminosity expected in Run 4 and Run 5 is 270 fb−1 and 350 fb−1 per year, respectively. The average pile-up, the number of simultaneous proton–proton interactions per bunch crossing, is expected to rise from about 60 in Run 3 to about 140 in Run 4 and up to 200 in Run 5. These conditions pose a serious challenge for computing, as budgets and resources will not scale accordingly. To address this, CPU and storage needs have to be reduced without compromising the number and quality of physics results. ATLAS has launched an extensive R&D program to tackle the HL-LHC software and computing challenges, and the PHYS and PHYSLITE data formats are two of its major developments.

PHYS is the main data format used in Run 3. It is a reduced format designed to serve about 80% of all physics analyses, with a target size of 30 kB per event for data and 50 kB per event for simulation (MC). PHYS contains information about physics objects and the trigger, as well as a thinned track collection, generator-level information, and the additional data needed to apply calibration tools and evaluate systematic uncertainties. PHYSLITE is more compact still: it stores only the most commonly used physics objects, which are already calibrated and preselected. PHYSLITE is about three times smaller than PHYS and faster to process, making it an attractive format for standard physics analyses. Preliminary evaluations showed that analyses running on PHYSLITE use approximately 25% less CPU than the same analyses running on PHYS. The target size of PHYSLITE is 10 kB per event for data and 12 kB per event for MC.
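To make the per-event targets more concrete, the short sketch below converts them into total storage for a sample of a given size. The sample size of one billion events is purely hypothetical (it is not taken from the article), and the sketch assumes decimal units (1 kB = 1000 bytes, 1 TB = 10^12 bytes); the function name `storage_tb` is illustrative only.

```python
# Per-event target sizes quoted in the text, in kB.
# Assumption: decimal units, 1 kB = 1000 bytes and 1 TB = 1e12 bytes.
TARGET_KB_PER_EVENT = {
    ("PHYS", "data"): 30,
    ("PHYS", "MC"): 50,
    ("PHYSLITE", "data"): 10,
    ("PHYSLITE", "MC"): 12,
}


def storage_tb(fmt: str, kind: str, n_events: int) -> float:
    """Hypothetical helper: storage in TB for n_events at the target size."""
    kb_per_event = TARGET_KB_PER_EVENT[(fmt, kind)]
    return kb_per_event * 1_000 * n_events / 1e12


if __name__ == "__main__":
    n_events = 1_000_000_000  # hypothetical sample size, not from the article
    for kind in ("data", "MC"):
        phys = storage_tb("PHYS", kind, n_events)
        lite = storage_tb("PHYSLITE", kind, n_events)
        # Ratio of PHYS to PHYSLITE storage for the same number of events.
        print(f"{kind}: PHYS {phys:.1f} TB, PHYSLITE {lite:.1f} TB, ratio {phys / lite:.2f}")
```

With these targets the PHYS-to-PHYSLITE ratio is exactly 3 for data and about 4.2 for MC, in line with the "about three times smaller" figure. Real sample sizes will differ from the targets, so this is only an order-of-magnitude illustration.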

In a recent analysis, ATLAS researchers studied the production of two Z bosons (ZZ) decaying into four leptons (electrons or muons). The data were collected by the ATLAS detector in 2022 and correspond to an integrated luminosity of 29 fb−1. The researchers measured the ZZ production cross-section within a specific kinematic region (the fiducial phase space) to be 36.7 ± 2.3 fb. Assuming Standard Model decay rates, this fiducial cross-section was extrapolated to a total cross-section of 16.8 ± 1.1 pb. Both results are well described by Standard-Model predictions (see Figure 1). The researchers also measured differential cross-sections as a function of the invariant mass and of the transverse momentum of the four-lepton system (see Figure 2). Together, these measurements allow physicists to compare the experimental results with theoretical predictions in a model-independent way.
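The quoted numbers can be related with some simple arithmetic, sketched below. Multiplying the fiducial cross-section by the integrated luminosity (N = σ × L) gives the expected number of ZZ → 4ℓ events produced in the fiducial region; this is not the observed event count, since detector efficiency and background are ignored here. The ratio of fiducial to total cross-section folds together the four-lepton branching fraction and the fiducial acceptance. The function names are hypothetical, chosen for illustration.

```python
FB_PER_PB = 1_000  # 1 pb = 1000 fb

sigma_fid_fb = 36.7      # measured fiducial cross-section [fb]
sigma_fid_err_fb = 2.3
sigma_tot_pb = 16.8      # extrapolated total cross-section [pb]
sigma_tot_err_pb = 1.1
lumi_ifb = 29.0          # integrated luminosity [fb^-1]


def expected_fiducial_events(sigma_fb: float, lumi_ifb: float) -> float:
    """N = sigma x L: events produced in the fiducial region.

    Ignores detector efficiency and background, so it is not the
    number of events actually selected in the analysis.
    """
    return sigma_fb * lumi_ifb


def fiducial_fraction(sigma_fid_fb: float, sigma_tot_pb: float) -> float:
    """Fiducial / total cross-section, converting pb to fb first.

    Combines the ZZ -> 4 lepton branching fraction and the acceptance.
    """
    return sigma_fid_fb / (sigma_tot_pb * FB_PER_PB)


if __name__ == "__main__":
    n = expected_fiducial_events(sigma_fid_fb, lumi_ifb)
    frac = fiducial_fraction(sigma_fid_fb, sigma_tot_pb)
    print(f"N ~ {n:.0f} events; fiducial/total = {frac:.2e}")
    print(f"relative uncertainties: {sigma_fid_err_fb / sigma_fid_fb:.1%} (fiducial), "
          f"{sigma_tot_err_pb / sigma_tot_pb:.1%} (total)")
```

The unit conversion is the step most easily missed: the fiducial result is quoted in fb and the total in pb, so the two differ by roughly a factor of 460 once both are expressed in the same unit.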

Figure 1: The measured fiducial (a) and total (b) cross-sections compared to the predictions from simulations. (Image: ATLAS Collaboration/CERN).


Figure 2: The differential cross-sections (black points) measured for the invariant mass (left) and the transverse momentum (right) of the four-lepton system. These are compared to Standard-Model predictions (coloured markers). (Image: ATLAS Collaboration/CERN).

These measurements not only test the electroweak sector of the Standard Model at the highest available energy but also provide an opportunity to further verify the functionality of the ATLAS detector and its reconstruction software. This is one of the first three ATLAS measurements to examine Run 3 data, which were recorded with an upgraded detector, notably for the lepton triggers and for muon reconstruction in the forward region, and processed with improved reconstruction software and a new analysis model.

This is the first result to use the PHYSLITE data format and the new analysis model that the ATLAS Collaboration has implemented for Run 3 and beyond. The new analysis model replaces the large number of custom data formats that were previously tailored to specific physics cases. Consolidating analyses onto common formats significantly reduces the computing resources needed for ATLAS analyses. In the ZZ measurement, for example, the PHYSLITE samples are about three times smaller per event than the corresponding Run 2 samples, which contributed significantly to the fast turnaround of this result. Because PHYSLITE is so compact, its samples can be stored at a single grid site, which reduces network traffic. It is also a dynamic format, meant to evolve so that it continues to support physics analyses optimally.

The use of PHYSLITE in Run 3 analyses provides valuable input for its future deployment. It represents a considerable improvement to the analysis model and enables a more efficient use of ATLAS computing resources. This measurement also marks an important milestone for the Run 3 physics program: it is the first measurement of diboson production at the new centre-of-mass energy of 13.6 TeV. This success was celebrated at the recent ATLAS Week during a special ceremony honoring the PHYSLITE developers.


The author would like to thank Stephane Willocq (University of Massachusetts-Amherst) and Fabio Cerutti (BNL) for their thoughtful comments.