
Machine Learning in action: shaping the future of LHCb and fostering cross-experiment collaboration

Common challenges and the democratic nature of ML are building new bridges across experiments.

The LHCb experiment is ramping up its R&D efforts in Machine Learning (ML), while also addressing the challenges of deploying ML in production and fostering multi-experiment collaborations. Why? The experiment is already navigating the complexities of a high-luminosity environment and benefiting from a new trigger system with enhanced computational capabilities. These advancements are laying the groundwork for a more efficient and automated approach to physics exploration, which will prove essential for the detector upgrade planned for the next decade.

Entering the high-luminosity era

To enhance the statistical power of its measurements and searches, the LHCb detector was revamped in Upgrade I to handle a fivefold increase in instantaneous luminosity. A significant addition is the GPU-enabled, fully software-based trigger, which allows complex algorithms to perform fine-grained reconstruction and selection very early in the data-collection process. However, the increased object multiplicities often demand manual speed-ups of algorithms to meet real-time processing constraints. In addition, larger event sizes and limited storage space place unprecedentedly tight constraints on the output bandwidth. Reducing the bandwidth without compromising LHCb's physics program requires optimizing thousands of interconnected physics-object selections, a highly labor-intensive task.

Further downstream in the data collection process, Data Quality Monitoring (DQM) becomes critical during the commissioning of a new detector, requiring great human effort to spot and communicate problems as fast as possible while adapting to frequent changes in operating conditions. When it comes to physics exploration, as data samples grow, producing matching large simulation samples with standard techniques becomes infeasible within budgeted CPU time, necessitating faster yet highly accurate simulation methods. These challenges will intensify in the future Upgrade II, where luminosity is set to increase tenfold. Modern ML techniques offer timely and powerful ways to address these challenges by increasing overall performance and automation.

ML-assisted trigger and simulation in Run 3

The LHCb trigger system has evolved significantly since the early use of simple Boosted Decision Tree (BDT) classifiers in Run 1. Today, advanced Lipschitz neural networks with built-in robustness and monotonicity guarantees are fully deployed at multiple levels of the Run 3 trigger for both reconstruction and selection tasks. On the simulation side, the LAMARR project offers an ultra-fast alternative to standard Geant4 simulation and reconstruction, using a pipeline of ML-assisted modules that achieves a speed-up of two orders of magnitude. These modules rely on BDTs, Multilayer Perceptrons (MLPs), and Generative Adversarial Networks (GANs). As more ML models enter production, new challenges arise, including model bookkeeping, effective training, continuous testing, and long-term maintainability. To address these issues, two strategies are being pursued: first, the exploration of standardized and flexible ML inference libraries, such as ONNXRuntime and TensorRT, as alternatives to hardcoded implementations; second, the development of generic interfaces and pipelines for model inference and training to streamline integration and maintenance. Given that these challenges are shared across LHC collaborations, discussions are underway about creating cross-experiment solutions. A good example in this direction is an existing prototype ML inference interface developed by LHCb members for the experiment-independent Gaussino simulation framework.
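
To make the first strategy concrete, the snippet below is a minimal Python sketch of what routing a selection through a standardized inference library like ONNXRuntime can look like. The wrapper class, model file name, and feature layout are illustrative assumptions, not LHCb's actual interface.

```python
# Minimal sketch of a generic inference wrapper built on ONNXRuntime.
# Model path, input layout, and class name are illustrative, not LHCb code.
import numpy as np
import onnxruntime as ort

class OnnxModel:
    """Load an exported model once; run it on batches of candidates."""

    def __init__(self, model_path: str):
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name

    def __call__(self, features: np.ndarray) -> np.ndarray:
        # ONNXRuntime expects a dict mapping input names to arrays;
        # passing None as the first argument returns all model outputs.
        # Here we assume the model declares a single output.
        (scores,) = self.session.run(
            None, {self.input_name: features.astype(np.float32)}
        )
        return scores

# Hypothetical usage: score 1000 candidates with four features each.
model = OnnxModel("selection_mlp.onnx")
batch = np.random.rand(1000, 4).astype(np.float32)
responses = model(batch)
```

The appeal of such an interface is that swapping the trained model, or even the backend library, requires no change to the calling trigger code.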

Figure 1: Trigger throughput studies using the TensorRT ML library for GPU. Find details in this LHCb figure document.

Figure 2: Simulation throughput studies comparing different ML libraries within a general interface for ML inference in Gaussino. Find details in this conference talk.

What could be the next evolution stage for the trigger?

Looking ahead, R&D efforts for the trigger are focused on improving algorithm performance and speed to manage larger particle multiplicities per event. There is also a shift towards end-to-end ML solutions, which offer greater automation and simplify future software development. Graph Neural Networks (GNNs) show significant promise in event reconstruction and filtering, particularly for extracting patterns from complex collision data. They are being explored for tasks like track reconstruction in the vertex locator and primary vertex (PV) reconstruction, the latter in collaboration with ATLAS physicists. GNNs are also being investigated as a comprehensive solution for event-size reduction, effectively identifying and reconstructing heavy-hadron decay chains while eliminating pile-up, leading to bandwidth optimization in high-multiplicity environments. The next phase of these projects aims to accelerate GNN models to meet future trigger processing-rate requirements. Additionally, anomaly detection techniques are being developed to exploit the new Run 3 trigger capabilities, especially for identifying Long-Lived Particle (LLP) showers in the LHCb muon system. Normalized Autoencoders (NAEs) have shown excellent performance in simulation studies, and trigger selections based on them are under development.
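
As a rough illustration of why GNNs suit these tasks, the sketch below implements a single message-passing step over a graph of detector hits in plain PyTorch: each candidate edge between two hits produces a message, and each hit is updated from the messages it receives. The layer sizes, edge-list encoding, and toy data are assumptions for illustration; the production models under study are considerably more elaborate.

```python
# Minimal sketch of one message-passing step on a hit graph, the basic
# building block of GNN-based track reconstruction. Sizes are illustrative.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Edge network: combines the two endpoint hits of each candidate edge.
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        # Node network: updates a hit from the sum of its incoming messages.
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, x: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # x: (num_hits, dim) hit features; edges: (2, num_edges) index pairs.
        src, dst = edges
        messages = self.edge_mlp(torch.cat([x[src], x[dst]], dim=-1))
        # Aggregate messages at each destination hit, then update its features.
        agg = torch.zeros_like(x).index_add_(0, dst, messages)
        return self.node_mlp(torch.cat([x, agg], dim=-1))

# Toy example: 5 hits with 8 features each, 4 candidate edges between them.
layer = MessagePassingLayer(dim=8)
hits = torch.randn(5, 8)
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
updated = layer(hits, edges)  # refined hit embeddings, shape (5, 8)
```

Stacking several such layers lets information propagate along chains of hits, after which an edge classifier can score which connections form genuine track segments.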

Figure 3: R&D project for a single-pass event interpretation at trigger level, reconstructing all signal-like decay chains and removing pile-up. Bandwidth can be minimised by storing only the information of decays of interest. Find details in this paper.

Can ML agents help in the control room?

LHCb is exploring the automation of various aspects of the DQM process using ML. A key approach under investigation is the use of Reinforcement Learning (RL), with initial proof-of-concept studies showing significant potential. This method offers two main advantages: it can adapt to changing conditions via continuous training, and it can optimize and automate control-room tasks beyond a simple classification of normal/anomalous data, balancing data-collection efficiency against operational expenses, including the amount of human work required. Though still in its early stages, the use of RL for DQM has already attracted interest from physicists in the CMS and ALICE collaborations for potential future use, complementing the ML solutions currently deployed for DQM in those experiments.
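
As a toy illustration of this trade-off, the sketch below trains a simple bandit-style agent on synthetic anomaly scores: it can label a run itself or pay a smaller cost to call the shifter, and its online updates let it adapt when conditions drift. The reward values, discretization, and anomaly model are invented for illustration and only loosely mirror the studies referenced here.

```python
# Toy bandit-style agent for DQM: given an anomaly score for a run, it either
# labels the data itself or pays a cost to call the shifter. All rewards and
# rates below are illustrative assumptions, not values from the LHCb studies.
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = ["label_good", "label_bad", "call_shifter"]
N_BINS, EPS, LR = 10, 0.1, 0.05
q = np.zeros((N_BINS, len(ACTIONS)))  # estimated action values per score bin

def reward(action: str, truly_bad: bool) -> float:
    if action == "call_shifter":
        return -0.3                      # human time is costly but always safe
    correct = (action == "label_bad") == truly_bad
    return 1.0 if correct else -2.0      # mislabelled data is the worst outcome

for episode in range(20_000):
    truly_bad = rng.random() < 0.1                        # 10% anomalous runs
    score = np.clip(rng.normal(0.7 if truly_bad else 0.3, 0.15), 0.0, 0.999)
    s = int(score * N_BINS)                               # discretized state
    a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(q[s].argmax())
    # Online update: the policy keeps adapting if data conditions drift.
    q[s, a] += LR * (reward(ACTIONS[a], truly_bad) - q[s, a])

# Inspect the learned policy: ambiguous mid-range scores should call the shifter.
print({b: ACTIONS[int(q[b].argmax())] for b in range(N_BINS)})
```

Even in this toy setup, the agent learns to decide confidently at the extremes of the score range and to request human feedback only in the ambiguous middle, the behaviour illustrated in Figure 4.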

Figure 4: Performance over time of an RL algorithm trained to classify synthetic data as normal or anomalous, requesting human feedback only when necessary. The graphs show (left) accuracy and (right) the fraction of the time a shifter is called. The change in response after 1000 episodes, following an abrupt shift in data conditions, matches the ideal behaviour expected. Find details in this paper.

What is next on the simulation and physics exploration fronts?

In the LAMARR project, ongoing efforts focus on achieving highly accurate simulations of reconstructed photons and electrons, with models based on transformers and GNNs under investigation. In addition to expanding collision and simulated datasets, LHCb physicists are committed to improving and streamlining data analysis procedures using ML techniques to achieve the highest possible precision. Beyond numerous innovative solutions pursued by specific analysis groups, not covered in this article, two general R&D projects stand out: one aims to enhance the experiment’s flavour tagging power using DeepSets, and the other seeks to improve particle identification efficiency in less-studied regions of signal-decay phase space using domain-adaptation techniques.
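
For readers unfamiliar with DeepSets, the sketch below shows the core idea in PyTorch: a shared network embeds each track, a sum pools the embeddings into a permutation-invariant event representation, and a second network produces the tag decision. All dimensions and the toy inputs are illustrative assumptions, not the actual LHCb tagger.

```python
# Minimal DeepSets sketch for flavour tagging: per-track features are embedded
# by a shared network (phi), summed into an order-independent event vector,
# and mapped to a tag decision (rho). All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DeepSetsTagger(nn.Module):
    def __init__(self, track_dim: int = 6, hidden: int = 32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(track_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, tracks: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # tracks: (batch, max_tracks, track_dim); mask flags real vs padded tracks.
        per_track = self.phi(tracks) * mask.unsqueeze(-1)
        pooled = per_track.sum(dim=1)           # invariant to track ordering
        return torch.sigmoid(self.rho(pooled))  # e.g. probability of a b vs b-bar

# Toy batch: 2 events, up to 10 tracks each, 6 features per track.
tagger = DeepSetsTagger()
tracks = torch.randn(2, 10, 6)
mask = (torch.arange(10) < 7).float().expand(2, 10)  # 7 real tracks per event
prob = tagger(tracks, mask)
```

The sum pooling is what makes the architecture a natural fit for tagging: the prediction cannot depend on the arbitrary order in which the event's tracks are listed.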

Multi-experiment alliances to pave the way

This is a pivotal moment, as multiple LHC experiments face converging needs in adapting to demanding high-luminosity conditions through the use of ML techniques. Cross-experiment solutions hold great potential by optimizing person-power allocation, sharing complementary expertise, and maximizing algorithm reusability. As discussed in this article, promising paths forward include developing generic interfaces and pipelines for ML model training and inference, sharing ML-based approaches (such as RL for DQM), and collaborating directly on specific applications (like the LHCb-ATLAS partnership for PV reconstruction). These new bridges could be crucial in fully unlocking the future potential of the LHC experiments.

Further information

Seminar expanding on these topics: CERN EP-IT Data Science Seminar on 10/07/24.

The LHCb Upgrades: Upgrade I, Upgrade II.

ML in Run 3: Lipschitz networks in the trigger, LAMARR project, ML inference libraries in the trigger, ML inference and training pipelines in the trigger, ML inference interface in simulation.

ML Research & Development efforts: