
CLUE: A Scalable Clustering Algorithm for the Data Challenges of Tomorrow

The foreseen increase in luminosity and pile-up at the High-Luminosity Large Hadron Collider (HL-LHC) [1] will challenge both the detector hardware and the reconstruction software. With higher pile-up, calorimeter energy reconstruction depends even more on pattern-recognition algorithms, which must cluster together the deposits produced by the same parent particle while rejecting cross-contamination from other particles and noise as early as possible in the reconstruction chain.

The algorithm responsible for clustering calorimeter hits must therefore be accurate, efficient enough to handle the enormous number of hits recorded per event, and scalable to modern computing architectures.

Traditional clustering algorithms often fall short in this context. Their strictly sequential logic struggles with modern, data-heavy events and maps poorly to GPUs. Moreover, most of them ignore the physical weight of each hit, namely its deposited energy, even though it could guide the clustering.

These limitations led to the development of CLUE (CLUstering of Energy) [2], an algorithm tailored to the unique requirements of high-granularity calorimeters. CLUE is a density-based, weighted clustering algorithm: it determines clusters based on both the spatial density of points and their associated energy values.

As shown in Figure 1, the algorithm proceeds in four main steps. First, it computes the local energy density for each hit by searching for nearby hits and weighting them according to their distance and energy. A hit that is surrounded by many energetic neighbours will have a higher energy density.
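To make the idea concrete, here is a minimal NumPy sketch of this first step. It assumes a flat kernel within a critical distance dc; the kernel choice, the parameter names and the brute-force neighbour scan are illustrative simplifications, not the actual implementation:

```python
import numpy as np

def local_density(points, energy, dc):
    """Step 1 of CLUE (sketch): for each hit, sum the energies of
    all hits within the critical distance dc (flat kernel)."""
    rho = np.zeros(len(points))
    for i in range(len(points)):
        # Squared distance from hit i to every hit. This brute-force
        # scan is O(N^2); the real implementation restricts the
        # search to nearby spatial tiles (see below).
        d2 = np.sum((points - points[i]) ** 2, axis=1)
        rho[i] = energy[d2 < dc * dc].sum()
    return rho
```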

Next, each point is linked to the closest point with a greater local density within a maximum distance, its so-called nearest-higher. Points without any nearest-higher are then classified: those with a density above a given threshold become seeds, which act as cluster centres, while the others are marked as outliers. Finally, clusters are built by following the nearest-higher links from each point back to a seed; points whose chain ends in an outlier remain unattached to any cluster.
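The remaining steps can be sketched in the same spirit. Again, the parameter names (dm for the maximum search distance, rhoc for the seed density threshold) and the brute-force searches are illustrative, not the production code:

```python
import numpy as np

def assign_clusters(points, rho, dm, rhoc):
    """Steps 2-4 of CLUE (sketch), given the local densities rho."""
    n = len(points)
    nearest_higher = np.full(n, -1)

    # Step 2: link each point to the closest point of higher density
    # within the maximum distance dm.
    for i in range(n):
        d2 = np.sum((points - points[i]) ** 2, axis=1)
        higher = np.where((rho > rho[i]) & (d2 < dm * dm))[0]
        if higher.size > 0:
            nearest_higher[i] = higher[np.argmin(d2[higher])]

    # Step 3: points with no nearest-higher become seeds if dense
    # enough, outliers otherwise (outliers keep the label -1).
    labels = np.full(n, -1)
    seeds = np.where((nearest_higher == -1) & (rho >= rhoc))[0]
    labels[seeds] = np.arange(len(seeds))

    # Step 4: propagate cluster labels down the nearest-higher chains;
    # chains ending in an outlier inherit the outlier label -1.
    for i in range(n):
        chain, j = [], i
        while labels[j] == -1 and nearest_higher[j] != -1:
            chain.append(j)
            j = nearest_higher[j]
        for k in chain:
            labels[k] = labels[j]
    return labels
```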


Figure 1. Visualisation of the main steps of the CLUE algorithm: A. Initial point cloud. B. Local density is estimated for each point based on its nearby neighbours. C. Each point is linked to its nearest-higher, namely the closest point with higher density. D. Points without a nearest-higher are classified: those above a density threshold become seeds (cluster centres), while the rest are labelled as outliers. E. Clusters are formed by linking each point to its nearest-higher in an iterative process.

Each step is highly parallelisable and scales efficiently with the number of points, making CLUE well-suited for GPUs. Moreover, the data structures used by CLUE allow the clustering space to be partitioned efficiently, enabling near-linear computational complexity.
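The partitioning idea can be illustrated with a simple uniform grid: hits are binned into tiles, and each neighbour search visits only the few tiles around a point instead of the whole event. The 2D sketch below uses illustrative names and Python containers; the production implementation uses flat, GPU-friendly data structures:

```python
import numpy as np
from collections import defaultdict

def build_tiles(points, tile_size):
    """Bin hits into a uniform grid of square tiles."""
    tiles = defaultdict(list)
    for i, p in enumerate(points):
        tiles[tuple((p // tile_size).astype(int))].append(i)
    return tiles

def candidate_neighbours(points, tiles, i, tile_size):
    """Collect neighbour candidates of hit i from the 3x3 block of
    tiles around it; exact distance cuts are applied afterwards."""
    cx, cy = (points[i] // tile_size).astype(int)
    cand = []
    for kx in (cx - 1, cx, cx + 1):
        for ky in (cy - 1, cy, cy + 1):
            cand.extend(tiles.get((kx, ky), []))
    return cand
```

With a tile size of the order of the critical distance, each density or nearest-higher query touches only a constant number of tiles, which is what brings the overall complexity close to linear in the number of hits.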

CLUE was originally developed for the High Granularity Calorimeter (HGCAL) [3], which will replace the CMS endcap calorimeters during Long Shutdown 3 (LS3) [4]. HGCAL is composed of multiple layers of silicon and scintillator sensors that provide excellent spatial resolution. CLUE is first applied within each layer to form 2D clusters, as shown in Figure 2, significantly reducing complexity with minimal loss of information. A 3D version of the algorithm is then used to combine these 2D clusters into complete showers across layers.


Figure 2. Example of the clustering of HGCAL hits on one layer with the 2D version of CLUE. On the left, a heatmap of the hits produced by two nearby 100 GeV photons. On the right, the clusters identified by CLUE, where each colour corresponds to a different cluster.
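This layer-by-layer workflow is easy to express in code. In the sketch below, run_clue_2d stands in for a full 2D implementation (for instance one built from the steps sketched above); it is a placeholder, not part of the actual software:

```python
import numpy as np

def cluster_by_layer(x, y, layer, energy, run_clue_2d):
    """Run 2D clustering independently on every sensor layer and
    merge the per-layer labels into one event-wide array."""
    labels = np.full(len(x), -1)
    offset = 0
    for l in np.unique(layer):
        sel = np.where(layer == l)[0]
        pts = np.column_stack((x[sel], y[sel]))
        lab = run_clue_2d(pts, energy[sel])
        lab[lab >= 0] += offset  # keep labels unique across layers
        labels[sel] = lab
        offset = labels.max() + 1
    return labels
```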

The CLUEstering library [5] was developed as a generic package that can be used in any high-energy physics experiment and in fields beyond ours. Within the EP R&D programme, CLUEstering has been integrated into the Key4HEP [6] software stack within the Future Circular Collider (FCC) [7] project as a detector-agnostic algorithm, allowing its use across a wide range of simulated detector geometries. An example of how it can be used on 4D information (x, y, z, time) is shown in Figure 3.


Figure 3. On the left: two overlapping photons simulated in the CLIC detector, separated in time by 500 picoseconds; the dashed lines represent the simulated trajectories of the two photons. In the centre: a single cluster reconstructed by CLUEstering using only the 3D spatial information. On the right: CLUEstering running in four dimensions (x, y, z, t), using the additional time information, is able to separate the two particles.
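For completeness, a usage sketch of the Python package is shown below. The constructor arguments, method names and input format are assumptions made for illustration; the authoritative API is documented in the repository [5]:

```python
# Hypothetical usage sketch of the CLUEstering Python package: the
# clusterer arguments and method names below are assumed, not
# guaranteed; check the repository [5] for the actual API.
import numpy as np
import pandas as pd
import CLUEstering as clue

rng = np.random.default_rng(0)
n = 1000
xs, ys, zs, ts = rng.normal(size=(4, n))   # toy 4D coordinates
energies = rng.exponential(size=n)         # toy per-hit energies

# 4D input: three spatial coordinates plus time, with the energy
# of each hit used as its weight.
data = pd.DataFrame({
    "x0": xs, "x1": ys, "x2": zs, "x3": ts,
    "weight": energies,
})

c = clue.clusterer(1.0, 5.0, 1.5)  # assumed parameters: dc, rhoc, dm
c.read_data(data)                  # assumed data-loading method
c.run_clue()                       # assumed method running the chain
```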

Beyond high-energy physics, CLUEstering has also been applied in other fields, including the identification of forests by clustering trees in satellite images, the identification of star clusters in astronomical images captured by CCD telescopes, and nearest-neighbour searches in latent space for machine-learning applications [8]. These applications used the publicly available Python version of the CLUEstering library.

Thanks to its high level of parallelism and optimisation for GPUs, CLUE demonstrates excellent performance scaling. As shown in Figure 4, its execution time increases nearly linearly with the number of input points, and the GPU achieves a speed-up of about two orders of magnitude with respect to the single-threaded CPU version. Estimated ranges of occupancy for the HL-LHC and FCC-hh are also indicated.


Figure 4. Performance of the CLUEstering library. The benchmarks use realistic data composed of an increasing number of 3D clusters generated in the volume of a CMS-like calorimeter endcap, with each cluster containing an average of 150 hits. The plot on the left compares the execution time of the single-threaded CPU and GPU versions of the library. The plot on the right shows the throughput achieved by launching multiple clustering executions concurrently under two configurations. In the first case (Full CPU), the machine is saturated by running enough concurrent clustering jobs to fully utilise 128 CPU cores. In the second case (Full CPU + GPU), the machine is again saturated with concurrent clustering jobs on all 128 CPU cores, while additional clustering executions are offloaded to the GPU using 8 concurrent streams. This setup is designed to maximise the utilisation of the entire machine and to demonstrate the potential of GPU offloading. Both benchmarks were performed on a system equipped with an AMD EPYC 9754 CPU [9] and an NVIDIA Tesla T4 GPU [10].

In conclusion, by combining density-based logic with energy weighting, CLUE offers a highly efficient and scalable solution for large and complex multidimensional datasets. It is a powerful tool not only for high-energy physics experiments like those at the LHC, but also for a wide range of data-intensive scientific applications where speed, accuracy, and scalability are essential.

References

  1. https://arxiv.org/abs/1705.08830
  2. https://inspirehep.net/literature/2633578
  3. https://cds.cern.ch/record/2293646?ln=en
  4. https://iopscience.iop.org/article/10.1088/1748-0221/3/08/S08004
  5. https://gitlab.cern.ch/kalos/CLUEstering
  6. https://github.com/key4hep
  7. https://arxiv.org/abs/2204.10029
  8. https://indico.cern.ch/event/1360895/timetable/#79-final-scrum
  9. https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9754.html
  10. https://www.nvidia.com/en-us/data-center/tesla-t4/