Google Summer of Code brings Students to HEP Software Projects

Google Summer of Code (GSoC) is a programme that brings together students who want to code for Open Source software projects and mentors who can introduce them to exciting projects and the full software development cycle. As a major supporter of Open Source software, the EP-SFT group has been involved for a decade, with students contributing to ROOT, Geant4, CVMFS and other projects since 2011.

In 2017 CERN’s involvement was recast in a major shift: it joined with the HEP Software Foundation (HSF) so that a wide range of projects in high-energy physics could participate under the CERN-HSF umbrella. This lowers the overhead for individual projects, and the number of participating projects has grown considerably over time (Figure 1).

Figure 1 - CERN-HSF has delivered over 130 successful GSoC projects, sustaining involvement despite the impact of COVID.

This year, despite the impact of COVID-19, CERN-HSF started 27 projects, with a 93% success rate. Many large high-energy physics organisations took part again, with projects from ATLAS, CMS and LHCb as well as from ROOT, TMVA, Rucio, Rivet, CVMFS, zfit and more.

One of this year’s highlight themes was automatic differentiation, with no fewer than three projects expanding the capabilities of Clad, which automatically differentiates C++ code using the Clang compiler infrastructure. New features developed in GSoC allow differentiation of lambdas and closures, improved numerical differentiation, and the computation of second derivatives of ROOT TFormula objects. Automatic differentiation is at the heart of modern machine learning, and this year also saw a crop of projects to improve the storage capabilities of TMVA for machine learning models and to output highly optimised inference code that can be easily integrated into other projects. GPUs are ever more widely used in HEP, and this year’s GSoC students worked on projects to evaluate the performance of the Alpaka abstraction layer on different GPUs and to partition GPUs with the OpenForBC toolkit. Another project improved the graphical output of the Rivet toolkit, which is used to validate Monte Carlo (MC) generators (Figure 2).

Figure 2 - New surface plots from Rivet, showing data and 2 MC models, with a ratio plot below (Simon Thor).

GSoC students also worked on important pieces of our distributed computing infrastructure, developing a plugin that allows our Rucio data management system to connect to a ScienceMesh storage site as well as to traditional grid resources. Delivery of binary code to compute nodes has also been improved: the CernVM filesystem can now download bundles of software in one go, helping complex jobs to start quickly.

GSoC students have written final reports on their projects, and some have kept blogs throughout the process. Over time the programme has had a very positive impact on the HEP community: the commitment of mentors is very often rewarded with excellent results and with lasting bonds to talented young developers from all over the world, who very often continue to contribute to the project after GSoC.

If you want to know more, or have ideas for projects in next year’s GSoC, contact the programme coordinators, Andrei, Javier and Antoine.