
SFT participates for fifth year in Google's Summer of Code programme

by Lorenzo Moneta & Sandro Christian Wenzel

In 2015 the CERN PH-SFT group participated in the Google Summer of Code (GSoC) programme for the fifth consecutive year. Google funded 14 students to work for three months during the summer on CERN open source software projects. The projects involved are related to ROOT, the Geant simulation frameworks and CernVM, which are developed by the PH-SFT group, and to SixTrack, a numerical accelerator simulation code developed in the Beams department.

On 1 September the students presented their projects to their mentors and colleagues. They connected through Vidyo to present their results and to discuss the challenges they had faced, along with some novel ideas for further extensions and developments. The software they developed has been integrated into the respective project code repositories or made publicly available for further use. Lorenzo Moneta and Sandro Wenzel, the local GSoC organizers from PH-SFT, began collecting ideas for potential projects in February from the group members who went on to mentor the students throughout GSoC.

Four of this year's projects were related to ROOT. The first was carried out by Omar Zapata, a Colombian student, who worked on integrating multivariate statistical tools from R software packages into ROOT. Using the ROOT-R interface code developed in last year's project, Omar developed interfaces that make algorithms from R available in TMVA, the ROOT multivariate analysis library. These include the decision trees from the C50 package, a neural network and support vector machine algorithms. In addition, he extended the project by improving the TMVA code directly and by connecting it to algorithms from the Python scikit-learn package. He presented these extensions of the ROOT MVA tools at the recent ROOT workshop, which took place in Saas-Fee in mid-September, and they will also be shown to a wider community at the upcoming Data Science workshop.

Another ROOT project, carried out by Maciej Zminoch and mentored by Lorenzo Moneta, consisted of developing a new version of the TTreeFormula class, which parses the string expressions used when querying and making variable selections on a ROOT tree. The project used the Just-In-Time compilation of C++ code provided by ROOT's Cling interpreter to compile the expression passed in. One outcome of the project is that users can now pass advanced C++11 code (e.g. lambda functions) in the formula expression.

Anna Smagina worked with mentor Philippe Canal on extending the ROOT I/O customization rules used for schema evolution. This mechanism makes it possible to read ROOT data files written with old data object formats. More complex rules for migrating data structures were developed.

Ernesto Cedeno, mentored by Olivier Couet, worked on developing a prototype interface between ROOT and ParaView, a data-analysis and visualization software tool. The code he developed makes it possible to read a ROOT data file in ParaView and display its content.

Toby St. Clere-Smithe worked on interfacing Python and C++, mentored by Wim Lavrijsen. Toby is a Master's student in Complex Systems and the author of PyViennaCL, the Python bindings to ViennaCL, a template C++ library for GPGPU linear algebra. Toby developed a "Pythonization API" that allows developers to add mapping patterns from idiomatic C++ expressions to equivalent Python ones. These mappings cover exceptions, array/size, ownership policy, smart pointers, method renaming, and getters/setters to properties. Toby provided implementations for both CPython (PyROOT) and PyPy (cppyy), as well as test cases for both systems. Both implementations have been accepted into the respective repositories of the ROOT and PyPy projects.
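The getter/setter-to-property mapping mentioned above can be illustrated with a minimal pure-Python sketch. It is not the Pythonization API of PyROOT or cppyy; the helper `add_properties` and the class `Particle` are hypothetical names chosen for this example, and the `GetX`/`SetX` naming convention is an assumption.

```python
def add_properties(cls):
    """Expose C++-style GetX/SetX method pairs as Python properties.

    Hypothetical helper for illustration only; it is not part of
    PyROOT or cppyy.
    """
    for name in list(vars(cls)):
        if name.startswith("Get") and len(name) > 3:
            getter = getattr(cls, name)
            # A missing setter yields a read-only property.
            setter = getattr(cls, "Set" + name[3:], None)
            setattr(cls, name[3:].lower(), property(getter, setter))
    return cls


class Particle:
    """Stand-in for a wrapped C++ class with accessor methods."""

    def __init__(self):
        self._mass = 0.0

    def GetMass(self):
        return self._mass

    def SetMass(self, mass):
        self._mass = mass


add_properties(Particle)

electron = Particle()
electron.mass = 0.511            # forwards to SetMass
mass_via_getter = electron.GetMass()
```

After the mapping is applied, `electron.mass` and `electron.GetMass()` refer to the same value, so Python code can use attribute syntax while the original accessor methods remain available.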

Liu Zhengyang, mentored by Danilo Piparo, worked on refactoring and improving the Static Analysis Suite (SAS), a tool for ensuring the correctness of code. Ensuring correctness is not a trivial task, especially given the complexity of the software stacks of the LHC experiments. SAS is based on an open source compiler, Clang, and allows the user to conveniently write checks that the program's code is put through. Liu carried out a major refactoring and improvement of SAS: the programming interface was renovated, fine-grained configuration was made possible and extending the family of checkers was made straightforward. SAS now allows even non-experts in compiler technology to deploy static analysis for their projects.

Two projects, carried out by students Yigit Demirag and Jan Stephan, targeted open source projects related to explicit SIMD vectorization libraries for C++ and their application in some of our numerical codes. Jan worked with mentor Matthias Kretz on adding GPU support to the “Vc” vectorization library (https://github.com/VcDevel/Vc) with the aim of achieving portability of Vc code between the similar CPU-SIMD and GPU-SIMT code execution models. By tweaking the intricate internals of Vc, the project demonstrated in a prototype that this is indeed possible on the NVIDIA CUDA platform. This encouraging result has already triggered further interest in the topic, which will be followed up in more detail (possibly with further contributions from Jan).

Yigit, on the other hand, demonstrated in his project, mentored by Sandro Wenzel, that these vectorization libraries are well suited to explicitly speeding up algorithms such as state-of-the-art random number generators. He achieved this for the Threefry generator, which belongs to a new class of “counter-based generators”. Such generators have interesting properties for massively parallel algorithms at CERN. Yigit also used his deep understanding of low-level programming to provide initial support for explicit vectorization on the ARM NEON architecture, which we aim to contribute to the Vc library. Once ready, this will greatly enhance the portability of our code to further platforms of interest.
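The property that makes counter-based generators attractive for parallel work is that each output depends only on a key and a counter, with no hidden state carried from one call to the next. The sketch below illustrates that idea only. It does not implement Threefry: as a stand-in it uses a SplitMix64-style mixing function, and the names `counter_random` and `uniform` are hypothetical.

```python
MASK64 = (1 << 64) - 1
GOLDEN_GAMMA = 0x9E3779B97F4A7C15  # odd constant used to spread counters


def counter_random(key, counter):
    """Return a 64-bit value that depends only on (key, counter).

    Stand-in mixer (SplitMix64-style finalizer), not Threefry.
    """
    z = (key + counter * GOLDEN_GAMMA) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def uniform(key, counter):
    """Map the top 53 bits of the output to a float in [0, 1)."""
    return (counter_random(key, counter) >> 11) / float(1 << 53)


# Any element of the stream can be computed directly, in any order,
# so independent workers can each take a disjoint range of counters.
forward = [counter_random(2015, i) for i in range(8)]
backward = [counter_random(2015, i) for i in reversed(range(8))]
```

Because no state is shared between calls, the same sequence is obtained whether the counters are visited in order, in reverse, or split across threads or SIMD lanes.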

One project, the “binary code browser”, mentored by Sandro Wenzel, was carried out in our “blue sky” ideas section, which targets open source tool development for the benefit of (low-level) software developers across different projects. Student Alin Mindroc used his excellent understanding of web technologies to provide a JavaScript-based tool (as well as an Eclipse plugin) that allows users to interactively browse the content of binary files and to inspect or visually compare the assembly code of functions. With this, we are now able to visualize the effect of different compiler options on our code, which can greatly help optimization and testing efforts.

Rafael Mkrtchyan worked with mentor Jakob Blomer on improving the performance of data transfer in the CernVM-FS file system, which is used extensively by many CERN collaborations for their software distribution. Rafael successfully added support for the new HTTP/2 standard to the CernVM-FS infrastructure, resulting in multiplexed and parallel data transfer modes. Integration of this code into the mainline repository is at an advanced stage and the new technology should be available to clients very soon.

Two students, Somnath Banerjee (India) and Jason Suagee (US), worked on new methods for integrating the trajectories of particles in an electromagnetic field for use in Geant4 particle transport simulations. Somnath implemented general, higher-order Runge-Kutta methods based on so-called RK tableaus as proposed in the literature. He showed that they are computationally more efficient than the existing general RK methods in Geant4. Jason implemented innovative Runge-Kutta-Nyström methods for general second-order ordinary differential equations. Both sets of methods have the new ability to estimate an intermediate point after a successful step, which will be used to identify more efficiently the location where trajectories cross boundaries.
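To make the idea of a tableau-driven Runge-Kutta method concrete, here is a minimal sketch of a generic explicit RK step whose coefficients are supplied as a tableau (A, b, c). It is not the Geant4 code: the function names are hypothetical, the classical fourth-order tableau is used only as a familiar example, a harmonic oscillator stands in for the equations of motion, and the intermediate-point estimate described above is not included.

```python
import math


def rk_step(f, t, y, h, A, b, c):
    """Advance y' = f(t, y) by one explicit RK step of size h.

    A, b, c are the tableau coefficients; A must be strictly lower
    triangular. y is a list of floats.
    """
    stages = []
    for i in range(len(b)):
        # Stage state: y plus the weighted sum of earlier stage slopes.
        y_i = [y[j] + h * sum(A[i][m] * stages[m][j] for m in range(i))
               for j in range(len(y))]
        stages.append(f(t + c[i] * h, y_i))
    return [y[j] + h * sum(b[i] * stages[i][j] for i in range(len(b)))
            for j in range(len(y))]


# Classical fourth-order Runge-Kutta tableau, used here as an example.
RK4_A = [[0.0, 0.0, 0.0, 0.0],
         [0.5, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
RK4_B = [1 / 6, 1 / 3, 1 / 3, 1 / 6]
RK4_C = [0.0, 0.5, 0.5, 1.0]


def oscillator(t, y):
    """x'' = -x written as a first-order system in (x, v)."""
    x, v = y
    return [v, -x]


def integrate(f, y0, t_end, n_steps, A, b, c):
    """Take n_steps equal steps from t = 0 to t = t_end."""
    h = t_end / n_steps
    t, y = 0.0, list(y0)
    for _ in range(n_steps):
        y = rk_step(f, t, y, h, A, b, c)
        t += h
    return y


final_state = integrate(oscillator, [1.0, 0.0], 1.0, 100,
                        RK4_A, RK4_B, RK4_C)
```

Switching to a different method of the same family only requires passing a different tableau; the stepping code itself is unchanged. For the oscillator started at x = 1, v = 0, the exact solution is x = cos(t), v = -sin(t), which gives a simple accuracy check.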

Ramya Bhaskar and Anshu Aviral contributed to the development of SixTrackLib, a portable library that simulates the trajectories of single charged particles for millions of turns inside accelerators like the LHC without introducing significant numerical artifacts. Anshu took over from the first proof-of-principle code produced by Kartikeya in GSoC 2014 and moved it to a solid framework and a uniform code base. At the same time he updated the main tracking routine using newly developed formulations. The code underwent full-scale benchmarking, which showed close agreement with the original Fortran code. The small differences were traced to floating-point inaccuracies between the two codes. Ramya extended the code with a specific tracking model not yet present in the legacy code. We expect to continue developing the library to complete the list of supported tracking models and to perform extensive benchmarks. For more details see cern.ch/sixtrack and github.com/SixTrack/SixTrackLib.

After the remarkable success of these projects, the organizers are eagerly looking forward to next year’s programme. Anyone involved in a CERN open source software project who wishes to mentor a student is welcome to contact the organizers to prepare project ideas for next year.
