The ATLAS Collaboration has announced the release of 7 billion proton–proton collision events, alongside 2 billion simulated events, in the collaboration’s first comprehensive release of open data designed specifically for scientific research. The open data, available through CERN’s Open Data portal, comprise the entire 2015 and 2016 datasets (36 fb⁻¹ of integrated luminosity) and have been released under the Creative Commons CC0 waiver. The 65 TB of data are in the new ATLAS PHYSLITE format, a lightweight format designed for adoption in Run 3 and extensive use in Run 4, which has already been used in published analyses.
This release follows the CERN Open Data policy, endorsed by all four LHC experimental collaborations in 2020, and is only one part of the ATLAS Collaboration’s efforts in Open Science. ATLAS has already released significant Open Data for Education and Outreach purposes, as well as bespoke simulated datasets for specific uses such as top-quark tagging or jet-reconstruction training. The ATLAS software has been public for several years, and from early on the collaboration has advocated for Open Access publications, including posting all papers to arXiv and supplementary data, such as digitized distributions and statistical likelihood functions, to HepData.
To maximize the usefulness of the data for the scientific community, extensive documentation has been prepared and released on the ATLAS Open Data portal, including descriptions of the datasets and their metadata, software for analysis, and information on the estimation of systematic uncertainties. Importantly, the data and corresponding software are exactly those used within the collaboration for physics analysis. This guarantees their authenticity but also brings a level of complexity that can only be overcome with excellent documentation and examples.
Because extensive tutorials already exist for the ATLAS Open Data for Education and Outreach, an effort has been made to build a learning spectrum, offering paths for everyone from novices and learners to researchers. An introductory guide is available to get newcomers started. The goal of these paths is to connect the Open Data for research to the existing resources and provide a clear route in both directions: those already working with the Open Data for Education and Outreach can progress to more complex studies, while those starting from the Open Data for research can find their way to existing tutorials on statistics, machine learning, and more.
Begin your journey with ATLAS open data by following the tutorial below.
The ATLAS Collaboration has benefited from a robust framework for collaboration, the Short-Term Association program, through which interested researchers can participate directly in ongoing analyses within the collaboration. Although the Open Data for research allows users to perform an analysis with an appropriate level of detail and scientific rigour, those who wish to understand the data in depth, work with the entire collected dataset (rather than the fraction released as Open Data), and draw on the collaboration’s vast detector and analysis expertise are welcome to take part as associates.
This is only the collaboration’s first step into the world of Open Data for research. The next goal is to release Open Data for research based on the Run 2 heavy-ion (lead–lead) collision datasets.
Feedback on and questions about the ATLAS Open Data can be sent directly or posted to the CERN Open Data Forum.