2-year HPC DevOps Engineer position at CEA
HPC DevOps Engineer at CEA 👩‍💻🧑‍💻
Deployment and CI on supercomputers for the C++ Kokkos library within the “Moonshot” CExA project
CEA is recruiting DevOps engineers for a 2-year period to join the CExA “Moonshot” project team, which is setting up CEA’s GPU computing software stack around the Kokkos C++ library. You will contribute to innovative packaging, deployment and continuous integration approaches for supercomputers, based in particular on Spack. A team of more than 10 people is currently being set up. The positions will be based at the CEA Saclay site near Paris.
To apply, please send your application (CV and covering letter) to contact@cexa-project.org. If you have any questions about the position, please use the same address. Applications will be assessed from mid-November until the position is filled.
Context
Europe is preparing for the arrival of the first exascale supercomputers, including one in France, at the CEA, from 2025. These machines will be heterogeneous, accelerated by GPUs from various vendors and of various architectures. Ensuring performance and portability under these conditions is undoubtedly one of the greatest challenges of exascale. To address this, the CEA is investing heavily in an ambitious “Moonshot” project: CExA. In this project, we will provide the libraries needed to fully exploit this computing power in CEA’s scientific applications by contributing to, extending and adapting the Kokkos open-source library. The resulting software stack will be deployed on the supercomputers using the Spack tool, which has been specially designed for supercomputing environments. Within CExA, we represent teams with expertise in numerical computation from the CEA’s four divisions:
- Maison de la Simulation (https://www.mdls.fr), part of the Fundamental Research Division, is a joint research and engineering laboratory of the CEA, CNRS, Paris-Saclay University and Versailles Saint Quentin University, specializing in high-performance computing and numerical simulation.
- The software engineering department for simulation of the Energy Research Division groups together three laboratories that address the issues of simulation environments, AI and data science, intensive computing and numerical analysis.
- LIST’s DSCIN, at the Technological Research Department, is responsible for the research and development of digital integrated circuits and processors for AI, as well as the design of complex digital architectures. It also works on solutions for embedded systems and develops design tools for embedded AI, embedded systems and trusted circuits.
- The DSSI, at the Military Application Department, manages and carries out activities in the fields of computing, applied mathematics and information systems, covering a broad spectrum from definition and design to user services.
Mission
As part of a new agile team being set up to carry out the CExA project, you will work in collaboration with the French (in particular NumPEx) and European HPC ecosystems, and with the teams in charge of developing Kokkos and Spack in the United States. The aim is to adapt these tools to the needs of the applications developed by the CEA and to the technologies developed by Europe for exascale (EPI, SiPearl, RISC-V). Your mission will include:
- Supporting agile development in C++ around Kokkos by contributing to the following points:
  - Implementing a testing and performance measurement strategy.
  - Designing, automating and administering continuous integration pipelines.
  - Working with development teams to optimize packaging and deployment processes.
- Assisting with deployment on heterogeneous architectures for European exaflop supercomputers.
- Identifying and participating in the development of functionality missing from the tools used for packaging, deployment and continuous integration.
- Helping to deploy Kokkos in the software environments of selected application demonstrators (hydrodynamics, fusion energy, etc.).
- Providing support and leadership on these themes within the organization and across European and global collaborations.
Skills
You have a Master’s degree and/or an engineering degree in computer science and:
- You are able to work within an agile development process (Scrum) and are familiar with the basic tools associated with collaborative development (Git, GitHub, etc.).
- You have software engineering skills. You are familiar with common development environments and associated tools (CMake, Docker, Spack, GoogleTest, CTest, etc.).
- You have scripting skills (Python, shell, etc.).
- Any knowledge of parallel programming (GPU, multi-threading, etc.) is a plus, particularly with the Kokkos library or equivalent.
- You have knowledge of the C++ ecosystem.
- You are a self-starter and are keen to join an international team. You have a good command of technical English (written and spoken). You are interested in the world of high-performance computing and its challenges, and you keep up to date with the latest technological developments.
Salary and benefits
The CEA offers competitive salaries depending on your qualifications and experience. There are several advantages to this position:
- the possibility of joining existing collaborations with other laboratories in Europe, the United States and Japan,
- numerous opportunities for international travel (exchanges, conferences, workshops and more),
- up to 3 days’ teleworking per week,
- 75% reimbursement on public transport and a free transport network throughout the Ile-de-France region,
- an attractive supplementary pension scheme and several company savings plans,
- 5 weeks’ paid holiday and 4 weeks’ RTT (additional days off granted under the French reduced working time scheme) per year.