Optimizing new components of PanDA for ATLAS production on HPC resources

The Production and Distributed Analysis system (PanDA) has been used
for workload management in the ATLAS Experiment for over a decade.
It uses pilots to retrieve jobs from the PanDA server and execute them
on worker nodes.
While PanDA has been mostly used on Worldwide LHC Computing Grid (WLCG)
resources for production operations, R&D work has been ongoing on cloud and HPC resources for many years.

These efforts have led to significant usage of large-scale HPC resources over the past couple of years.

The LIT staff made changes to the pilot that enabled PanDA to use HPC sites, specifically the Titan supercomputer at Oak Ridge National Laboratory.

Furthermore, it was decided in 2016 to redesign the Pilot from scratch with a more modern approach, to better serve the present and future needs of ATLAS and of other collaborations interested in using the PanDA system.

Another new project, the development of a resource-oriented service called
PanDA Harvester, was also launched in 2016.

The main goal of the Harvester is the flexible distribution of payloads to opportunistic resources such as HPC systems and clouds.
Both applications are now in full development after a year of studying
use cases, trying different designs, and settling on a shared-components model.

 

Danila OLEYNIK