The ATLAS Production System Predictive Analytics service: an approach for intelligent task analysis.

The ATLAS Production System Predictive Analytics service: an approach for intelligent task analysis.

The second generation of the Production System (ProdSys2) of the ATLAS experiment (LHC, CERN), in conjunction with the workload management system – PanDA (Production and Distributed Analysis), represents a complex set of computing components that are responsible for defining, organizing, scheduling, starting and executing payloads in a distributed computing infrastructure. ProdSys2/PanDA are responsible for all stages of (re)processing, analysis and modeling of raw and derived data, as well as simulation of physical processes and functioning of the detector using Monte Carlo methods. The prototype of the ProdSys2 Predictive Analytics (P2PA) is an essential part of the growing analytical service for the ProdSys2 and will play the key role in the ATLAS computing. P2PA uses such tools as Time-To-Complete (TTC) estimation towards units of the processing (i.e., tasks, chains and groups of tasks) to control the processing state and rate, and to be able to highlight abnormal operations and executions (e.g., to discover stalled processes). It uses methods and techniques of machine learning to obtain corresponding predictive models and metrics that are aimed to characterize the current system’s state and its changes over the close period of time.