SAM

SAM – Site Availability Monitoring. http://wlcg-sam-cms.cern.ch/

The site readiness monitoring system consists of three major subsystems:

  • monitoring of site availability (Site Availability Monitoring – SAM),
    which includes tests that check the basic services of the site,
  • monitoring of the test data processing flows,
  • monitoring of data transfers in order to measure
    the data transfer capability between sites.

The readiness of a site is evaluated on the basis of indicators that depend on the results of these tests.
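
As a rough illustration of how such indicators could be combined, the sketch below marks a site as ready only when every indicator meets a threshold. The class name, the field names and all threshold values are hypothetical; the text does not state the actual rule or cut values.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MIN_AVAILABILITY = 0.80   # fraction of passed availability tests
MIN_JOB_SUCCESS = 0.80    # fraction of successful test jobs
MIN_GOOD_LINKS = 1        # number of transfer links that pass their tests


@dataclass
class DailyIndicators:
    """One day of indicators for a single site (hypothetical layout)."""
    sam_availability: float
    job_success_rate: float
    good_links: int


def is_ready(ind: DailyIndicators) -> bool:
    """A site counts as ready only if every indicator meets its threshold."""
    return (
        ind.sam_availability >= MIN_AVAILABILITY
        and ind.job_success_rate >= MIN_JOB_SUCCESS
        and ind.good_links >= MIN_GOOD_LINKS
    )
```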

Within WLCG, all grid services are tested periodically with the SAM tests, which launch high-priority monitoring tasks on the sites every hour.

These tests make it possible to measure the availability and reliability of the sites.
The results are analyzed and visualized in the CMS Dashboard monitoring system
[Andreeva, Cinquilli, Dieguez, 2012; CMS Dashboard].

The SAM tests can detect problems at a site and are used to rate and
rank sites by their availability.
A failure of any of these tests means that the service instance used to
run the test is considered unavailable.
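
A minimal sketch of how availability and reliability could be derived from a day of hourly test results is given below. It assumes one result per hour, labelled "ok", "fail" or "downtime" (scheduled downtime), and assumes that availability is the fraction of all hours with a passed test while reliability leaves scheduled downtime out of the denominator. The labels and the function name are illustrative, not taken from the SAM software.

```python
from collections import Counter


def availability_and_reliability(hourly_results):
    """Return (availability, reliability) for a list of hourly results.

    Each entry is "ok", "fail" or "downtime" (assumed labels).
    """
    counts = Counter(hourly_results)
    total = len(hourly_results)
    ok = counts["ok"]
    scheduled = counts["downtime"]

    # Availability: passed hours over all hours.
    availability = ok / total if total else 0.0

    # Reliability: passed hours over hours outside scheduled downtime.
    counted = total - scheduled
    reliability = ok / counted if counted else 0.0
    return availability, reliability


# 18 passed tests, 2 failures and 4 hours of scheduled downtime.
day = ["ok"] * 18 + ["fail"] * 2 + ["downtime"] * 4
a, r = availability_and_reliability(day)
print(f"availability={a:.2f} reliability={r:.2f}")
# prints: availability=0.75 reliability=0.90
```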

The ability to process data is tested by running monitoring tasks on the sites that resemble real data processing tasks.
A dedicated automated system triggers and controls the execution of simulated data analysis tasks, which keep the site processors under a constant load.
The system is used to test whether the site can run particular CMS tasks
in the required quantity.

Regular submission of jobs to all CMS sites makes it possible
to measure the daily success rate and to derive a site efficiency rating.
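
The daily success rate and a ranking based on it can be sketched as follows. The site names are placeholders, and representing each job outcome as a boolean is an assumption made for the example; the real system records more detail per job.

```python
def daily_success_rate(outcomes):
    """Fraction of successful jobs, or None if no jobs ran that day."""
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


def rank_sites(jobs_by_site):
    """Sort sites by success rate, best first; sites without jobs go last."""
    rates = {site: daily_success_rate(o) for site, o in jobs_by_site.items()}
    return sorted(
        rates.items(),
        key=lambda item: (item[1] is None, -(item[1] or 0.0)),
    )


# Placeholder site names and outcomes (True = job succeeded).
jobs = {
    "SITE_B": [True] * 6 + [False] * 4,
    "SITE_C": [],
    "SITE_A": [True] * 9 + [False],
}
for site, rate in rank_sites(jobs):
    print(site, "no jobs" if rate is None else f"{rate:.0%}")
```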

To use the sites, the data transfer links must be working. The certification procedure requires a site to demonstrate that it can sustain a given average throughput over 24 hours:
20 MB/s for Tier-0 → Tier-1, Tier-1 ↔ Tier-1 and Tier-1 → Tier-2 links, and
5 MB/s for Tier-2 → Tier-1 and Tier-2 ↔ Tier-2 links.
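
To make these rates concrete: an average of 20 MB/s over 24 hours corresponds to 20 × 86 400 = 1 728 000 MB, about 1.73 TB per day, and 5 MB/s corresponds to 432 000 MB, about 0.43 TB per day. The sketch below checks a measured daily volume against these requirements. It assumes decimal megabytes and reads each link as source tier → destination tier; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB = 10**6  # assumption: decimal megabytes

# Required average rate in MB/s, keyed by (source tier, destination tier).
REQUIRED_MB_PER_S = {
    (0, 1): 20,
    (1, 1): 20,
    (1, 2): 20,
    (2, 1): 5,
    (2, 2): 5,
}


def required_volume_bytes(src_tier, dst_tier):
    """Bytes that must be moved in 24 hours to reach the required average."""
    return REQUIRED_MB_PER_S[(src_tier, dst_tier)] * MB * SECONDS_PER_DAY


def meets_requirement(src_tier, dst_tier, bytes_in_24h):
    """True if the volume moved in 24 hours reaches the required average."""
    return bytes_in_24h >= required_volume_bytes(src_tier, dst_tier)


print(required_volume_bytes(0, 1))             # 1728000000000
print(meets_requirement(2, 1, 500 * 10**9))    # True: 500 GB exceeds 432 GB
print(meets_requirement(1, 2, 500 * 10**9))    # False: 1.728 TB is needed
```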

Data transfers are debugged by artificially initiating transfers between sites; these transfers are used to evaluate the quality of the links between the sites.
The quality of data transfer is checked continuously at low load (a few kB/s) on all links.
This makes it possible to quickly find data transfer problems not only at the network level, but also at the level of the data transfer services and the storage infrastructure.