Compiling and running parallel jobs in the Grid environment

In EGEE it is possible to run parallel programs that use MPI-1 (LAM and MPICH) or MPI-2 (OpenMPI and MPICH2).
A parallel job is submitted to the Grid environment using the mpi-start scripts developed by the MPI working group within the framework of the Interactive European Grid project.

These scripts are available at:

  https://twiki.cern.ch/twiki/bin/view/EGEE/MpiTools

Any Grid job requires a job description file with the extension .jdl.
When the mpi-start system is used, the user also needs to provide a wrapper script and a script with pre- and post-run hooks.
Thus, to run a parallel job in the Grid, the following is required:

1. A valid user certificate (usercert.pem) signed by one of the certification authorities recognized by the Grid sites, and the private key (userkey.pem) corresponding
to this certificate (see the proxy sketch after this list);

2. A JDL file describing the job; if the computing element is not specified explicitly
at submission time, the software requirements must be stated in this file;

3. If the job is submitted to an explicitly named computing element, the Grid
information system should first be queried to make sure that this element
provides the software the job needs
(MPICH, OpenMPI, MPI-START);

4. mpi-start-wrapper.sh – a script that defines the environment variables needed by mpi-start;

5. mpi-hooks.sh – a script that prepares the job for execution (compiles it) and prints a message when it finishes.
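The certificate and private key themselves stay on the user interface machine; what the Grid services check is a short-lived proxy created from them. A minimal sketch of creating and inspecting such a proxy, assuming a VOMS-enabled virtual organization (the VO name is a placeholder):

# Create a VOMS proxy for your virtual organization:
voms-proxy-init --voms <your_vo>

# Check that the proxy is valid and how long it will remain so:
voms-proxy-info --all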

Let us start with the second item.
The software installed at a site is usually published in the information services of the Grid sites
themselves.

This can be MPICH or OPENMPI. These values are specified in the JDL file
in the Arguments attribute and, if the computing element is not given explicitly
at submission time, in the Requirements attribute as well.

Here is a possible version of this file:

# myprog.jdl
#
Type = "Job";

# JobType =
# MpiCh | Normal | Partitionable | Parametric | Checkpointable
JobType = "Normal";

# Number of processors:
CPUNumber = 4;

# Script file:
Executable = "mpi-start-wrapper.sh";

# Arguments - the executable file and the MPI environment (software):
Arguments = "myprog MPICH";

# File for standard output:
StdOutput = "myprog.out";

# Error file:
StdError = "myprog.err";

# The files sent to the node for execution:
InputSandbox = {"mpi-start-wrapper.sh", "mpi-hooks.sh", "myprog.c"};

# Resulting files:
OutputSandbox = {"myprog.err", "myprog.out"};
Requirements = Member("MPISTART", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
               Member("OPENMPI1.3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
# the end

If the MPI environment used is OpenMPI, then in the value of the Arguments attribute
MPICH must be replaced with OPENMPI; the Requirements attribute should likewise
request the tag that the site publishes for the chosen flavour
(the OPENMPI1.3.2 tag in the example above).

The command:

glite-wms-job-list-match -a myprog.jdl

displays the list of computing elements available for this job, according
to the information in the job description file and the user rights that
follow from the user's certificate.

For example:
Connecting to the service
https://lcg16.sinp.msu.ru:7443/glite_wms_wmproxy_server
===========================================================

COMPUTING ELEMENT IDs LIST.

The following CE(s) matching your job requirements have been found:
*CEId*
- cms-eth0-1.kipt.kharkov.ua:2119/jobmanager-lcgpbs-rgstest
- grid129.sinp.msu.ru:2119/jobmanager-lcgpbs-rgstest
- lcgce01.jinr.ru:2119/jobmanager-lcgpbs-rgstest
- lcgce02.jinr.ru:2119/jobmanager-lcgpbs-rgstest
- lcgce21.jinr.ru:8443/cream-pbs-rgstest
- grid-ce.icp.ac.ru:2119/jobmanager-lcgpbs-rgstest
- grid-ce.icp.ac.ru:2119/jobmanager-lcgpbs-test1
============================================================
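Alternatively, the information system can be queried directly for computing elements that publish the required software tags. A sketch of such a query (the VO name is a placeholder, and the exact tag strings depend on what each site publishes):

lcg-info --vo <your_vo> --list-ce --query 'Tag=MPISTART' --attrs 'CE'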

The file that defines the environment variables looks like this:

#!/bin/bash
# mpi-start-wrapper.sh
# Assign variables to the name of the executable (myprog) and the MPI flavour (MPICH)
# taken from the Arguments attribute of the JDL file:

MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2

# Translate MPI_FLAVOR to lowercase for passing to mpi-start:
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Determine the correct path to the software from this parameter:
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`

# Assign the prefix variable for the chosen MPI flavour (used by mpi-start):
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

# Create a placeholder for the executable file (it is built later by the pre-run hook):
touch $MY_EXECUTABLE

# Set the variables for mpi-start:
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh

# To include debug information in the execution log,
# uncomment the 3 lines below:
#export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1
#export I2G_MPI_START_TRACE=1
echo "Start: $I2G_MPI_START"

# Run mpi-start:
$I2G_MPI_START
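For the MPICH case (Arguments = "myprog MPICH") the indirection through eval above amounts to roughly the following, assuming the site publishes the MPI_MPICH_PATH variable:

MPI_PATH=$MPI_MPICH_PATH        # path to the MPICH installation published by the site
I2G_MPICH_PREFIX=$MPI_PATH      # prefix variable read by mpi-start
export I2G_MPICH_PREFIX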

The last helper file consists of two parts: the first part compiles the parallel
program according to the OPENMPI or MPICH argument,
and the second prints messages when the work is finished:

#!/bin/sh
# mpi-hooks.sh
#
# The following function is called before the MPI executable starts:
#
pre_run_hook () {

# Some compilation options (uncomment to print them):
#echo "Compiling ${I2G_MPI_APPLICATION}"
#echo "OPTS=${MPI_MPICC_OPTS}"
#echo "PROG=${I2G_MPI_APPLICATION}.c"
# Compile the program and obtain the executable:
cmd="mpicc -o myprog myprog.c"
echo $cmd
$cmd
if [ ! $? -eq 0 ]; then
echo "Error compiling program. Exiting..."
exit 1
fi
# If the compilation was successful:
echo "Successfully compiled ${I2G_MPI_APPLICATION}"
return 0
}
# The following function is called after the MPI executable finishes:
#
post_run_hook () {
echo "Executing post hook."
echo "Finished the post hook."
return 0
}
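Before submitting to the Grid it can be useful to check that the program compiles and runs locally. A minimal sketch, assuming an MPI implementation (MPICH or OpenMPI) with its compiler wrapper is installed on the local machine:

mpicc -o myprog myprog.c
mpirun -np 4 ./myprog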

After these two auxiliary files and the JDL file have been prepared, the job can be submitted for execution with the command:

glite-wms-job-submit -a myprog.jdl

if the requirements for the environment were specified in the job description file, or with:

glite-wms-job-submit -a -r <CE id> myprog.jdl

if a specific computing element (the computing element of a particular Grid site) is to be used.

Example (the "\" symbol means that the two lines below form a single command):

lxpub04:~ > glite-wms-job-submit -a -r \
grid-ce.icp.ac.ru:2119/jobmanager-lcgpbs-rgstest myprog.jdl

Connecting to the service
https://lcg16.sinp.msu.ru:7443/glite_wms_wmproxy_server
===============glite-wms-job-submit Success =====================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://lcg16.sinp.msu.ru:9000/OJhpNW-Z4xNvd7gIIQCCKA
===========================================================

The job is assigned the identifier
https://lcg16.sinp.msu.ru:9000/OJhpNW-Z4xNvd7gIIQCCKA,
with which the status of the job can be checked using the command:

lxpub04:~ > glite-wms-job-status \
https://lcg16.sinp.msu.ru:9000/OJhpNW-Z4xNvd7gIIQCCKA
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
https://lcg16.sinp.msu.ru:9000/OJhpNW-Z4xNvd7gIIQCCKA
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: grid-ce.icp.ac.ru:2119/jobmanager-lcgpbs-rgstest
Submitted: Thu Jan 21 15:49:06 2010 MSK
*************************************************************

As can be seen from this output, four parameters are displayed:

Current Status: takes the values:
a) Scheduled - the job has been accepted;
b) Running - the job is running;
c) Done - the calculations are completed.

Status Reason: describes what is happening in the current status; it takes the values:
a) unavailable - uncertain, the job may not be accepted;
b) Job successfully submitted to Globus - the job was successfully sent to the site;
c) Job terminated successfully - the calculations finished successfully.

Destination: shows the Grid site to which the job was sent.

Submitted: the time at which the job was submitted.
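Rather than re-running the status command by hand, the status can be polled until the job reaches a final state. A minimal sketch (the job identifier is the one returned at submission; the polling interval is arbitrary):

JOBID=https://lcg16.sinp.msu.ru:9000/OJhpNW-Z4xNvd7gIIQCCKA
while true; do
  STATUS=`glite-wms-job-status $JOBID | grep "Current Status"`
  echo $STATUS
  case "$STATUS" in
    *Done*|*Aborted*|*Cancelled*) break ;;
  esac
  sleep 60
done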

After the job has completed successfully, the results can be retrieved
with the command:

lxpub04:~ > glite-wms-job-output --dir myprogout \
https://lcg16.sinp.msu.ru:9000/OJhpNW-Z4xNvd7gIIQCCKA
Connecting to the service
https://lcg16.sinp.msu.ru:7443/glite_wms_wmproxy_server
============================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://lcg16.sinp.msu.ru:9000/OJhpNW-Z4xNvd7gIIQCCKA
have been successfully retrieved and stored in the directory:
/afs/jinr.ru/user/d/dushanov/myprogout
================================================================
In this case, the obtained files (myprog.out, myprog.err)
are copied to the myprogout directory.
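For repeated use it can be convenient to let the gLite user interface keep track of the job identifier itself. A sketch, assuming the standard --output/--input options of the gLite WMS commands are available:

# Store the identifier of the submitted job in a file:
glite-wms-job-submit -a -o jobid.txt myprog.jdl

# Check the status and retrieve the output using that file:
glite-wms-job-status -i jobid.txt
glite-wms-job-output --dir myprogout -i jobid.txt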

E.B. Dushanov, e-mail: dushanov@jinr.ru; T.F. Sapozhnikova, e-mail: tsap@jinr.ru