Parallel computing on the LIT farm

1. Running parallel jobs in batch mode

For parallel calculations, the LIT farm includes 20 machines with 28 cores each, additionally interconnected by a 2x InfiniBand network, so users have 560 cores available for parallel jobs. Such jobs run in the queue “ib”, whose current parameters can always be viewed with the command:
qstat -Qf ib

Queue: ib
resources_max.cput = 50000:00:00
resources_max.nodect = 560
resources_max.walltime = 101:00:00
resources_min.nodect = 2
resources_available.nodect = 560
resources_default.cput = 50000:00:00
resources_default.walltime = 101:00:00
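If a script needs one of these values, it can be read from the qstat output rather than hard-coded. A minimal sketch, assuming a POSIX shell with awk and that each attribute is printed as "name = value" as shown above; the function name queue_limit is hypothetical:

```shell
#!/bin/sh
# queue_limit ATTRIBUTE
# Reads "qstat -Qf <queue>" output on stdin and prints the value of ATTRIBUTE.
# (queue_limit is a hypothetical helper, not part of the batch system.)
queue_limit() {
    # Lines look like "    resources_max.walltime = 101:00:00":
    # split on " = ", strip the leading indentation from the name,
    # and print the right-hand side when the name matches.
    awk -F' = ' -v key="$1" '
        { sub(/^[ \t]+/, "", $1) }
        $1 == key { print $2 }
    '
}
```

Usage: `qstat -Qf ib | queue_limit resources_max.walltime` should print 101:00:00 for the output shown above.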

cput – maximum amount of CPU time used by all processes of a job; units: time (HH:MM:SS);

resources_max.nodect – maximum number of CPUs (cores) a parallel job may use in the ib queue;
resources_max.walltime – maximum wall-clock (astronomical) run time of a job;

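CPU time is approximately the wall-clock (astronomical) time multiplied by the number of processes:
cput ≈ walltime × nodect
This is an upper bound, reached only when every process keeps its core busy for the whole run.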
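The relation between walltime, nodect and cput can be checked with a little shell arithmetic. A minimal sketch, assuming a POSIX shell with awk; the function names hms_to_seconds and cput_upper_bound_hours are hypothetical:

```shell
#!/bin/sh
# Convert HH:MM:SS (hours may exceed 24, e.g. 101:00:00) to seconds.
# awk is used so that fields like "09" are read as decimal, not octal.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# cput_upper_bound_hours WALLTIME NPROCS
# Upper bound on CPU time: walltime x number of processes,
# printed in whole hours (rounded down).
cput_upper_bound_hours() {
    walltime_s=$(hms_to_seconds "$1")
    echo $(( walltime_s * $2 / 3600 ))
}
```

For the request used in Examples 1 and 2 (walltime=10:00:00 on 8 processes) this gives 80 CPU-hours. For the queue maximum (101:00:00 on 560 cores) it gives 56560 CPU-hours, which is more than the 50000-hour cput limit, so in principle a job that kept all 560 cores busy could reach the cput limit before the walltime limit.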

Example 1. Submitting a job with a script file
qsub pbs_script
where pbs_script contains:
#!/bin/sh
#PBS -q ib
#PBS -l walltime=10:00:00,nodes=8:para
#PBS -m abe
#PBS -M username@lxpub01
#PBS -r n
# Launch the MPI program from the directory the job was submitted from
mpiexec "$PBS_O_WORKDIR/program_name"

Example 2. Running jobs from the command line with the required parameters:

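echo 'mpiexec $PBS_O_WORKDIR/program_name' | qsub -q ib -l walltime=10:00:00,nodes=8:ib -m abe -M username@lxpub01 -r n

(When no script file is given, qsub reads the job commands from standard input; the single quotes keep $PBS_O_WORKDIR from being expanded by the submitting shell, so it is expanded on the execution node instead.)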

where the options are:
-q          name of the queue (for parallel computations, “ib”)
-l          comma-separated list of resource requests:
              walltime – maximum execution (wall-clock) time, HH:MM:SS
              nodes – number of processors (after “:” comes the name of the queue)
-m          events about which e-mail should be sent:
              b – the job begins,
              e – the job ends,
              a – the job is aborted, e.g. because of an error
-M          e-mail address to which all service messages about the job status are sent
-r          (y / n) whether the job may be rerun when its nodes are rebooted
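Before submitting, a request can be checked against the ib queue limits listed by qstat -Qf ib. A minimal sketch, assuming a POSIX shell with awk; the limits are copied by hand from the qstat output above (they may change, so re-check them), and the function name check_ib_request is hypothetical:

```shell
#!/bin/sh
# Limits of the "ib" queue, copied from "qstat -Qf ib" (may change over time).
IB_MAX_WALLTIME_S=$(( 101 * 3600 ))   # resources_max.walltime = 101:00:00
IB_MIN_NODES=2                        # resources_min.nodect = 2
IB_MAX_NODES=560                      # resources_max.nodect = 560

# check_ib_request WALLTIME NODES
# Prints "ok" and returns 0 if the request fits the queue limits;
# otherwise prints a message on stderr and returns 1.
check_ib_request() {
    # HH:MM:SS -> seconds (awk reads "09" as decimal, not octal)
    secs=$(echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
    if [ "$secs" -gt "$IB_MAX_WALLTIME_S" ]; then
        echo "walltime $1 exceeds 101:00:00" >&2
        return 1
    fi
    if [ "$2" -lt "$IB_MIN_NODES" ] || [ "$2" -gt "$IB_MAX_NODES" ]; then
        echo "nodes=$2 is outside $IB_MIN_NODES..$IB_MAX_NODES" >&2
        return 1
    fi
    echo ok
}
```

Usage: `check_ib_request 10:00:00 8 && qsub pbs_script` submits the job only if the requested walltime and number of processors fit the queue limits.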

Example 3. Running parallel jobs located in AFS

Example 4. Running parallel jobs with files outside AFS

Example 5. Test job: what to do when something a user has submitted to the batch system does not work