S. N. Bose National Centre for Basic Sciences

High Performance Computing

1. Overview

The HPC facility is a computation cluster for parallel computing; users can submit parallel jobs to it. The cluster has 31 nodes with 8 cores and 16 GB of RAM each, plus an additional 8 nodes with 12 cores and 24 GB of RAM each.

2. Configurations

2.1 Hardware:
  • 8-core nodes: CPU = 2 x Intel® Xeon® E5410 @ 2.33 GHz (4-core CPUs); RAM = 16 GB
  • 12-core nodes: CPU = 2 x Intel® Xeon® E5650 @ 2.67 GHz (6-core CPUs); RAM = 24 GB
  • SAN storage = 2.2 TB
2.2 Software:

Intel® C and Fortran compilers (v11.0) and OpenMPI are available on this cluster.
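
As an illustration only, an MPI source file is typically compiled with the OpenMPI wrapper compilers before it is submitted. The wrapper names below (mpicc, mpif90) and the file names are assumptions for illustration, not values documented for this cluster, and whether the wrappers invoke the Intel compilers depends on how OpenMPI was built here.

$ mpicc -O2 -o hello_mpi hello_mpi.c
$ mpif90 -O2 -o hello_mpi hello_mpi.f90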

3. How to get an account?

Accounts are issued only to faculty members, but students may use these accounts; students should contact their respective faculty supervisor for access.

4. Usage Guidelines

4.1 Queues

When a job is submitted, it is placed in a queue. Different queues are available for different purposes; they differ in the number of nodes and processors they provide and in their walltime limits. Users must select the queue from the list below that is appropriate for their computational needs. All the queues mentioned here are parallel queues, which accept parallel jobs. The queue name (the label at the start of each item below) is the one used in the job submission command.

There are 7 different parallel queues available in the HPC cluster. They are as follows:

  • parallel64: for parallel jobs of 64 processors; walltime 168 hrs.
  • parallel32: for parallel jobs of 32 processors; walltime 96 hrs.
  • parallel24: for parallel jobs of 24 processors; walltime 72 hrs.
  • parallel16-A: for parallel jobs of 16 processors; walltime 48 hrs.
  • parallel16-B: for parallel jobs of 16 processors; walltime 24 hrs.
  • parallel12-A: for parallel jobs of 12 processors; walltime 48 hrs.
  • parallel12-B: for parallel jobs of 12 processors; walltime 24 hrs.
To submit a job:
$ msub -N [name of the job] -l nodes=[no. of nodes]:ppn=8 -o [outputfile] -e [error log] -d [execution directory] -q <queue name> submit_script.sh

Note: On successful submission of a job, the scheduler returns a job id. Always keep a note of this job id; you will need it for monitoring, cancelling, and troubleshooting.
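
For example, a 32-processor job could be submitted to the parallel32 queue as shown below; the job name, output and error file names, and execution directory are placeholders chosen for illustration, not values prescribed by the centre. On submission, msub prints the job id of the new job.

$ msub -N my-job -l nodes=4:ppn=8 -o my-job.out -e my-job.err -d /home/<hpc username>/my-job -q parallel32 submit_script.sh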

Sample submit script (submit_script.sh):
#!/bin/bash
# cd /home/
# Node-local scratch directory used as the working area for the run
SCRATCH_JOB=/scratch/$USER/working-path
# Submission directory holding the input files (the complete output of pwd)
pwd=/home/<hpc username>/<your-working-path (meaning the complete output of pwd)>
echo "Job started at " `date`
mast=`hostname`
echo $mast
# Prepare a clean scratch directory on every node allocated to the job
# and copy the input files into it
for i in `cat $PBS_NODEFILE|uniq`
do
      ssh $i rm -rf $SCRATCH_JOB
      ssh $i mkdir -p $SCRATCH_JOB
      ssh $i cp -r $pwd/* $SCRATCH_JOB
      ssh $i chmod 700 $SCRATCH_JOB
done
cd $SCRATCH_JOB
rm -f temp.1 temp.2
# Number of processors assigned to the job by the scheduler
NP=`cat $PBS_NODEFILE|wc -l`
# Optional debug lines for a dummy test run:
# date > job_runfile
# echo "dummy job executed " >> job_runfile
# pwd >> job_runfile
# ls -l >> job_runfile
# echo " i am done leaving job " >> job_runfile
# Run the parallel executable across all allocated processors
mpiexec -machinefile $PBS_NODEFILE -np $NP /opt/vasp/vasp-openmpi-4.6
echo "$NP"
# Copy the results back to the submission directory and clean up scratch on every node
cp -Rf * $pwd
for i in `cat $PBS_NODEFILE|uniq`
do
      ssh $i rm -vrf $SCRATCH_JOB
done
echo "Job ended at " `date`
4.2 Useful Commands
  • Queue status: To see the status of the queues, run $ mshow (it lists all running jobs).
  • For checking the status of a job: $ checkjob [job id]
  • For cancelling a job: $ canceljob [job id]
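
For example, a typical monitoring sequence might look like the following, where 123456 stands in for whatever job id msub returned and is not a real id:

$ mshow                # overview of all running jobs
$ checkjob 123456      # detailed status of your job
$ canceljob 123456     # cancel the job if it is no longer needed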