Software installed on Palmetto

Overview

Modules

A large number of popular software packages are installed on Palmetto and can be used without any setup or configuration. These include:

  • Compilers (such as gcc, Intel, and PGI)
  • Libraries (such as OpenMPI, HDF5, Boost)
  • Programming languages (such as Python, MATLAB, R)
  • Scientific applications (such as LAMMPS, Paraview, ANSYS)
  • Others (e.g., Git, PostgreSQL, Singularity)

These packages are available as modules on Palmetto. The following commands can be used to inspect, activate and deactivate modules:

Command                         Purpose
module avail                    List all packages available (on the current system)
module add package/version      Add a package to your current shell environment
module list                     List packages you have loaded
module rm package/version       Remove a currently loaded package
module purge                    Remove all currently loaded packages

See the Quick Start Guide for more details about modules.
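
For example, a typical session might look like the following (matlab/2018b is just an illustration; run module avail to see what is currently installed):

$ module avail matlab          # list available versions of a package
$ module add matlab/2018b      # load a specific version
$ module list                  # confirm it is loaded
$ module rm matlab/2018b       # unload it when done
$ module purge                 # or unload everything at once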

Licensed software

Many site-licensed software packages are available on the Palmetto cluster (e.g., MATLAB, ANSYS, COMSOL). There are limitations on the number of jobs that can run using these packages. See this section of the User's Guide on how to check license usage.

Individual-owned or group-owned licensed software can also be run on Palmetto.

Software with graphical applications

See this section of the User's Guide on how to use software with a graphical user interface (GUI).

Installing your own software

See this section of the User's Guide on how to install your own software.

ABAQUS

ABAQUS is a finite element analysis application used for engineering simulations. Currently, ABAQUS versions 6.10, 6.13, and 6.14 are available on the Palmetto cluster as modules.

$ module avail abaqus

abaqus/6.10 abaqus/6.13 abaqus/6.14

To see license usage of ABAQUS-related packages, you can use the lmstat command:

/software/USR_LOCAL/flexlm/lmstat -a -c /software/USR_LOCAL/flexlm/licenses/abaqus.dat

Running ABAQUS interactive viewer

To run the interactive viewer, you must log-in with tunneling enabled, and then ask for an interactive session:

$ qsub -I -X -l select=1:ncpus=8:mpiprocs=8:mem=6gb:interconnect=1g,walltime=00:15:00

Once logged in to an interactive compute node, load the abaqus module and run the abaqus executable with the viewer command and the -mesa option to launch the interactive viewer:

$ module add abaqus/6.14
$ abaqus viewer -mesa

Similarly, to launch the ABAQUS CAE graphical interface:

$ abaqus cae -mesa

Running ABAQUS in batch mode

To run ABAQUS in batch mode on Palmetto cluster, you can use the job script in the following example as a template. This example shows how to run ABAQUS in parallel using MPI. This demonstration runs the "Axisymmetric analysis of bolted pipe flange connections" example provided in the ABAQUS documentation here. Please see the documentation for the physics and simulation details. You can obtain the files required to run this example using the following commands:

$ cd /scratch2/username
$ module add examples
$ example get ABAQUS
$ cd ABAQUS && ls

abaqus_v6.env  boltpipeflange_axi_element.inp  boltpipeflange_axi_node.inp  boltpipeflange_axi_solidgask.inp  job.sh

The .inp files describe the model and simulation to be performed - see the documentation for details. The batch script job.sh submits the job to the cluster. The .env file is a configuration file that must be included in all ABAQUS job submission folders on Palmetto.

#!/bin/bash
#PBS -N AbaqusDemo
#PBS -l select=2:ncpus=8:mpiprocs=8:mem=6gb:interconnect=1g,walltime=00:15:00
#PBS -j oe

module purge
module add abaqus/6.14

pbsdsh sleep 20

NCORES=`wc -l $PBS_NODEFILE | gawk '{print $1}'`
cd $PBS_O_WORKDIR

SCRATCH=$TMPDIR

# copy all input files into the scratch directory
for node in `uniq $PBS_NODEFILE`
do
    ssh $node "cp $PBS_O_WORKDIR/*.inp $SCRATCH"
done

cd $SCRATCH

# run the abaqus program, providing the .inp file as input
abaqus job=abdemo double input=$SCRATCH/boltpipeflange_axi_solidgask.inp scratch=$SCRATCH cpus=$NCORES mp_mode=mpi interactive 

# copy results back from scratch directory to $PBS_O_WORKDIR
for node in `uniq $PBS_NODEFILE`
do
    ssh $node "cp -r $SCRATCH/* $PBS_O_WORKDIR"
done

In the batch script job.sh:

  1. The following line extracts the total number of CPU cores available across all the nodes requested by the job:

NCORES=`wc -l $PBS_NODEFILE | gawk '{print $1}'`

  2. The following line runs the ABAQUS program, specifying options such as the input .inp file, the scratch directory to use, the number of CPU cores, and the MPI mode:

abaqus job=abdemo double input=$SCRATCH/boltpipeflange_axi_solidgask.inp scratch=$SCRATCH cpus=$NCORES mp_mode=mpi interactive

To submit the job:

$ qsub job.sh
9668628

After job completion, you will see the job submission directory (/scratch2/username/ABAQUS) populated with various files:

$ ls

AbaqusDemo.o9668628  abdemo.dat  abdemo.msg  abdemo.res  abdemo.stt                      boltpipeflange_axi_solidgask.inp
abaqus_v6.env        abdemo.fil  abdemo.odb  abdemo.sim  boltpipeflange_axi_element.inp  job.sh
abdemo.com           abdemo.mdl  abdemo.prt  abdemo.sta  boltpipeflange_axi_node.inp

If everything went well, the job output file (AbaqusDemo.o9668628) should look like this:

[atrikut@login001 ABAQUS]$ cat AbaqusDemo.o9668628
Abaqus JOB abdemo
Abaqus 6.14-1
Abaqus License Manager checked out the following licenses:
Abaqus/Standard checked out 16 tokens from Flexnet server licensevm4.clemson.edu.
<567 out of 602 licenses remain available>.
Begin Analysis Input File Processor
Mon 13 Feb 2017 12:35:29 PM EST
Run pre
Mon 13 Feb 2017 12:35:31 PM EST
End Analysis Input File Processor
Begin Abaqus/Standard Analysis
Mon 13 Feb 2017 12:35:31 PM EST
Run standard
Mon 13 Feb 2017 12:35:35 PM EST
End Abaqus/Standard Analysis
Abaqus JOB abdemo COMPLETED


+------------------------------------------+
| PALMETTO CLUSTER PBS RESOURCES REQUESTED |
+------------------------------------------+

mem=12gb,ncpus=16,walltime=00:15:00


+-------------------------------------+
| PALMETTO CLUSTER PBS RESOURCES USED |
+-------------------------------------+

cpupercent=90,cput=00:00:10,mem=636kb,ncpus=16,vmem=12612kb,walltime=00:00:13

The output database (.odb) file contains the results of the simulation, which can be viewed using the ABAQUS viewer.

ANSYS

Graphical Interfaces

To run the various ANSYS graphical programs, you must log-in with tunneling enabled and then ask for an interactive session:

$ qsub -I -X -l select=1:ncpus=8:mpiprocs=8:mem=6gb:interconnect=fdr,walltime=01:00:00

Once logged in to an interactive compute node, you must first load the ANSYS module:

$ module add ansys/19.5

And then launch the required program:

For ANSYS APDL

$ ansys195 -g

If you are using ANSYS 20.2 instead, the executable is called ansys202.

For CFX

$ cfxlaunch

For ANSYS Workbench

$ runwb2

For Fluent

$ fluent

For ANSYS Electromagnetics (only available for ansys/20.2)

$ ansysedt

Batch Mode

To run ANSYS in batch mode on Palmetto cluster, you can use the job script in the following example as a template. This example shows how to run ANSYS in parallel (using multiple cores/nodes). In this demonstration, we model the strain in a 2-D flat plate. You can obtain the files required to run this example using the following commands:

$ cd /scratch1/username
$ module add examples
$ example get ANSYS
$ cd ANSYS && ls

input.txt job.sh

The input.txt batch file is generated for the model using the ANSYS APDL interface. The batch script job.sh submits the batch job to the cluster:

#!/bin/bash
#PBS -N ANSYSdis
#PBS -l select=2:ncpus=4:mpiprocs=4:mem=11gb:interconnect=1g
#PBS -l walltime=1:00:00
#PBS -j oe

module purge
module add ansys/19.5

cd $PBS_O_WORKDIR

machines=$(uniq -c $PBS_NODEFILE | awk '{print $2":"$1}' | tr '\n' :)

for node in `uniq $PBS_NODEFILE`
do
    ssh $node "sleep 5"
    ssh $node "cp input.txt $TMPDIR"
done

cd $TMPDIR
ansys195 -dir $TMPDIR -j EXAMPLE -s read -l en-us -b -i input.txt -o output.txt -dis -machines $machines -usessh

for node in `uniq $PBS_NODEFILE`
do
    ssh $node "cp -r $TMPDIR/* $PBS_O_WORKDIR"
done

In the batch script job.sh:

  1. The following line extracts the nodes (machines) available for this job as well as the number of CPU cores allocated for each node:

machines=$(uniq -c $PBS_NODEFILE | awk '{print $2":"$1}' | tr '\n' :)

  2. For ANSYS jobs, you should always use $TMPDIR (/local_scratch) as the working directory. The following lines ensure that $TMPDIR is created on each node and that the input file is copied there:

for node in `uniq $PBS_NODEFILE`
do
    ssh $node "sleep 5"
    ssh $node "cp input.txt $TMPDIR"
done

  3. The following line runs the ANSYS program, specifying options such as the path to the input.txt file, the scratch directory to use, etc.:

ansys195 -dir $TMPDIR -j EXAMPLE -s read -l en-us -b -i input.txt -o output.txt -dis -machines $machines -usessh

  4. Finally, the following lines copy all the data from $TMPDIR back to the job submission directory:

for node in `uniq $PBS_NODEFILE`
do
    ssh $node "cp -r $TMPDIR/* $PBS_O_WORKDIR"
done

To submit the job:

$ qsub job.sh
9752784

After job completion, you will see the job submission directory (/scratch1/username/ANSYS) populated with various files:

$ ls

ANSYSdis.o9752784  EXAMPLE0.stat  EXAMPLE2.err   EXAMPLE3.esav  EXAMPLE4.full  EXAMPLE5.out   EXAMPLE6.rst   EXAMPLE.DSP    input.txt
EXAMPLE0.err       EXAMPLE1.err   EXAMPLE2.esav  EXAMPLE3.full  EXAMPLE4.out   EXAMPLE5.rst   EXAMPLE7.err   EXAMPLE.esav   job.sh
EXAMPLE0.esav      EXAMPLE1.esav  EXAMPLE2.full  EXAMPLE3.out   EXAMPLE4.rst   EXAMPLE6.err   EXAMPLE7.esav  EXAMPLE.mntr   mpd.hosts
EXAMPLE0.full      EXAMPLE1.full  EXAMPLE2.out   EXAMPLE3.rst   EXAMPLE5.err   EXAMPLE6.esav  EXAMPLE7.full  EXAMPLE.rst    mpd.hosts.bak
EXAMPLE0.log       EXAMPLE1.out   EXAMPLE2.rst   EXAMPLE4.err   EXAMPLE5.esav  EXAMPLE6.full  EXAMPLE7.out   host.list      output.txt
EXAMPLE0.rst       EXAMPLE1.rst   EXAMPLE3.err   EXAMPLE4.esav  EXAMPLE5.full  EXAMPLE6.out   EXAMPLE7.rst   host.list.bak

If everything went well, the job output file (ANSYSdis.o9752784) should look like this:

+------------------------------------------+
| PALMETTO CLUSTER PBS RESOURCES REQUESTED |
+------------------------------------------+

mem=22gb,ncpus=8,walltime=01:00:00


+-------------------------------------+
| PALMETTO CLUSTER PBS RESOURCES USED |
+-------------------------------------+

cpupercent=27,cput=00:00:17,mem=3964kb,ncpus=8,vmem=327820kb,walltime=00:01:07

The results file (EXAMPLE.rst) contains the results of the simulation, which can be viewed using the ANSYS APDL graphical interface.

COMSOL

COMSOL is an application for solving Multiphysics problems. To see the available COMSOL modules on Palmetto:

$ module avail comsol

comsol/4.3b comsol/4.4  comsol/5.0  comsol/5.1  comsol/5.2  comsol/5.3

To see license usage of COMSOL-related packages, you can use the lmstat command:

/software/USR_LOCAL/flexlm/lmstat -a -c /software/USR_LOCAL/flexlm/licenses/comsol.dat 

Graphical Interface

To run the COMSOL graphical interface, you must log-in with tunneling enabled, and then ask for an interactive session:

$ qsub -I -X -l select=1:ncpus=8:mpiprocs=8:mem=6gb:interconnect=1g,walltime=00:15:00

Once logged in to an interactive compute node, load the COMSOL module and use the comsol command to launch the graphical interface:

$ module add comsol/5.2
$ comsol -np 8 -tmpdir $TMPDIR

The -np option can be used to specify the number of CPU cores to use. Remember to always use $TMPDIR as the working directory for COMSOL jobs.

Batch Mode

To run COMSOL in batch mode on Palmetto cluster, you can use the example batch scripts below as a template. The first example demonstrates running COMSOL using multiple cores on a single node, while the second demonstrates running COMSOL across multiple nodes using MPI. You can obtain the files required to run this example using the following commands:

$ module add examples
$ example get COMSOL
$ cd COMSOL && ls

job.sh  job_mpi.sh

Both of these examples run the "Heat Transfer by Free Convection" application described here. In addition to the job.sh and job_mpi.sh scripts, to run the examples and reproduce the results, you will need to download the file free_convection.mph (choose the correct version) provided with the description (login required).

COMSOL batch job on a single node, using multiple cores:

#!/bin/bash
#PBS -N COMSOL
#PBS -l select=1:ncpus=8:mem=32gb,walltime=01:30:00
#PBS -j oe

module purge
module add comsol/5.2

cd $PBS_O_WORKDIR

comsol batch -np 8 -tmpdir $TMPDIR -inputfile free_convection.mph -outputfile free_convection_output.mph

COMSOL batch job across several nodes

#!/bin/bash
#PBS -N COMSOL
#PBS -l select=2:ncpus=8:mpiprocs=8:mem=32gb,walltime=01:30:00
#PBS -j oe

module purge
module add comsol/5.2

cd $PBS_O_WORKDIR

uniq $PBS_NODEFILE > comsol_nodefile
comsol batch -clustersimple -f comsol_nodefile -tmpdir $TMPDIR -inputfile free_convection.mph -outputfile free_convection_output.mph
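
To submit these jobs from the directory containing the scripts:

$ qsub job.sh        # single-node run
$ qsub job_mpi.sh    # multi-node (MPI) run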

GROMACS

  • GROMACS is architecture-specific software; it performs best when installed and configured for the specific hardware it will run on.
  • To simplify the process of setting up GROMACS, we recommend that you set up your local Spack installation using the instructions in the User Software Installation documentation.

  • Get a node (choose the node type you wish to run GROMACS on):

$ qsub -I -l select=1:ncpus=20:mem=20gb:ngpus=1:gpu_model=p100,walltime=5:00:00

  • Identify the architecture type:

$ lscpu | grep "Model name"

Select the correct architecture based on the CPU model:

  • E5-2665: sandybridge
  • E5-2680: sandybridge
  • E5-2670v2: ivybridge
  • E5-2680v3: haswell
  • E5-2680v4: broadwell
  • 6148G: skylake
  • 6252G: cascadelake
  • 6238R: cascadelake

In this example, given the previous qsub command, we most likely will get a broadwell node:

$ export TARGET=broadwell

Installing cuda

$ spack spec -Il cuda@10.2.89 target=$TARGET
$ spack install cuda@10.2.89 target=$TARGET
$ spack find -ld cuda@10.2.89

Note the hash value of the cuda installation; you will need it later.
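
For convenience, you can store the hash in a shell variable instead of memorizing it (the hash below is a hypothetical placeholder; use the value printed by spack find -ld):

$ export CUDA_HASH=abc1234   # hypothetical; replace with the hash reported by "spack find -ld cuda@10.2.89"

It can then be referenced as ^cuda/$CUDA_HASH in the spack commands below.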

Installing fftw

$ spack spec -Il fftw@3.3.8~mpi+openmp target=$TARGET
$ spack install fftw@3.3.8~mpi+openmp target=$TARGET

Installing gromacs

$ export MODULEPATH=$MODULEPATH:~/software/ModuleFiles/modules/linux-centos8-broadwell/
$ module load fftw-3.3.8-gcc-8.3.1-openmp cuda-10.2.89-gcc-8.3.1
$ spack spec -Il gromacs@2018.3+cuda~mpi target=$TARGET ^cuda/hash_value_you_memorize_earlier
$ spack install gromacs@2018.3+cuda~mpi target=$TARGET ^cuda/hash_value_you_memorize_earlier

GROMACS will now be available in your local module path.

Running GROMACS interactively

As an example, we'll consider running the GROMACS ADH benchmark.

First, request an interactive job:

$ qsub -I -l select=1:ncpus=20:mem=100gb:ngpus=2:gpu_model=p100,walltime=5:00:00
$ mkdir -p /scratch1/$USER/gromacs_ADH_benchmark
$ cd /scratch1/$USER/gromacs_ADH_benchmark
$ wget ftp://ftp.gromacs.org/pub/benchmarks/ADH_bench_systems.tar.gz
$ tar -xzf ADH_bench_systems.tar.gz
$ export MODULEPATH=$MODULEPATH:~/software/ModuleFiles/modules/linux-centos8-broadwell/
$ export OMP_NUM_THREADS=10
$ module load fftw-3.3.8-gcc-8.3.1-openmp cuda-10.2.89-gcc-8.3.1 gromacs-2018.3-gcc-8.3.1-cuda10_2-openmp
$ gmx mdrun -g adh_cubic.log -pin on -resethway -v -noconfout -nsteps 10000 -s topol.tpr -ntmpi 2 -ntomp 10

After the last command above completes, the .edr and .log files produced by GROMACS should be visible. Typically, the next step is to copy these results to an output directory, as sketched below.
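
A minimal sketch, assuming you want to collect the results in a (hypothetical) directory under your home:

$ mkdir -p ~/gromacs_ADH_results
$ cp *.log *.edr ~/gromacs_ADH_results/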

Running GROMACS in batch mode

The PBS batch script for submitting the above is assumed to be inside /scratch1/$USER/gromacs_ADH_benchmark, and this directory already contains the input files:

#PBS -N adh_cubic
#PBS -l select=1:ngpus=2:ncpus=16:mem=20gb:gpu_model=p100,walltime=5:00:00

cd $PBS_O_WORKDIR

export MODULEPATH=$MODULEPATH:~/software/ModuleFiles/modules/linux-centos8-broadwell/
module load fftw-3.3.8-gcc-8.3.1-openmp cuda-10.2.89-gcc-8.3.1 gromacs-2018.3-gcc-8.3.1-cuda10_2-openmp
gmx mdrun -g adh_cubic.log -pin on -resethway -v -noconfout -nsteps 10000 -s topol.tpr -ntmpi 2 -ntomp 10
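
Assuming the script above is saved as gromacs_job.sh (the name is arbitrary) inside the benchmark directory, submit it with:

$ cd /scratch1/$USER/gromacs_ADH_benchmark
$ qsub gromacs_job.sh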

HOOMD

Run HOOMD

Now that HOOMD-blue v2.3.5 has been installed, create a simple Python file, test_hoomd.py, to run HOOMD:

import hoomd
import hoomd.md
hoomd.context.initialize("");
hoomd.init.create_lattice(unitcell=hoomd.lattice.sc(a=2.0), n=5);
nl = hoomd.md.nlist.cell();
lj = hoomd.md.pair.lj(r_cut=2.5, nlist=nl);
lj.pair_coeff.set('A', 'A', epsilon=1.0, sigma=1.0);
hoomd.md.integrate.mode_standard(dt=0.005);
all = hoomd.group.all();
hoomd.md.integrate.langevin(group=all, kT=0.2, seed=42);
hoomd.analyze.log(filename="log-output.log",
                  quantities=['potential_energy', 'temperature'],
                  period=100,
                  overwrite=True);
hoomd.dump.gsd("trajectory.gsd", period=2e3, group=all, overwrite=True);
hoomd.run(1e4);

If you have logged out of the node, request an interactive session on a GPU node and add required modules:

$ qsub -I -l select=1:ncpus=16:mem=64gb:ngpus=2:gpu_model=p100:mpiprocs=16:interconnect=fdr,walltime=2:00:00
$ module add anaconda3/5.1.0 gcc/5.4.0 cuda-toolkit/9.0.176

Run the script interactively:

$ python test_hoomd.py

Alternatively, you can set up a PBS job script to run HOOMD in batch mode. A sample script, Test_Hoomd.sh, is shown below:

#PBS -N HOOMD
#PBS -l select=1:ncpus=16:mem=64gb:ngpus=2:gpu_model=p100:mpiprocs=16:interconnect=fdr,walltime=02:00:00
#PBS -j oe

module purge
module add anaconda3/5.1.0 gcc/5.4.0 cuda-toolkit/9.0.176

cd $PBS_O_WORKDIR
python test_hoomd.py

Submit the job:

$ qsub Test_Hoomd.sh

Java

The Java Runtime Environment (JRE) version 1.6.0_11 is currently available cluster-wide on Palmetto. If you need a different version of Java, or if you need the Java Development Kit (JDK, which includes the JRE), you are encouraged to download and install Java (JRE or JDK) yourself. Below is a brief overview of installing the JDK in a user's /home directory.

JRE vs. JDK

The JRE is basically the Java Virtual Machine (Java VM) that provides a platform for running your Java programs. The JDK is the fully featured Software Development Kit for Java, including the JRE, compilers, and tools like JavaDoc and Java Debugger used to create and compile programs.

Usually, when you only care about running Java programs, the JRE is all you'll need. If you are planning to do some Java programming, you will need the JDK.

Downloading the JDK

The JDK cannot be downloaded directly using the wget utility because a user must agree to Oracle's Java license agreement when downloading. So, download the JDK using a web browser and transfer the downloaded jdk-7uXX-linux-x64.tar.gz file to your /home directory on Palmetto using scp, sftp, or FileZilla:

scp  jdk-7u45-linux-x64.tar.gz galen@login.palmetto.clemson.edu:/home/galen
jdk-7u45-linux-x64.tar.gz                                                             100%  132MB  57.7KB/s   38:58

Installing the JDK

The JDK is distributed in a Linux x86_64 compatible binary format, so once it has been unpacked, it is ready to use (no need to compile). However, you will need to set up your environment for this new package by adding lines similar to the following at the end of your ~/.bashrc file:

export JAVA_HOME=/home/galen/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
export MANPATH=$JAVA_HOME/man:$MANPATH

Once this is done, you can log out and log in again, or simply source your ~/.bashrc file, and then you'll be ready to begin using your new Java installation.
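
You can then verify the installation (the reported version will match the JDK you downloaded):

$ source ~/.bashrc
$ java -version
$ javac -version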

Julia

Julia is a high-level dynamic programming language that was originally designed to address the needs of high-performance numerical analysis and computational science.

Run Julia on Palmetto: Interactive

There are a few different versions of Julia available on the cluster.

$ module avail julia
--------------------------------------------- /software/modulefiles ---------------------------------------------
julia/0.6.2 julia/0.7.0 julia/1.0.4 julia/1.1.1

Let's demonstrate how to use julia/1.1.1 on the Palmetto cluster together with the Gurobi Optimizer (a commercial solver for linear programming, quadratic programming, and related problems). Clemson University has licenses for the Gurobi solver. In this example, we use Julia and the Gurobi solver to solve a linear programming problem on Palmetto.

Problem: maximize x + y
subject to the following constraints:

50 x + 24 y <= 2400
30 x + 33 y <= 2100
x >= 5, y >= 45

Let's prepare a script named jump_gurobi.jl to solve this problem. You can save this file to /scratch1/$username/Julia/.

# Request for a compute node:
$ qsub -I -l select=1:ncpus=8:mem=16gb:interconnect=fdr,walltime=01:00:00
# Go to working folder:
$ cd /scratch1/$username/Julia
$ nano jump_gurobi.jl

Then type/copy the following code to the file jump_gurobi.jl

import Pkg
using JuMP
using Gurobi

m = Model(with_optimizer(Gurobi.Optimizer))

@variable(m, x >= 5)
@variable(m, y >= 45)

@objective(m, Max, x + y)
@constraint(m, 50x + 24y <= 2400)
@constraint(m, 30x + 33y <= 2100)

status = optimize!(m)
println(" x = ", JuMP.value(x), " y = ", JuMP.value(y))

Save the jump_gurobi.jl file then you are ready to run julia:

$ module add julia/1.1.1 gurobi/7.0.2
$ julia
# the julia prompt appears:
julia> 

# Next, install the JuMP and Gurobi packages

julia> using Pkg
julia> Pkg.add("JuMP")
julia> Pkg.add("Gurobi")
julia> exit()

Run the jump_gurobi.jl script:

$ julia jump_gurobi.jl

Run Julia on Palmetto: Batch mode

  • Alternatively, you can set up a PBS job script to run Julia in batch mode. A sample submit_julia.sh is shown below. Note that you must install the JuMP and Gurobi packages first (a one-time installation).
#!/bin/bash
#PBS -N Julia
#PBS -l select=1:ncpus=8:mem=16gb:interconnect=fdr
#PBS -l walltime=02:00:00
#PBS -j oe

module purge
module add julia/1.1.1 gurobi/7.0.2

cd $PBS_O_WORKDIR
julia jump_gurobi.jl > output_JuMP.txt

Submit the job:

$ qsub submit_julia.sh

The output file, output_JuMP.txt, can be found in the same folder.

Installing your own Julia using a conda environment and running it in JupyterHub

In addition to the traditional installation of Julia, it is possible to install your own version of Julia and set up a kernel for use with JupyterHub.

# Request for a compute node:
$ qsub -I -l select=1:ncpus=8:mem=16gb:interconnect=fdr,walltime=01:00:00
$ module add anaconda3/5.1.0
# Create conda environment with the name as "Julia"
$ conda create -n Julia -c conda-forge julia
$ source activate Julia
(Julia) [$username@node1234 ~]$
(Julia) [$username@node1234 ~]$ julia
julia> 
julia> using Pkg
julia> Pkg.add("IJulia")
julia> exit()

Exit Julia and start JupyterHub on Palmetto. After spawning, click New in JupyterHub; you will see a Julia 1.1.1 kernel available for use.

Type in the following code to test:

println("Hello world")

LAMMPS

There are a few different versions of LAMMPS available on the cluster.

$ module avail lammps

------------------------- /software/ModuleFiles/modules/linux-centos8-x86_64 -------------------------
lammps/20190807-gcc/8.3.1-cuda10_2-mpi-openmp-user-omp
lammps/20200505-gcc/8.3.1-cuda10_2-kokkos-mpi-nvidia_P-openmp-user-omp
lammps/20200505-gcc/8.3.1-cuda10_2-kokkos-mpi-nvidia_K-openmp-user-omp
lammps/20200505-gcc/8.3.1-cuda10_2-kokkos-mpi-nvidia_V-openmp-user-omp (D)

Note: the letters P, K, and V stand for the GPU architectures Pascal (p100), Kepler (k20, k40), and Volta (v100).

Installing custom LAMMPS on Palmetto

LAMMPS comes with a wide variety of supported packages catering to different simulation techniques. It is possible to build your own LAMMPS installation on Palmetto. We discuss two examples below: one building Kokkos with GPU support, and one building Kokkos without GPU support.

Reserve a node, and pay attention to its GPU model.

$ qsub -I -l select=1:ncpus=24:mem=100gb:ngpus=2:gpu_model=v100,walltime=10:00:00

Create a directory named software (if you don't already have it) in your home directory, and change to that directory.

$ mkdir ~/software
$ cd ~/software
  • Create a subdirectory called lammps inside software.
  • Download the latest version of LAMMPS and untar it.
  • In this example, we use 20200721, the latest dated version of LAMMPS at the time of writing. You can set up spack (see User Software Installation), then run spack info lammps to see the latest recommended version of LAMMPS.
  • You can change the name of the untarred directory to something easier to manage.

$ mkdir lammps
$ cd lammps
$ wget https://github.com/lammps/lammps/archive/patch_21Jul2020.tar.gz
$ tar xzf patch_21Jul2020.tar.gz
$ mv lammps-patch_21Jul2020 20200721

In recent versions, LAMMPS uses CMake as its build system. As a result, we can build multiple LAMMPS executables from a single source download.

Lammps build with kokkos and gpu

  • Create a directory called build-kokkos-cuda
  • Change into this directory.
$ mkdir build-kokkos-cuda
$ cd build-kokkos-cuda

In building lammps, you will need to modify two cmake files, both inside ../cmake/presets/ directory (this is a relative path assuming you are inside the previously created build-kokkos-cuda). A set of already prepared cmake templates are available inside ../cmake/presets, but you will have to modify them. It is recommended that you use ../cmake/presets/minimal.cmake and ../cmake/presets/kokkos-cuda.cmake as starting points.

For add-on simulation packages, make a copy of ../cmake/presets/minimal.cmake, and use ../cmake/presets/all_on.cmake as a reference point to see what is needed. Let's say we want user-meamc and user-fep in addition to what's in minimal.cmake for simulation techniques. We also need to include kokkos.

$ more ../cmake/presets/minimal.cmake
$ more ../cmake/presets/all_on.cmake
$ cp ../cmake/presets/minimal.cmake ../cmake/presets/my.cmake

Use your favorite editor to add the necessary package names (in capitalized form) to my.cmake. Check the contents afterward.

$ more ../cmake/presets/my.cmake

  • Next, we need to modify ../cmake/presets/kokkos-cuda.cmake so that kokkos is built for the correct architectural specification. For Palmetto, use the following mapping:

Palmetto GPU architecture      Architecture name for Kokkos
K20 and K40                    KEPLER35
P100                           PASCAL60
V100 and V100S                 VOLTA70

  • Since we specified v100 in the initial qsub, ../cmake/presets/kokkos-cuda.cmake will need to be modified to use VOLTA70.

  • We will need to load several supporting modules from Palmetto. We will load modules that have been compiled for the specific architecture of the v100 nodes.
$ module purge
$ export MODULEPATH=/software/ModuleFiles/modules/linux-centos8-skylake:$MODULEPATH
$ module load cmake/3.17.3-gcc/8.3.1 fftw/3.3.8-gcc/8.3.1-mpi-openmp cuda/10.2.89-gcc/8.3.1 openmpi/3.1.6-gcc/8.3.1
  • Build and install:

$ cmake -C ../cmake/presets/my.cmake -C ../cmake/presets/kokkos-cuda.cmake ../cmake
$ cmake --build .
  • Test on LAMMPS's LJ data
$ mkdir /scratch1/$USER/lammps
$ cd /scratch1/$USER/lammps
$ wget https://lammps.sandia.gov/inputs/in.lj.txt
$ mpirun -np 2 ~/software/lammps/20200721/build-kokkos-cuda/lmp -k on g 2 -sf kk -in in.lj.txt

Running LAMMPS - an example

Several existing examples are provided in the installed folder lammps-7Aug19/examples/. Detailed descriptions of all examples are here.

We run the accelerate example using different acceleration packages. Here is a sample batch script job.sh for this example:

#PBS -N accelerate 
#PBS -l select=1:ncpus=8:mpiprocs=8:ngpus=2:gpu_model=v100:mem=64gb,walltime=4:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
module purge
module load lammps/20200505-gcc/8.3.1-cuda10_2-kokkos-mpi-nvidia_V-openmp-user-omp

mpirun -np 8 lmp -in in.lj > output.txt        # 8 MPI, 8 MPI/GPU; redirect to output.txt to write the output to a file

Lammps build with kokkos without gpu

This is similar to the build with kokkos and gpu. In a non-GPU build, kokkos will help manage the OpenMP threads, and the corresponding preset file is ../cmake/presets/kokkos-openmp.cmake; see the sketch below.
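
A sketch of the corresponding steps, assuming you build in a separate directory (the name build-kokkos-openmp is just a suggestion) with the same my.cmake preset prepared earlier:

$ mkdir build-kokkos-openmp
$ cd build-kokkos-openmp
$ cmake -C ../cmake/presets/my.cmake -C ../cmake/presets/kokkos-openmp.cmake ../cmake
$ cmake --build .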

Several ways to run LAMMPS

# Running LAMMPS with the KOKKOS package using 8 MPI tasks/node, no multi-threading
mpirun -np 8 lmp -k on -sf kk -in in.lj > output_kokkos1.txt 

# Running LAMMPS with the KOKKOS package using 2 MPI tasks/node, 8 threads/task
mpirun -np 2 lmp -k on t 8 -sf kk -in in.lj > output_kokkos2.txt 

# Running LAMMPS with 1 GPU 
lmp -sf gpu -pk gpu 1 -in in.lj > output_1gpu.txt

# Running LAMMPS with 4 MPI tasks sharing 2 GPUs on a single 8-core node
mpirun -np 4 lmp -sf gpu -pk gpu 2 -in in.lj > output_2gpu.txt

# Running LAMMPS with 1 MPI task, 8 threads
export OMP_NUM_THREADS=8
lmp -sf omp -in in.lj > output_1mpi_8omp.txt         

# Running LAMMPS with 4 MPI tasks, 4 OMP threads
export OMP_NUM_THREADS=4
mpirun -np 4 lmp -sf omp -pk omp 4 -in in.lj > output_4mpi_4omp.txt

# Running LAMMPS with OPT in serial mode
lmp -sf opt -in in.lj > output_opt_serial.txt

# Running LAMMPS with OPT in parallel mode
mpirun -np 8 lmp -sf opt -in in.lj > output_opt_parallel.txt

For more information on comparison between various accelerator packages please visit: https://lammps.sandia.gov/doc/Speed_compare.html

MATLAB

Checking license usage for MATLAB

You can check the availability of MATLAB licenses using the lmstat command:

$ /software/USR_LOCAL/flexlm/lmstat -a -c /software/USR_LOCAL/flexlm/licenses/matlab.dat 

Running the MATLAB graphical interface

To launch the MATLAB graphical interface, you must first log in with tunneling enabled, and then ask for an interactive session:

$ qsub -I -X -l select=1:ncpus=2:mem=24gb,walltime=1:00:00

Once logged-in, you must load one of the MATLAB modules:

$ module add matlab/2018b

And then launch the MATLAB program:

$ matlab

Warning: DO NOT attempt to run MATLAB right after logging-in (i.e., on the login001 node). Always ask for an interactive job first. MATLAB sessions are automatically killed on the login node.

Running the MATLAB command line without graphics

To use the MATLAB command-line interface without graphics, you can additionally use the -nodisplay and -nosplash options:

$ matlab -nodisplay -nosplash

To quit matlab command-line interface, type:

$ exit

MATLAB in batch jobs

To use MATLAB in your batch jobs, you can use the -r switch provided by MATLAB, which lets you run commands specified on the command-line. For example:

$ matlab -nodisplay -nosplash -r myscript

will run the MATLAB script myscript.m. Alternatively:

$ matlab -nodisplay -nosplash < myscript.m > myscript_results.txt

will run the MATLAB script myscript.m and write the output to myscript_results.txt file. Thus, an example batch job using MATLAB could have a batch script as follows:

#!/bin/bash
#
#PBS -N test_matlab
#PBS -l select=1:ncpus=1:mem=5gb
#PBS -l walltime=1:00:00

module add matlab/2018b

cd $PBS_O_WORKDIR

taskset -c 0-$(($OMP_NUM_THREADS-1)) matlab -nodisplay -nosplash < myscript.m > myscript_results.txt

Note: MATLAB will sometimes attempt to use all available CPU cores on the node it is running on. If you haven't reserved all cores on the node, your job may be killed if this happens. To avoid this, you can use the taskset utility to set the "core affinity" (as shown above). As an example:

$ taskset 0-2 <application>

will limit the application to using 3 CPU cores. On Palmetto, the variable OMP_NUM_THREADS is automatically set to the number of cores requested for a job. Thus, you can use 0-$((OMP_NUM_THREADS-1)) as shown in the above batch script to use all the cores you requested.

Compiling MATLAB code to create an executable

Often, you need to run a large number of MATLAB jobs concurrently (e.g., each job operating on different data). In such cases, you can avoid over-utilizing MATLAB licenses by compiling your MATLAB code into an executable. This can be done from within the MATLAB command line as follows:

$ matlab -nodisplay -nosplash
>> mcc -R -nodisplay -R -singleCompThread -R -nojvm -m mycode.m

Note: MATLAB will try to use all the available CPU cores on the system where it is running, and this presents a problem when running your compiled executable on the cluster, where the available cores on a single node might be shared amongst multiple users. You can disable this "feature" when you compile your code by adding the -R -singleCompThread option, as shown above.

The above command will produce the executable mycode, corresponding to the M-file mycode.m. If you have multiple M-files in your project and want to create a single executable, you can use a command like the following:

>> mcc -R -nodisplay -R -singleCompThread -R -nojvm -m my_main_code.m myfunction1.m myfunction2.m myfunction3.m

Once the executable is produced, you can run it like any other executable in your batch jobs. Of course, you'll also need the same matlab and (optional) GCC module loaded for your job's runtime environment.
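
A minimal sketch of such a batch script, assuming the compiled executable mycode sits in the job submission directory (the matlab/2018b module version is the one used earlier on this page):

#!/bin/bash
#PBS -N run_mycode
#PBS -l select=1:ncpus=1:mem=5gb
#PBS -l walltime=1:00:00

module add matlab/2018b

cd $PBS_O_WORKDIR
./mycode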

Paraview

Using Paraview+GPUs to visualize very large datasets

Paraview can also use multiple GPUs on Palmetto cluster to visualize very large datasets. For this, Paraview must be run in client-server mode. The "client" is your local machine on which Paraview must be installed, and the "server" is the Palmetto cluster on which the computations/rendering is done.

  1. The version of Paraview on the client needs to match exactly the version of Paraview on the server. The client must be running Linux. You can obtain the source code used for installation of Paraview 5.0.1 on Palmetto from /software/paraview/ParaView-v5.0.1-source.tar.gz. Copy this file to the client, extract it and compile Paraview. Compilation instructions can be found in the Paraview documentation.

  2. You will need to run the Paraview server on Palmetto cluster. First, log-in with X11 tunneling enabled, and request an interactive session:

$ qsub -I -X -l select=4:ncpus=2:mpiprocs=2:ngpus=2:mem=32gb,walltime=1:00:00

In the above example, we request 4 nodes with 2 GPUs each.

  3. Next, launch the Paraview server:
$ module add paraview/5.0
$ export DISPLAY=:0
$ mpiexec -n 8 pvserver -display :0

The server will be serving on a specific port number (like 11111) on this node. Note this number down.

  4. Next, you will need to set up "port-forwarding" from the lead node (the node your interactive session is running on) to your local machine. This can be done by opening a terminal on your local machine and typing the following:
$ ssh -L 11111:nodeXYZ:11111 username@login.palmetto.clemson.edu
  5. Once port-forwarding is set up, you can launch Paraview on your local machine and connect to the server using localhost and the port number noted earlier.

Pytorch

This page explains how to install the Pytorch package for use with GPUs on the cluster, and how to use it from Jupyter Notebook via JupyterHub.

GPU node

1) Request an interactive session on a GPU node. For example:

$ qsub -I -l select=1:ncpus=16:mem=20gb:ngpus=1:gpu_model=p100,walltime=3:00:00

2) Load the Anaconda module:

$ module load cuda/9.2.88-gcc/7.1.0 cudnn/7.6.5.32-9.2-linux-x64-gcc/7.1.0-cuda9_2 anaconda3/2019.10-gcc/8.3.1

3) Create a conda environment called pytorch_env (or any name you like):

$ conda create -n pytorch_env pip python=3.8.3

4) Activate the conda environment:

$ conda activate pytorch_env

5) Install Pytorch with GPU support from the pytorch channel:

$ conda install pytorch torchvision cudatoolkit=9.2 -c pytorch

This will automatically install some packages that are required for Pytorch, like MKL or NumPy. To see the list of installed packages, type

$ conda list

If you need additional packages (for example, Pandas), you can type

$ conda install pandas

6) You can now run Python and test the install:

$ python
>>> import torch
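
To confirm that Pytorch can see the GPU, you can run a quick check from the shell (torch.cuda.is_available() is part of the standard Pytorch API):

$ python -c "import torch; print(torch.cuda.is_available())"   # should print True on a GPU node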

Each time you login, you will first need to load the required modules and also activate the pytorch_env conda environment before running Python:

$ module load cuda/9.2.88-gcc/7.1.0 cudnn/7.6.5.32-9.2-linux-x64-gcc/7.1.0-cuda9_2 anaconda3/2019.10-gcc/8.3.1
$ conda activate pytorch_env

Add Jupyter kernel:

If you would like to use Pytorch from Jupyter Notebook on Palmetto via JupyterHub, you need the following additional steps:

1) After you have installed Pytorch, install Jupyter in the same conda environment:

$ conda install -c conda-forge jupyterlab

2) Now, set up a Notebook kernel called "Pytorch". For Pytorch with GPU support, do:

$ python -m ipykernel install --user --name pytorch_env --display-name Pytorch

3) Create/edit the file .jhubrc in your home directory:

$ cd
$ nano .jhubrc

Add the following line to the .jhubrc file, then exit.

module load cuda/9.2.88-gcc/7.1.0 cudnn/7.6.5.32-9.2-linux-x64-gcc/7.1.0-cuda9_2 anaconda3/2019.10-gcc/8.3.1

4) Log into JupyterHub. Make sure you have a GPU in your selection if you want to use the GPU Pytorch kernel.

5) Once your JupyterHub has started, you should see the Pytorch kernel in your list of kernels in the Launcher.

6) You are now able to launch a notebook using the Pytorch GPU kernel.

rclone

rclone is a command-line program that can be used to sync files and folders to and from cloud services such as Google Drive, Amazon S3, Dropbox, and many others.

In this example, we will show how to use rclone to sync files to a Google Drive account, but the official documentation has specific instructions for other services.

Setting up rclone for use with Google Drive on Palmetto

To use rclone with any of the above cloud storage services, you must perform a one-time configuration. You can configure rclone to work with as many services as you like.

For the one-time configuration, you will need to log-in with tunneling enabled. Once logged-in, ask for an interactive job:

$ qsub -I -X

Once the job starts, load the rclone module:

$ module add rclone/1.23

After rclone is loaded, you must set up a "remote". In this case, we will configure a remote for Google Drive. You can create and manage a separate remote for each cloud storage service you want to use. Start by entering the following command:

$ rclone config

n) New remote
q) Quit config
n/q>

Hit n then Enter to create a new remote host

name>

Provide any name for this remote host. For example: gmaildrive

What type of source is it?
Choose a number from below
 1) amazon cloud drive
 2) drive
 3) dropbox
 4) google cloud storage
 5) local
 6) s3
 7) swift
type>

Provide the number corresponding to the remote source. For example, choose number 2 for Google Drive.

Google Application Client Id - leave blank normally.
client_id> # Enter to leave blank
Google Application Client Secret - leave blank normally.
client_secret> # Enter to leave blank

Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n>

Use y if you are not sure

If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...

This will open the Firefox web browser, allowing you to log in to your Google account. Enter your username and password, then accept to let rclone access your Google Drive. Once this is done, the browser will ask you to go back to rclone to continue.

Got code
--------------------
[gmaildrive]
client_id =
client_secret =
token = {"access_token":"xyz","token_type":"Bearer","refresh_token":"xyz","expiry":"yyyy-mm-ddThh:mm:ss"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d>

Select y to finish configuring this remote host. The gmaildrive remote will then be created.

Current remotes:

Name                 Type
====                 ====
gmaildrive           drive

e) Edit existing remote
n) New remote
d) Delete remote
q) Quit config
e/n/d/q>

After this, you can quit the config using q, kill the job and exit this ssh session:

$ exit

Using rclone

Whenever transferring files (using rclone or otherwise), log in to the transfer node xfer01-ext.palmetto.clemson.edu.

  • In MobaXterm, create a new ssh session with xfer01-ext.palmetto.clemson.edu as the Remote host
  • In macOS, open a new terminal and ssh user@xfer01-ext.palmetto.clemson.edu

Once logged-in, load the rclone module:

$ module add rclone/1.23

You can check the content of the remote host gmaildrive:

$ rclone ls gmaildrive:
$ rclone lsd gmaildrive: 

You can use rclone to (for example) copy a file from Palmetto to any folder in your Google Drive:

$ rclone copy /path/to/file/on/palmetto gmaildrive:/path/to/folder/on/drive

Or, to copy a folder from Google Drive back to Palmetto:

$ rclone copy gmaildrive:/path/to/folder/on/drive /path/to/file/on/palmetto
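
rclone also provides a sync command, which makes the destination match the source (note that it deletes files at the destination that are not present in the source). For example, to mirror a Palmetto folder into a hypothetical backup folder on Google Drive:

$ rclone sync /path/to/folder/on/palmetto gmaildrive:backup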

Additional rclone commands can be found here.

Tensorflow

This page explains how to install the Tensorflow package for use with GPUs on the cluster, and how to use it from Jupyter Notebook via JupyterHub.

Installing Tensorflow GPU node

1) Request an interactive session on a GPU node. For example:

$ qsub -I -l select=1:ncpus=16:mem=20gb:ngpus=1:gpu_model=p100,walltime=3:00:00

2) Load the Anaconda module:

$ module load cuda/10.2.89-gcc/8.3.1 cudnn/8.0.0.180-10.2-linux-x64-gcc/8.3.1 anaconda3/2019.10-gcc/8.3.1

3) Create a conda environment called tf_gpu_env (or any name you like):

$ conda create -n tf_gpu_env

4) Activate the conda environment:

$ conda activate tf_gpu_env

5) Install Tensorflow with GPU support from the anaconda channel:

$ conda install -c anaconda tensorflow-gpu=2.2.0 python=3.8.3

This will automatically install some packages that are required for Tensorflow, like SciPy or NumPy. To see the list of installed packages, type

$ conda list

If you need additional packages (for example, Pandas), you can type

$ conda install pandas

6) You can now run Python and test the install:

$ python
>>> import tensorflow as tf
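
To confirm that Tensorflow can see the GPU, you can run a quick check from the shell (tf.config.list_physical_devices is part of the Tensorflow 2.x API):

$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"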

Each time you log in, you will first need to load the required modules and also activate the tf_gpu_env conda environment before running Python:

$ module load cuda/10.2.89-gcc/8.3.1 cudnn/8.0.0.180-10.2-linux-x64-gcc/8.3.1 anaconda3/2019.10-gcc/8.3.1
$ conda activate tf_gpu_env

Installing Tensorflow for non-GPU node

1) Request an interactive session for non-GPU node. For example:

$ qsub -I -l select=1:ncpus=16:mem=20gb,walltime=3:00:00

2) Load the required modules

$ module load cuda/10.2.89-gcc/8.3.1 cudnn/8.0.0.180-10.2-linux-x64-gcc/8.3.1 anaconda3/2019.10-gcc/8.3.1

3) Create a conda environment called tf_cpu_env:

$ conda create -n tf_cpu_env

4) Activate the conda environment:

$ conda activate tf_cpu_env

5) Install Tensorflow from the anaconda channel:

$ conda install -c anaconda tensorflow=2.2.0

This will automatically install some packages that are required for Tensorflow, like SciPy or NumPy. To see the list of installed packages, type

$ conda list

If you need additional packages (for example, Pandas), you can type

$ conda install pandas

6) You can now run Python and test the install:

$ python
>>> import tensorflow as tf

Setup Jupyter kernel

If you would like to use Tensorflow from Jupyter Notebook on Palmetto via JupyterHub, you need the following additional steps:

1) After you have installed Tensorflow, install Jupyter in the same conda environment:

$ conda install jupyter

2) Now, set up a Notebook kernel called "Tensorflow". For Tensorflow with GPU support, do:

$ python -m ipykernel install --user --name tf_gpu_env --display-name TensorflowGPU

For Tensorflow without GPU support, do:

$ python -m ipykernel install --user --name tf_cpu_env --display-name Tensorflow

3) Create/edit the file .jhubrc in your home directory:

$ cd
$ nano .jhubrc

Add the following line to the .jhubrc file, then exit.

module load cuda/10.2.89-gcc/8.3.1 cudnn/8.0.0.180-10.2-linux-x64-gcc/8.3.1 anaconda3/2019.10-gcc/8.3.1

4) Log into JupyterHub. Make sure you have a GPU in your selection if you want to use the GPU Tensorflow kernel.

5) Once your JupyterHub has started, you should see the Tensorflow kernel in your list of kernels in the Launcher.

6) You are now able to launch a notebook using the Tensorflow GPU kernel.

Example Deep Learning - Multiple Object Detections

This is a demonstration of the Tensorflow GPU kernel. Steps for the non-GPU kernel are similar.

1) Request an interactive session on a GPU node. For example:

$ qsub -I -l select=1:ncpus=16:mem=20gb:ngpus=1:gpu_model=p100,walltime=3:00:00

2) Load the Anaconda module:

$ module load cuda/10.2.89-gcc/8.3.1 cudnn/8.0.0.180-10.2-linux-x64-gcc/8.3.1 anaconda3/2019.10-gcc/8.3.1

3) Activate the conda environment:

$ conda activate tf_gpu_env

4) Install supporting conda modules:

$ conda install Cython contextlib2 pillow lxml matplotlib utils pandas

5) Setup TensorFlow's Model directory:

$ cd
$ mkdir tensorflow
$ cd tensorflow
$ wget https://github.com/tensorflow/models/archive/master.zip
$ unzip master.zip
$ mv models-master models
$ module load protobuf/3.11.2-gcc/8.3.1
$ cd models/research
$ protoc object_detection/protos/*.proto --python_out=.
$ cp object_detection/packages/tf2/setup.py .
$ python -m pip install --user --use-feature=2020-resolver .
$ cd ~/tensorflow
$ cp /zfs/citi/deeplearning/multi_object.ipynb .

Open Jupyter Server, change into the tensorflow directory, then open and run the multi_object.ipynb notebook.

Singularity

Singularity is a tool for creating and running containers on HPC systems, similar to Docker.

For further information on Singularity, and on downloading, building and running containers with Singularity, please refer to the Singularity documentation. This page provides information about singularity specific to the Palmetto cluster.

Running Singularity

Singularity is installed on all of the Palmetto compute nodes and the Palmetto LoginVMs, but it IS NOT present on the login.palmetto.clemson.edu node.

To run singularity, you may simply run singularity or more specifically /bin/singularity.

e.g.

$ singularity --version
singularity version 3.5.3-1.el7

An important change for existing singularity users

Formerly, Palmetto administrators had installed singularity as a "software module" on Palmetto, but that is no longer the case. If your job scripts have any statements that use the singularity module, then those statements will need to be completely removed; otherwise, your job script may error.

Remove any statements from your job scripts that resemble the following lines:

module <some_command> singularity

Where to download containers

Containers can be downloaded from DockerHub

Example: Running OpenFOAM using Singularity

As an example, we consider installing and running the OpenFOAM CFD solver using Singularity. OpenFOAM can be quite difficult to install manually, but singularity makes it very easy. This example shows how to use singularity interactively, but singularity containers can be run in batch jobs as well.

Start by requesting an interactive job.

NOTE: Singularity can only be run on the compute nodes and Palmetto Login VMs:

$ qsub -I -l select=1:ngpus=2:ncpus=16:mpiprocs=16:mem=120gb,walltime=5:00:00

We recommend that all users store built singularity images in their /home directories. Singularity images can be quite large, so be sure to delete unused or old images:

$ mkdir ~/singularity-images
$ cd ~/singularity-images

Next, we download the singularity image for OpenFOAM from DockerHub. This takes a few seconds to complete:

$ singularity pull docker://openfoam/openfoam6-paraview54

Once the image is downloaded, we are ready to run OpenFOAM. We use singularity shell to start a container, and run a shell in the container.

The -B option is used to "bind" the /scratch2/$USER directory to a directory named /scratch in the container.

We also use the --pwd option to specify the working directory in the running container (in this case /scratch). This is always recommended.

Typically, the working directory may be the $TMPDIR directory or one of the scratch directories.

$ singularity shell -B /scratch2/atrikut:/scratch --pwd /scratch openfoam6-paraview54.simg

Before running OpenFOAM commands, we need to source a few environment variables (this step is specific to OpenFOAM):

$ source /opt/openfoam6/etc/bashrc

Now, we are ready to run a simple example using OpenFOAM:

$ cp -r $FOAM_TUTORIALS/incompressible/simpleFoam/pitzDaily .
$ cd pitzDaily
$ blockMesh
$ simpleFoam

The simulation takes a few seconds to complete, and should finish with the following output:

smoothSolver:  Solving for Ux, Initial residual = 0.00012056, Final residual = 7.8056e-06, No Iterations 6
smoothSolver:  Solving for Uy, Initial residual = 0.000959834, Final residual = 6.43909e-05, No Iterations 6
GAMG:  Solving for p, Initial residual = 0.00191644, Final residual = 0.000161493, No Iterations 3
time step continuity errors : sum local = 0.00681813, global = -0.000731564, cumulative = 0.941842
smoothSolver:  Solving for epsilon, Initial residual = 0.000137225, Final residual = 8.98917e-06, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.000215144, Final residual = 1.30281e-05, No Iterations 4
ExecutionTime = 10.77 s  ClockTime = 11 s


SIMPLE solution converged in 288 iterations

streamLine streamlines write:
    seeded 10 particles
    Tracks:10
    Total samples:11980
    Writing data to "/scratch/pitzDaily/postProcessing/sets/streamlines/288"
End

We are now ready to exit the container:

$ exit

Because the directory /scratch was bound to /scratch2/$USER, the simulation output is available in the directory /scratch2/$USER/pitzDaily/postProcessing/:

$ ls /scratch2/$USER/pitzDaily/postProcessing/
sets

GPU-enabled software using Singularity containers (NVIDIA GPU Cloud)

Palmetto also supports use of images provided by the NVIDIA GPU Cloud (NGC).

NGC provides GPU-accelerated HPC and deep learning containers for scientific computing. NVIDIA tests HPC container compatibility with the Singularity runtime through a rigorous QA process.

Pulling NGC images

Singularity images may be pulled directly from the Palmetto GPU compute nodes; an interactive job is most convenient for this. Singularity uses multiple CPU cores when building the image, so it is recommended that a minimum of 4 CPU cores be reserved. For instance, to reserve 4 CPU cores and 2 NVIDIA Pascal GPUs for 20 minutes, the following could be used:

$ qsub -I -lselect=1:ncpus=4:mem=2gb:ngpus=2:gpu_model=p100,walltime=00:20:00

Wait for the interactive job to give you control over the shell.

Before pulling an NGC image, authentication credentials must be set. This is most easily accomplished by setting the following variables in the build environment.

$ export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
$ export SINGULARITY_DOCKER_PASSWORD=<NVIDIA NGC API key>

More information describing how to obtain and use your NVIDIA NGC API key can be found here.

Once credentials are set in the environment, we’re ready to pull and convert the NGC image to a local Singularity image file. The general form of this command for NGC HPC images is:

$ singularity build <local_image> docker://nvcr.io/<registry>/<app:tag>

This singularity build command will download the app:tag NGC Docker image, convert it to Singularity format, and save it to the local file named local_image.

For example to pull the namd NGC container tagged with version 2.12-171025 to a local file named namd.simg we can run:

$ singularity build ~/namd.simg docker://nvcr.io/hpc/namd:2.12-171025

After this command has finished, we'll have a Singularity image file, namd.simg.

Running NGC containers

Running NGC containers on Palmetto presents few differences from the run instructions provided on NGC for each application. Application-specific information may vary so it is recommended that you follow the container specific documentation before running with Singularity. If the container documentation does not include Singularity information, then the container has not yet been tested under Singularity.

As all NGC containers are optimized for NVIDIA GPU acceleration we will always want to add the --nv flag to enable NVIDIA GPU support within the container.

The Singularity command below represents the standard form of the Singularity command used on the Palmetto cluster. It will mount the present working directory on the host to /host_pwd in the container process and set the present working directory of the container process to /host_pwd. This means that when our process starts, it will effectively be running in the host directory the singularity command was launched from.

$ singularity exec --nv -B $(pwd):/host_pwd --pwd /host_pwd <image.simg> <cmd>
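
For example, using the namd.simg image pulled earlier (the namd2 executable name and the input.namd file are illustrative assumptions; consult the container's NGC documentation for the exact run command):

$ singularity exec --nv -B $(pwd):/host_pwd --pwd /host_pwd ~/namd.simg namd2 input.namd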