MPI programs and containers
Objectives
Learn what complications are involved with MPI containers
Learn how to generate an MPI container for your HPC system
What to consider when creating a container for MPI programs?
Message Passing Interface (MPI) is a standardized API and a programming paradigm where programs can use MPI directives to send messages across thousands of processes. It is commonly used in traditional HPC computing.
To handle the scale of MPI programs, MPI installations are typically tied to the high-speed interconnect available in the computational cluster and to the cluster's queue system.
This can create the following problems when an MPI program is containerized:
Launching of the MPI job can fail if the program does not communicate with the queue system.
The MPI communication performance can be poor if the program does not utilize the high-speed interconnects correctly.
The container can have portability issues when taking it to a different cluster with different MPI, queue system or interconnect.
To solve these problems, we first need to know how MPI works.
How MPI works
The launch process for an MPI program works like this:
A reservation is made in the queue system for some number of MPI tasks.
When the reservation gets its resources, individual MPI tasks are launched by the queue system (srun) or by an MPI launcher (mpirun).
The user's MPI program calls the MPI libraries it was built against.
These libraries ask the queue system how many other MPI tasks there are.
Individual MPI tasks start running the program collectively. Communication between tasks is done via fast interconnects.
To make this work with various queue systems and interconnects, MPI installations often use the Process Management Interface (PMI/PMI2/PMIx) to connect to the queue system and Unified Communication X (UCX) to connect to the interconnects.
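The task-discovery step above can be sketched in shell. Under Slurm, srun exports variables such as SLURM_PROCID and SLURM_NTASKS to every task it launches; an MPI library obtains roughly the same information through PMI rather than by reading these variables directly. This is only an illustrative sketch, and the fallback defaults exist solely so it also runs outside a job.

```shell
#!/bin/bash
# Rough sketch: print the task identity that the queue system
# advertises to each launched process. An MPI library retrieves the
# same information through PMI/PMI2/PMIx instead of reading these
# environment variables itself.
mpi_task_identity() {
    # Fallbacks (task 0 of 1) apply when run outside a Slurm job.
    echo "I am task ${SLURM_PROCID:-0} of ${SLURM_NTASKS:-1}"
}

mpi_task_identity
```

Running this under srun with several tasks would print one line per task, each with its own rank.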
How to use MPI with a container
The most common way of running MPI programs in containers is the hybrid model, where the container includes the same MPI version as the host system.
In this model the host's MPI launcher calls the MPI inside the container and uses it to launch the application.
Do note that the MPI inside the container does not necessarily know how to utilize the fast interconnects. We’ll talk about solving this later.
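As a minimal sketch of the hybrid launch described above: the host-side launcher starts one container instance per MPI task, and the program inside initializes the container's MPI. The image and program names here (image.sif, ./mpi_app) are placeholders, not files from the examples below.

```shell
#!/bin/bash
# Sketch of a hybrid-model launch command. The host-side srun starts
# one container instance per MPI task; the application inside then
# initializes the container's MPI, which picks up the task
# information provided by the launcher.
build_hybrid_launch() {
    # $1 = container image, $2 = program (and args) inside the image
    echo "srun apptainer exec $1 $2"
}

# Placeholder names for illustration:
build_hybrid_launch image.sif ./mpi_app
```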
Creating a simple MPI container
Let’s construct an example container that runs a simple MPI benchmark from OSU Micro-Benchmarks. This benchmark suite is useful for testing whether the MPI installation works and whether the MPI can utilize the fast interconnect.
Because different sites have different MPI versions, the definition files differ as well. Pick the definition file for your site.
Bootstrap: docker
From: ubuntu:latest
%arguments
NPROCS=4
OPENMPI_VERSION=4.1.6
OSU_MICRO_BENCHMARKS_VERSION=7.4
%post
### Install OpenMPI dependencies
apt-get update
apt-get install -y wget bash gcc gfortran g++ make file bzip2 ca-certificates libucx-dev
### Build OpenMPI
OPENMPI_VERSION_SHORT=$(echo {{ OPENMPI_VERSION }} | cut -f 1-2 -d '.')
cd /opt
mkdir ompi
wget -q https://download.open-mpi.org/release/open-mpi/v${OPENMPI_VERSION_SHORT}/openmpi-{{ OPENMPI_VERSION }}.tar.bz2
tar -xvf openmpi-{{ OPENMPI_VERSION }}.tar.bz2
# Compile and install
cd openmpi-{{ OPENMPI_VERSION }}
./configure --prefix=/opt/ompi --with-ucx=/usr
make -j{{ NPROCS }}
make install
cd ..
rm -rf openmpi-{{ OPENMPI_VERSION }} openmpi-{{ OPENMPI_VERSION }}.tar.bz2
### Build example application
export OMPI_DIR=/opt/ompi
export PATH="$OMPI_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
# Build osu benchmarks
cd /opt
wget -q http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
tar xf osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
cd osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}
./configure --prefix=/opt/osu-micro-benchmarks CC=/opt/ompi/bin/mpicc CFLAGS=-O3
make -j{{ NPROCS }}
make install
cd ..
rm -rf osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }} osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
%environment
export OMPI_DIR=/opt/ompi
export PATH="$OMPI_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
export MANPATH="$OMPI_DIR/share/man:$MANPATH"
%runscript
/opt/osu-micro-benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
To build:
srun --mem=16G --cpus-per-task=4 --time=01:00:00 apptainer build triton-openmpi.sif triton-openmpi.def
To run (some extra parameters are needed to prevent launch errors):
$ module load openmpi/4.1.6
$ export PMIX_MCA_gds=hash
$ export UCX_POSIX_USE_PROC_LINK=n
$ export OMPI_MCA_orte_top_session_dir=/tmp/$USER/openmpi
$ srun --partition=batch-milan --mem=2G --nodes=2-2 --ntasks-per-node=1 --time=00:10:00 apptainer run triton-openmpi.sif
srun: job 3521915 queued and waiting for resources
srun: job 3521915 has been allocated resources
# OSU MPI Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
1 3.98
2 8.05
4 15.91
8 32.03
16 64.24
32 125.47
64 245.52
128 469.00
256 877.69
512 1671.24
1024 3218.11
2048 5726.91
4096 8096.24
8192 10266.18
16384 11242.78
32768 11298.70
65536 12038.27
131072 12196.28
262144 12202.05
524288 11786.58
1048576 12258.48
2097152 12179.43
4194304 12199.89
Bootstrap: docker
From: rockylinux:{{ OS_VERSION }}
%arguments
NPROCS=4
OPENMPI_VERSION=4.1.4rc1
OSU_MICRO_BENCHMARKS_VERSION=7.4
GCC_VERSION=9
UCX_VERSION=1.13.0
OS_NAME=rhel
OS_VERSION=8.6
OFED_VERSION=5.6-2.0.9.0
%post
### Install OpenMPI dependencies
# Base tools and newer gcc version
dnf install -y dnf-plugins-core epel-release
dnf config-manager --set-enabled powertools
dnf install -y make gdb wget numactl-devel which
dnf -y install gcc-toolset-{{ GCC_VERSION }}
source /opt/rh/gcc-toolset-{{ GCC_VERSION }}/enable
# Enable Mellanox OFED rpm repo
wget https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox
rpm --import RPM-GPG-KEY-Mellanox
rm RPM-GPG-KEY-Mellanox
cd /etc/yum.repos.d/
wget https://linux.mellanox.com/public/repo/mlnx_ofed/{{ OFED_VERSION }}/{{ OS_NAME }}{{ OS_VERSION }}/mellanox_mlnx_ofed.repo
cd /
# Install network library components
dnf -y install rdma-core ucx-ib-{{ UCX_VERSION }} ucx-devel-{{ UCX_VERSION }} ucx-knem-{{ UCX_VERSION }} ucx-cma-{{ UCX_VERSION }} ucx-rdmacm-{{ UCX_VERSION }}
### Install OpenMPI
dnf -y install openmpi-{{ OPENMPI_VERSION }}
### Build example application
export OMPI_DIR=/usr/mpi/gcc/openmpi-{{ OPENMPI_VERSION }}
export PATH="$OMPI_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
# Build osu benchmarks
cd /opt
wget -q http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
tar xf osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
cd osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}
./configure --prefix=/opt/osu-micro-benchmarks CC=mpicc CFLAGS=-O3
make -j{{ NPROCS }}
make install
cd ..
rm -rf osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }} osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
%environment
export OMPI_DIR=/usr/mpi/gcc/openmpi-{{ OPENMPI_VERSION }}
export PATH="$OMPI_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
export MANPATH="$OMPI_DIR/share/man:$MANPATH"
%runscript
source /opt/rh/gcc-toolset-{{ GCC_VERSION }}/enable
/opt/osu-micro-benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
To build:
apptainer build puhti-openmpi.sif puhti-openmpi.def
To run (some extra parameters are needed to prevent error messages):
$ module load openmpi/4.1.4
$ export PMIX_MCA_gds=hash
$ srun --account=project_XXXXXXX --partition=large --mem=2G --nodes=2-2 --ntasks-per-node=1 --time=00:10:00 apptainer run puhti-openmpi.sif
srun: job 23736111 queued and waiting for resources
srun: job 23736111 has been allocated resources
# OSU MPI Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
1 5.17
2 10.47
4 20.89
8 41.63
16 82.00
32 166.40
64 310.73
128 477.56
256 1162.51
512 2250.29
1024 3941.94
2048 6174.39
4096 8029.47
8192 10120.93
16384 10632.41
32768 10892.60
65536 11609.92
131072 11778.05
262144 12015.96
524288 11970.93
1048576 12008.62
2097152 12050.35
4194304 12058.36
Bootstrap: docker
From: ubuntu:latest
%arguments
NPROCS=4
MPICH_VERSION=3.1.4
OSU_MICRO_BENCHMARKS_VERSION=7.4
%post
### Install MPICH dependencies
apt-get update
apt-get install -y file g++ gcc gfortran make gdb strace wget ca-certificates --no-install-recommends
# Build MPICH
wget -q http://www.mpich.org/static/downloads/{{ MPICH_VERSION }}/mpich-{{ MPICH_VERSION }}.tar.gz
tar xf mpich-{{ MPICH_VERSION }}.tar.gz
cd mpich-{{ MPICH_VERSION }}
./configure --disable-fortran --enable-fast=all,O3 --prefix=/usr
make -j{{ NPROCS }}
make install
ldconfig
# Build osu benchmarks
cd /opt
wget -q http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
tar xf osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
cd osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}
./configure --prefix=/opt/osu-micro-benchmarks CC=mpicc CFLAGS=-O3
make -j{{ NPROCS }}
make install
cd ..
rm -rf osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }} osu-micro-benchmarks-{{ OSU_MICRO_BENCHMARKS_VERSION }}.tar.gz
%runscript
/opt/osu-micro-benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
Building images is not allowed on LUMI, so you need to build this on your own laptop or some other machine:
apptainer build lumi-mpich.sif lumi-mpich.def
Afterwards, copy the image to your work directory on LUMI.
To use the fast interconnect you need to install the singularity-bindings module with EasyBuild:
module load LUMI/23.09 EasyBuild-user
eb singularity-bindings-system-cpeGNU-23.09-noglibc.eb -r
To run the example:
$ module load LUMI/23.09 EasyBuild-user singularity-bindings
$ export SINGULARITY_BIND=$SINGULARITY_BIND,/usr/lib64/libnl-3.so.200
$ srun --account=project_XXXXXXXXX --partition=dev-g --mem=2G --nodes=2-2 --ntasks-per-node=1 --time=00:10:00 singularity run lumi-mpich.sif
srun: job 8108520 queued and waiting for resources
srun: job 8108520 has been allocated resources
# OSU MPI Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size Bandwidth (MB/s)
1 2.03
2 4.09
4 8.17
8 16.23
16 32.64
32 65.57
64 130.49
128 260.55
256 492.28
512 983.37
1024 1965.42
2048 3924.00
4096 7823.52
8192 14349.54
16384 17373.03
32768 18896.90
65536 20906.04
131072 21811.68
262144 22228.01
524288 22430.80
1048576 22537.82
2097152 22592.50
4194304 22619.96
Utilizing the fast interconnects
To get the fast interconnects to work with the hybrid model, one can either:
Install the interconnect drivers into the image and build the MPI against them. This is the normal hybrid approach described in Figure 3.
Mount the cluster's MPI and other network libraries into the image and use them instead of the container's MPI when running the program. This is described in Figure 4.
Below are explanations of how the interconnect libraries were provided in each example.
The interconnect support was provided by the libucx-dev package, which provides InfiniBand drivers (triton-openmpi.def, line 15):
apt-get install -y wget bash gcc gfortran g++ make file bzip2 ca-certificates libucx-dev
The OpenMPI installation was then configured to use these drivers (triton-openmpi.def, line 26):
./configure --prefix=/opt/ompi --with-ucx=/usr
The interconnect support is provided by installing drivers from Mellanox's InfiniBand driver repository (puhti-openmpi.def, lines 27-38):
# Enable Mellanox OFED rpm repo
wget https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox
rpm --import RPM-GPG-KEY-Mellanox
rm RPM-GPG-KEY-Mellanox
cd /etc/yum.repos.d/
wget https://linux.mellanox.com/public/repo/mlnx_ofed/{{ OFED_VERSION }}/{{ OS_NAME }}{{ OS_VERSION }}/mellanox_mlnx_ofed.repo
cd /
# Install network library components
dnf -y install rdma-core ucx-ib-{{ UCX_VERSION }} ucx-devel-{{ UCX_VERSION }} ucx-knem-{{ UCX_VERSION }} ucx-cma-{{ UCX_VERSION }} ucx-rdmacm-{{ UCX_VERSION }}
The singularity-bindings module mounts the system MPI and network drivers into the container:
$ module load LUMI/23.09 EasyBuild-user singularity-bindings
$ export SINGULARITY_BIND=$SINGULARITY_BIND,/usr/lib64/libnl-3.so.200
$ echo $SINGULARITY_BIND
/opt/cray,/var/spool,/etc/host.conf,/etc/hosts,/etc/nsswitch.conf,/etc/resolv.conf,/etc/ssl/openssl.cnf,/run/cxi,/usr/lib64/libbrotlicommon.so.1,/usr/lib64/libbrotlidec.so.1,/usr/lib64/libcrypto.so.1.1,/usr/lib64/libcurl.so.4,/usr/lib64/libcxi.so.1,/usr/lib64/libgssapi_krb5.so.2,/usr/lib64/libidn2.so.0,/usr/lib64/libjansson.so.4,/usr/lib64/libjitterentropy.so.3,/usr/lib64/libjson-c.so.3,/usr/lib64/libk5crypto.so.3,/usr/lib64/libkeyutils.so.1,/usr/lib64/libkrb5.so.3,/usr/lib64/libkrb5support.so.0,/usr/lib64/liblber-2.4.so.2,/usr/lib64/libldap_r-2.4.so.2,/usr/lib64/libnghttp2.so.14,/usr/lib64/libpcre.so.1,/usr/lib64/libpsl.so.5,/usr/lib64/libsasl2.so.3,/usr/lib64/libssh.so.4,/usr/lib64/libssl.so.1.1,/usr/lib64/libunistring.so.2,/usr/lib64/libzstd.so.1,/lib64/libselinux.so.1,,/usr/lib64/libnl-3.so.200
$ echo $SINGULARITYENV_LD_LIBRARY_PATH
/opt/cray/pe/mpich/8.1.27/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/pe/lib64:/opt/cray/libfabric/1.15.2.0/lib64:/opt/cray/xpmem/default/lib64:/usr/lib64:/opt/cray/pe/gcc-libs
Interconnect support is not explicitly installed.
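In shell terms, the bind approach boils down to two environment variables: one that mounts host paths into the container and one that points the container's loader at the host libraries. This is a minimal sketch of what the singularity-bindings module does on LUMI; the library path below is one of those the module sets there.

```shell
#!/bin/bash
# Minimal sketch of the bind approach: expose the host's MPI to the
# container and point the container's dynamic loader at it. The path
# is taken from the singularity-bindings setup on LUMI; other
# clusters use different paths.
HOST_MPI_LIB="/opt/cray/pe/mpich/8.1.27/ofi/gnu/9.1/lib-abi-mpich"
# Append to any existing bind list without introducing a stray comma.
export SINGULARITY_BIND="${SINGULARITY_BIND:+${SINGULARITY_BIND},}/opt/cray"
export SINGULARITYENV_LD_LIBRARY_PATH="${HOST_MPI_LIB}"
echo "LD_LIBRARY_PATH inside container: ${SINGULARITYENV_LD_LIBRARY_PATH}"
```

With these set, singularity run would see the host's MPICH instead of any MPI baked into the image.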
ABI compatibility in MPI
Different MPI installations do not necessarily have application binary interface (ABI) compatibility. This means that software built with one MPI installation does not necessarily run with another MPI installation.
Quite often MPI programs are built with the same version of MPI that will be used to run the program. However, in containerized applications the runtime MPI version might change if an outside MPI is bound into the container.
This can work because there is some ABI compatibility within an MPI family (OpenMPI, MPICH). For more info, see OpenMPI's page on version compatibility and MPICH's ABI Compatibility Initiative.
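A rough heuristic for the rule above can be sketched in shell: two installations are candidates for swapping only when they come from the same family. This is purely an illustration, not an authoritative ABI check; real compatibility also depends on the version ranges each family documents.

```shell
#!/bin/bash
# Heuristic sketch only: compare the MPI family names in two
# "family version" strings. Real ABI compatibility additionally
# depends on the specific version ranges involved.
same_mpi_family() {
    # Strip the version number; compare the remaining family name.
    f1=$(echo "$1" | sed 's/ [0-9].*//')
    f2=$(echo "$2" | sed 's/ [0-9].*//')
    [ "$f1" = "$f2" ]
}

same_mpi_family "Open MPI 4.1.6" "Open MPI 4.1.4" && echo "possibly compatible"
same_mpi_family "Open MPI 4.1.6" "MPICH 3.1.4" || echo "incompatible families"
```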
There are also projects like E4S Container Launcher and WI4MPI (Wrapper Interface for MPI) that aim to bypass this problem by creating a wrapper interface that the program in the container can be built against. This wrapper can then use different MPI implementations at runtime.
Example on portability: LAMMPS
LAMMPS is a classical molecular dynamics simulation code with a focus on materials modeling.
Let’s build a container with LAMMPS in it:
Bootstrap: docker
From: ubuntu:latest
%arguments
NPROCS=4
OPENMPI_VERSION=4.1.6
LAMMPS_VERSION=29Aug2024
%post
### Install OpenMPI dependencies
apt-get update
apt-get install -y wget bash gcc gfortran g++ make file bzip2 ca-certificates libucx-dev
### Build OpenMPI
OPENMPI_VERSION_SHORT=$(echo {{ OPENMPI_VERSION }} | cut -f 1-2 -d '.')
cd /opt
mkdir ompi
wget -q https://download.open-mpi.org/release/open-mpi/v${OPENMPI_VERSION_SHORT}/openmpi-{{ OPENMPI_VERSION }}.tar.bz2
tar -xvf openmpi-{{ OPENMPI_VERSION }}.tar.bz2
# Compile and install
cd openmpi-{{ OPENMPI_VERSION }}
./configure --prefix=/opt/ompi --with-ucx=/usr
make -j{{ NPROCS }}
make install
cd ..
rm -rf openmpi-{{ OPENMPI_VERSION }} openmpi-{{ OPENMPI_VERSION }}.tar.bz2
### Build example application
# Install LAMMPS dependencies
apt-get install -y cmake
export OMPI_DIR=/opt/ompi
export PATH="$OMPI_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
export CMAKE_PREFIX_PATH="$OMPI_DIR:$CMAKE_PREFIX_PATH"
# Build LAMMPS
cd /opt
wget -q https://download.lammps.org/tars/lammps-{{ LAMMPS_VERSION }}.tar.gz
tar xf lammps-{{ LAMMPS_VERSION }}.tar.gz
cd lammps-{{ LAMMPS_VERSION }}
cmake -S cmake -B build \
-DCMAKE_INSTALL_PREFIX=/opt/lammps \
-DBUILD_MPI=yes \
-DBUILD_OMP=yes
cmake --build build --parallel {{ NPROCS }} --target install
cp -r examples /opt/lammps/examples
cd ..
rm -rf lammps-{{ LAMMPS_VERSION }} lammps-{{ LAMMPS_VERSION }}.tar.gz
%environment
export OMPI_DIR=/opt/ompi
export PATH="$OMPI_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
export MANPATH="$OMPI_DIR/share/man:$MANPATH"
export LAMMPS_DIR=/opt/lammps
export PATH="$LAMMPS_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$LAMMPS_DIR/lib:$LD_LIBRARY_PATH"
export MANPATH="$LAMMPS_DIR/share/man:$MANPATH"
%runscript
exec /opt/lammps/bin/lmp "$@"
Let’s also create a submission script that runs a LAMMPS example where an indenter pushes into a material:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=2G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --output=lammps_indent.out
# Copy example from image
apptainer exec lammps-openmpi.sif cp -r /opt/lammps/examples/indent .
cd indent
# Load OpenMPI module
module load openmpi
# Run simulation
srun apptainer run ../lammps-openmpi.sif -in in.indent
This exact same container can now be run on both Triton and Puhti, because both clusters have OpenMPI installed, use Slurm, and have InfiniBand interconnects.
To build the image:
$ srun --mem=16G --cpus-per-task=4 --time=01:00:00 apptainer build lammps-openmpi.sif lammps-openmpi.def
To run the example:
$ export PMIX_MCA_gds=hash
$ export UCX_POSIX_USE_PROC_LINK=n
$ export OMPI_MCA_orte_top_session_dir=/tmp/$USER/openmpi
$ sbatch run_lammps_indent.sh
$ tail -n 27 lammps_indent.out
Loop time of 0.752293 on 4 procs for 30000 steps with 420 atoms
Performance: 10336396.152 tau/day, 39878.072 timesteps/s, 16.749 Matom-step/s
99.6% CPU use with 4 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.31927 | 0.37377 | 0.42578 | 7.5 | 49.68
Neigh | 0.016316 | 0.020162 | 0.023961 | 2.3 | 2.68
Comm | 0.19882 | 0.25882 | 0.31814 | 10.2 | 34.40
Output | 0.00033215 | 0.00038609 | 0.00054361 | 0.0 | 0.05
Modify | 0.044981 | 0.049941 | 0.054024 | 1.7 | 6.64
Other | | 0.04921 | | | 6.54
Nlocal: 105 ave 112 max 98 min
Histogram: 1 0 1 0 0 0 0 1 0 1
Nghost: 92.5 ave 96 max 89 min
Histogram: 1 0 1 0 0 0 0 1 0 1
Neighs: 892.25 ave 1003 max 788 min
Histogram: 2 0 0 0 0 0 0 0 1 1
Total # of neighbors = 3569
Ave neighs/atom = 8.497619
Neighbor list builds = 634
Dangerous builds = 0
Total wall time: 0:00:01
To build the image:
$ apptainer build lammps-openmpi.sif lammps-openmpi.def
To run the example:
$ export PMIX_MCA_gds=hash
$ sbatch --account=project_XXXXXXX --partition=large run_lammps_indent.sh
$ tail -n 27 lammps_indent.out
Loop time of 0.527178 on 4 procs for 30000 steps with 420 atoms
Performance: 14750222.558 tau/day, 56906.723 timesteps/s, 23.901 Matom-step/s
99.7% CPU use with 4 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.22956 | 0.26147 | 0.28984 | 5.5 | 49.60
Neigh | 0.012471 | 0.015646 | 0.018613 | 2.3 | 2.97
Comm | 0.13816 | 0.17729 | 0.2192 | 9.3 | 33.63
Output | 0.00023943 | 0.00024399 | 0.00025267 | 0.0 | 0.05
Modify | 0.03212 | 0.035031 | 0.037378 | 1.2 | 6.65
Other | | 0.0375 | | | 7.11
Nlocal: 105 ave 112 max 98 min
Histogram: 1 0 1 0 0 0 0 1 0 1
Nghost: 92.5 ave 96 max 89 min
Histogram: 1 0 1 0 0 0 0 1 0 1
Neighs: 892.25 ave 1003 max 788 min
Histogram: 2 0 0 0 0 0 0 0 1 1
Total # of neighbors = 3569
Ave neighs/atom = 8.497619
Neighbor list builds = 634
Dangerous builds = 0
Total wall time: 0:00:01
Review of this session
Key points to remember
The MPI version in the container should match the version installed on the cluster
The cluster's MPI module should be loaded for maximum compatibility with job launching
Care must be taken to ensure that the container utilizes the fast interconnects