MPI Troubleshooting

This document describes techniques to test your network configuration in case of MPI problems.

 

If MPI-accelerated simulations with CST STUDIO SUITE do not work as expected, the reason might be an erroneous network configuration. If you encounter problems, please work carefully through this troubleshooting guide.

Table of Contents

Test Network Configuration

Test DNS Configuration

Test File Access

Test Connection to MPI Daemon

Test Connection Between MPI Daemons

Select Interconnect Used for MPI Communication

Enable MPI Debug Output (Windows only)

See Also

Test Network Configuration

The MPI client nodes need to "see" each other on the network. To check whether this is the case, execute the command

 

ping <nodeB>

 

on <nodeA> and the command

 

ping <nodeA>

 

on <nodeB>.

If both hosts can communicate over the network, you should see a reply from the remote node and no error message.
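For clusters with more than two nodes, the two ping commands can be wrapped in a small helper. This is just a sketch; localhost stands in for a real node name, and on Windows the option `-c 1` must be replaced by `-n 1`:

```shell
# check_node prints whether a host answers a single ICMP echo request.
check_node() {
    if ping -c 1 "$1" > /dev/null 2>&1; then
        echo "$1 reachable"
    else
        echo "$1 NOT reachable"
    fi
}

# Run this on every node, once for each remote node, e.g.:
check_node localhost
```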

 


Test DNS Configuration

Make sure DNS is correctly configured unless you are using a static name resolution mechanism via a hosts file. DNS is correctly configured if you can resolve hostnames to IP addresses and IP addresses back to hostnames.

To check whether this is the case, please execute the command:

 

nslookup <nodeA>

 

nslookup will tell you which DNS server it is using and to which IP the hostname <nodeA> is resolved. Check that you can do a reverse lookup on this IP, i.e. execute the command

 

nslookup <IP-you-got>

 

Perform these lookups for both hostnames, and run them on both nodes.
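Both lookup directions can also be sketched with getent on Linux, which consults the same resolver mechanisms (hosts file and DNS) the system itself uses; localhost stands in for a real node name:

```shell
# Forward lookup: hostname -> IP address
ip=$(getent hosts localhost | awk '{print $1}')
echo "localhost -> $ip"

# Reverse lookup: IP address -> hostname(s)
getent hosts "$ip"
```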

 


Test File Access

Make sure that the CST STUDIO SUITE installation folders can be accessed on all nodes and that the user account you use for your MPI simulations has the permission to execute programs located in these folders. To check this, execute the following program on each of the nodes:

 

Windows:    <CST_DIR>\CSTMPIClusterInfo.exe

Linux:      <CST_DIR>/Linux32/CSTMPIClusterInfo

 

It should print the version of your CST STUDIO SUITE installation.
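A quick way to verify execute permission before involving MPI at all is a simple file test; this is a sketch, with /bin/sh used as a stand-in for the CSTMPIClusterInfo binary:

```shell
# check_exec reports whether a file exists and is executable by the current user.
check_exec() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "FAIL: $1 missing or not executable"
    fi
}

# In practice, run this on every node against the real binary, e.g.
# check_exec <CST_DIR>/Linux32/CSTMPIClusterInfo
check_exec /bin/sh
```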

 


Test Connection to MPI Daemon

Windows

The next step is to check whether the MPI daemon can start processes on each of the nodes. Log in to the first node and try to execute a program on the second node. For example, execute the following command in your CST STUDIO SUITE installation folder:

 

Windows:    mpiexec -n 1 -host <nodeB> hostname

 

which should give you the hostname of <nodeB>.

If that works, try the same with the CSTMPIClusterInfo.exe:

 

Windows:    mpiexec -n 1 -host <nodeB> <CST_DIR>\CSTMPIClusterInfo.exe

 

Linux

SSH connection

Check that you can log in from the head node to all other nodes without a password. To do so, execute

 

ssh <nodeB> hostname

 

This should print the hostname of <nodeB> without asking for a password or any other input.
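If the login still asks for a password, one common remedy is public-key authentication (the standard OpenSSH procedure; the key path below is only for demonstration):

```shell
# 1. Generate a key pair with an empty passphrase (demo path, adjust as needed).
rm -f /tmp/demo_key /tmp/demo_key.pub
ssh-keygen -t rsa -N "" -f /tmp/demo_key -q

# 2. Install the public key on every node, e.g.:
#    ssh-copy-id -i /tmp/demo_key.pub <nodeB>

# 3. Verify that the key files were created:
ls /tmp/demo_key /tmp/demo_key.pub
```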

 

Test mpirun

The next step is to check whether the MPI daemon can start processes on each of the nodes. Log in to the first node and try to execute a program on the second node. To run a program remotely on, e.g., <nodeB>, first create a file named 'mpd.hosts' containing just a single line with the hostname you want to run the program on, in this case <nodeB>. You will need to specify the location of this file on the command line later. Then execute the following command in the Linux32 subfolder of your CST STUDIO SUITE installation:
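Such a file can be created directly from the shell, for example (the path and the hostname "nodeB" are placeholders for your setup):

```shell
# mpd.hosts lists the hosts mpirun may start processes on, one hostname per line.
echo "nodeB" > /tmp/mpd.hosts
cat /tmp/mpd.hosts
```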

 

./mpirun -r ssh -f /your/path/mpd.hosts -n 1 -host <nodeB> hostname

 

which should give you the hostname of <nodeB>. If that works, try the same with the CSTMPIClusterInfo program:

 

./mpirun -r ssh -f /your/path/mpd.hosts -n 1 -host <nodeB> <CST_DIR>/Linux32/CSTMPIClusterInfo

 


Test Connection Between MPI Daemons

Next, try to execute the CST MPI Performance Test by running it from the installation directory. On Linux, first create a file 'mpd.hosts' containing two lines, one with the hostname of each of the two nodes you want to run the test on.

 

MS Windows (32 Bit):  

mpiexec -hosts 2 <nodeA> <nodeB> <CST_DIR>\CSTMPIPerformanceTest.exe

MS Windows (64 Bit):  

mpiexec -hosts 2 <nodeA> <nodeB> <CST_DIR>\AMD64\CSTMPIPerformanceTest_AMD64.exe

Linux (32 Bit):

mpirun -r ssh -f /your/path/mpd.hosts \
  -n 1 -host <nodeA> <CST_DIR>/Linux32/CSTMPIPerformanceTest : \
  -n 1 -host <nodeB> <CST_DIR>/Linux32/CSTMPIPerformanceTest

Linux (64 Bit):

mpirun -r ssh -f /your/path/mpd.hosts \
  -n 1 -host <nodeA> <CST_DIR>/LinuxAMD64/CSTMPIPerformanceTest_AMD64 : \
  -n 1 -host <nodeB> <CST_DIR>/LinuxAMD64/CSTMPIPerformanceTest_AMD64

 

This tool first tries to establish an MPI connection between the two nodes. In case of success, it measures the latency and bandwidth of your interconnect.

If this tool works, MPI-enabled simulations with CST STUDIO SUITE should also work.

 


Select Interconnect Used for MPI Communication

The best interconnect available is automatically chosen. However, for troubleshooting it might be useful to manually select the interconnect used by MPI. This can be done using the environment variable I_MPI_FABRICS.
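Because mpirun/mpiexec pass the environment on to the processes they start, you can verify that the variable is actually exported before launching anything; a sketch, using the Linux syntax (on Windows, use "set I_MPI_FABRICS=tcp" instead):

```shell
# Select TCP/IP as the MPI fabric.
export I_MPI_FABRICS=tcp

# Any child process now inherits the variable, just as mpirun-started ranks would:
sh -c 'echo "fabric: $I_MPI_FABRICS"'
```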

 

To select TCP/IP (Ethernet) as interconnect, set I_MPI_FABRICS to "tcp":

 

MS Windows (64 Bit):  

set I_MPI_FABRICS=tcp

mpiexec -hosts 2 <nodeA> <nodeB> <CST_DIR>\AMD64\CSTMPIPerformanceTest_AMD64.exe

 

Linux (64 Bit):

export I_MPI_FABRICS=tcp

mpirun -r ssh -f /your/path/mpd.hosts \
  -n 1 -host <nodeA> <CST_DIR>/LinuxAMD64/CSTMPIPerformanceTest_AMD64 : \
  -n 1 -host <nodeB> <CST_DIR>/LinuxAMD64/CSTMPIPerformanceTest_AMD64

 

To select InfiniBand as interconnect, set I_MPI_FABRICS to "dapl" (Windows) or "ofa" (Linux):

 

MS Windows (64 Bit):  

set I_MPI_FABRICS=dapl

mpiexec -hosts 2 <nodeA> <nodeB> <CST_DIR>\AMD64\CSTMPIPerformanceTest_AMD64.exe

 

Linux (64 Bit):

export I_MPI_FABRICS=ofa

mpirun -r ssh -f /your/path/mpd.hosts \
  -n 1 -host <nodeA> <CST_DIR>/LinuxAMD64/CSTMPIPerformanceTest_AMD64 : \
  -n 1 -host <nodeB> <CST_DIR>/LinuxAMD64/CSTMPIPerformanceTest_AMD64

 


Enable MPI Debug Output (Windows only)

 

If the simulation cannot be started even though the CST MPI Performance Test works, try to enable debugging of the smpd service. To do so, log in to one of your compute nodes, go to the installation folder of the CST MPI Service, and tell the smpd service to produce logging output:

 

smpd -traceon <logfilename>

 

Then try to start an MPI-enabled simulation on that node. After running the simulation, disable logging again by executing:

 

smpd -traceoff

 

Then have a look at <logfilename> and/or send it to CST support.

 


 

See Also

MPI Simulation Overview, MPI Cluster Dialog, MPI Installation