For decoding the cryptic MPI error messages:
http://www.hpcc.nectec.or.th/wiki/index.php/MPI_Error_messsage
This is the error message I get when I don't use the "-nolocal" option in the script:
rm_12833: p4_error: semget failed for setnum: 0
p0_14592: (4.875000) net_recv failed for fd = 8
p0_14592: p4_error: net_recv read, errno = : 104
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
p0_14592: (36.906250) net_send: could not write to fd=4, errno = 32
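The numeric codes in this log are ordinary Linux errno values: 104 is ECONNRESET and 32 is EPIPE. Both mean the other end of a socket went away, which fits the "Killed by signal 2" lines (signal 2 is SIGINT; `kill -l 2` in bash prints its name). So the network errors are probably a consequence, and the `semget failed` line at the top is the more likely root cause. The snippet below is a minimal sketch for decoding such lines. `decode_errno` and `decode_p4_log` are hypothetical helper names, and the errno table is hard-coded for Linux rather than read from the system headers.

```shell
#!/bin/bash
# Hypothetical helper: map the errno values seen in the p4 log to their
# Linux symbolic names. Hard-coded table, valid for Linux only.
decode_errno() {
    case "$1" in
        32)  echo "EPIPE (Broken pipe)" ;;
        104) echo "ECONNRESET (Connection reset by peer)" ;;
        *)   echo "unknown errno $1" ;;
    esac
}

# Read a p4 log on stdin, extract every "errno = N" value (including the
# "errno = : N" form printed by net_recv) and decode each one.
decode_p4_log() {
    grep -o 'errno = [: ]*[0-9]\+' | grep -o '[0-9]\+$' | while read -r n; do
        echo "errno $n: $(decode_errno "$n")"
    done
}
```

For example, piping the two errno lines from the log above into `decode_p4_log` should print one decoded line per errno, in the order they appear.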
I tried running the following commands:
/opt/mpich/gnu/sbin/cleanipcs
cluster-fork /opt/mpich/gnu/sbin/cleanipcs
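`semget` is the System V call that allocates a semaphore array, and the p4 device appears to use System V IPC for processes sharing a node. A common reason for it to fail is that semaphore arrays left behind by killed runs have used up the kernel's limit, which is the kind of leftover `cleanipcs` is meant to remove. As a hedged alternative, the sketch below uses the standard `ipcs` and `ipcrm` tools to find semaphore arrays owned by one user and, only when explicitly asked, remove them. The helper names are hypothetical, and the parser assumes the Linux `ipcs -s` column layout (key, semid, owner, perms, nsems).

```shell
#!/bin/bash
# Hypothetical helper: read `ipcs -s` output on stdin and print the semid
# of every semaphore array owned by the given user. Header lines are
# skipped because only data rows have a key starting with "0x".
list_user_semids() {
    local user="$1"
    awk -v u="$user" '$1 ~ /^0x/ && $3 == u { print $2 }'
}

# Dry run by default: print the ipcrm commands instead of running them.
# Pass "--really" as the second argument to actually remove the arrays.
cleanup_user_sems() {
    local user="$1" mode="$2"
    ipcs -s | list_user_semids "$user" | while read -r id; do
        if [ "$mode" = "--really" ]; then
            ipcrm -s "$id"
        else
            echo "ipcrm -s $id"
        fi
    done
}
```

Filtering by owner matters because `ipcrm` only lets an unprivileged user remove IPC objects they own or created; trying to remove anyone else's arrays fails with a permission error. Running `cleanup_user_sems "$USER"` on a compute node first shows what would be removed.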
I got a "Permission Denied" error. Purely by trial and error, I added the "-nolocal" flag, and the error message above no longer appears. However, I still don't get the expected output. With the "-nolocal" option, the script looks like this:
#!/bin/bash
#
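# Grid Engine directives: run in the submission directory (-cwd),
# merge stderr into stdout (-j y), use bash (-S), and request
# 10 slots from the "mpi" parallel environment (-pe mpi 10).
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe mpi 10
MPI_DIR=/opt/mpich/gnu/bin
EXECUTABLE="/home/ritu/pgaNov/pgaNov"
# NSLOTS and TMPDIR are set by Grid Engine; the mpi parallel environment
# writes the host list to $TMPDIR/machines. -nolocal tells MPICH's mpirun
# not to start a process on the host running this script.
"$MPI_DIR/mpirun" -nolocal -np "$NSLOTS" -machinefile "$TMPDIR/machines" "$EXECUTABLE"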
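When the "-nolocal" run produces no visible output, one thing worth checking is whether the job actually received the hosts you expect. With `-cwd` and `-j y`, anything the script prints lands in the job's `<job_name>.o<job_id>` file in the submission directory. The sketch below is a hypothetical pre-flight check (`check_machinefile` is an illustrative name, not part of MPICH or Grid Engine) that prints the granted slot count and the per-host entries of the machine file, and returns non-zero when the counts disagree.

```shell
#!/bin/bash
# Hypothetical pre-flight check: compare the number of non-blank lines in
# a machine file against the expected slot count and list hosts with
# their entry counts, so problems show up in the job's output file.
check_machinefile() {
    local file="$1" nslots="$2"
    if [ ! -r "$file" ]; then
        echo "machinefile $file missing or unreadable" >&2
        return 1
    fi
    local count
    # Count non-blank lines (one line per slot/host entry).
    count=$(grep -c -v '^[[:space:]]*$' "$file")
    echo "NSLOTS=$nslots, machinefile entries=$count"
    # Show how many entries each host received.
    sort "$file" | uniq -c
    # Exit status: success only when the counts match.
    [ "$count" -eq "$nslots" ]
}
```

Inside the job script it would be called just before the `mpirun` line, for example `check_machinefile "$TMPDIR/machines" "$NSLOTS" || echo "slot/host mismatch"`.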
Saturday, November 8, 2008