If you get an error like the following, MPI_Init is expecting the real argc/argv: passing NULL for both arguments is not sufficient. Passing NULL to MPI_Init does work on the cluster at ASC, though; the same program that produces the error below on Olympus runs perfectly there. (MPI-2 permits MPI_Init(NULL, NULL) in C, so this is presumably a limitation of the MPI installation on Olympus.)
bash: line 1: 17212 Segmentation fault /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-2-22.local MPIRUN_PORT=60739 MPIRUN_PROCESSES='compute-2-22:compute-2-22:compute-2-1:compute-2-1:compute-2-23:compute-2-23:compute-2-8:compute-2-8:compute-2-28:compute-2-28:compute-2-3:compute-2-3:compute-2-18:compute-2-18:compute-1-23:compute-1-23:compute-1-14:compute-1-14:compute-1-7:compute-1-7:compute-1-9:compute-1-9:compute-1-1:compute-1-1:compute-1-19:compute-1-19:compute-2-15:compute-2-15:compute-1-17:compute-1-17:compute-1-22:compute-1-22:compute-1-13:compute-1-13:compute-2-17:compute-2-17:compute-1-8:compute-1-8:compute-1-29:compute-1-29:compute-1-25:compute-1-25:compute-1-11:compute-1-11:compute-2-14:compute-2-14:compute-1-30:compute-1-30:compute-1-2:compute-1-2:' MPIRUN_RANK=49 MPIRUN_NPROCS=50 MPIRUN_ID=28494 /home/ritu/testcases/life/parallelgen/life 5000 5000 100000 10
Tuesday, April 20, 2010
Wednesday, March 24, 2010
Ant Script Problems
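When stuck, check:
1) The version of the jar files being used. Make sure they are from a stable build; for example, ant-contrib-1.0b has bugs in it.
2) The classpath and the class name. For example, { classname="FileCopy" } is correct, but { classname="C:/AMMA/Eclipse3.3/FileCopy/bin/FileCopy" } is not: classname takes the name of the class, and the directory that contains the compiled class (here C:/AMMA/Eclipse3.3/FileCopy/bin) belongs on the classpath.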
Wednesday, November 26, 2008
Copy & paste the program into the terminal window
Warnings & error messages that I was getting:
mypoisson.cc:1:2: warning: null character(s) ignored
mypoisson.cc:1:3: error: invalid preprocessing directive #i
.
.
.
mypoisson.cc:294:1: warning: null character(s) ignored
mypoisson.cc:294:3: warning: null character(s) ignored
mypoisson.cc:295:1: warning: null character(s) ignored
mypoisson.cc:296:1: warning: null character(s) ignored
mypoisson.cc:297:1: warning: null character(s) ignored
mypoisson.cc:298:1: warning: null character(s) ignored
mypoisson.cc:298:2: warning: no newline at end of file
mypoisson.cc:14: error: 'd' does not name a type
mypoisson.cc:16: error: 'd' does not name a type
mypoisson.cc:18: error: 'd' does not name a type
mypoisson.cc:22: error: 'v' does not name a type
mypoisson.cc:30: error: 'i' does not name a type
Solution:
Delete the file and, instead of transferring it through ssh, copy and paste the source into the terminal window.
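The pattern in these messages (a NUL byte between the characters, so #include is read as #i and double, void and int shrink to d, v and i) looks like what gcc reports for a file stored as UTF-16 instead of plain ASCII; that diagnosis is an assumption, not something confirmed above. If copy/paste is not convenient, a minimal shell sketch can count and strip the NUL bytes. The function names count_nuls and strip_nuls are illustrative only.

```shell
#!/bin/bash
# Illustrative helpers (hypothetical names) for checking a source file
# for NUL bytes and removing them.

# Print the number of NUL bytes in file $1; a normal C++ source file has none.
count_nuls() {
    # -c complements the set and -d deletes, so only the NUL bytes survive.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Copy file $1 to $2 with every NUL byte removed.
# Assumption: the text is otherwise plain ASCII, so dropping the NULs is
# enough; a byte-order mark at the start of the file is left in place.
strip_nuls() {
    tr -d '\000' < "$1" > "$2"
}
```

For example, count_nuls mypoisson.cc followed by strip_nuls mypoisson.cc mypoisson_fixed.cc. If the file starts with a byte-order mark, iconv -f UTF-16 -t UTF-8 mypoisson.cc > mypoisson_fixed.cc is the more faithful conversion.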
Saturday, November 8, 2008
MPI (MPICH) issues
For decoding the cryptic MPI error messages:
http://www.hpcc.nectec.or.th/wiki/index.php/MPI_Error_messsage
The error messages I get if I don't use the "-nolocal" option in the script:
rm_12833: p4_error: semget failed for setnum: 0
p0_14592: (4.875000) net_recv failed for fd = 8
p0_14592: p4_error: net_recv read, errno = : 104
Killed by signal 2.^M
Killed by signal 2.^M
Killed by signal 2.^M
Killed by signal 2.^M
p0_14592: (36.906250) net_send: could not write to fd=4, errno = 32
Since semget is the System V semaphore call, I tried cleaning up leftover IPC resources with the following commands:
/opt/mpich/gnu/sbin/cleanipcs
cluster-fork /opt/mpich/gnu/sbin/cleanipcs
I got a "Permission Denied" error. By trial and error, I added the "-nolocal" flag, and the error messages above go away. However, I still don't see the expected output. With the "-nolocal" option, the script looks like this:
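#!/bin/bash
#
# SGE directives: run in the current directory, merge stderr into stdout,
# use bash as the job shell, and request 10 slots from the "mpi" parallel environment.
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe mpi 10
MPI_DIR=/opt/mpich/gnu/bin
EXECUTABLE="/home/ritu/pgaNov/pgaNov"
# SGE sets NSLOTS and TMPDIR; $TMPDIR/machines is written by the parallel environment's
# start-up script. -nolocal keeps mpirun from starting a process on the local node.
"$MPI_DIR/mpirun" -nolocal -np "$NSLOTS" -machinefile "$TMPDIR/machines" "$EXECUTABLE"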
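Because the job runs without the p4 errors but the expected output never shows up, one thing worth checking is what the job actually received from SGE. The helper below is a hypothetical sketch (check_machinefile is not an MPICH or SGE command); it assumes $TMPDIR/machines lists one host name per slot, which is how the script above uses it.

```shell
#!/bin/bash
# Hypothetical debugging helper (not part of MPICH or SGE).
# Usage: check_machinefile <machine-file> <expected-slot-count>
# Assumes the machine file lists one host name per line, one line per slot.
check_machinefile() {
    local file="$1" expected="$2" entries
    if [ ! -r "$file" ]; then
        echo "machine file '$file' is missing or unreadable" >&2
        return 1
    fi
    # Count non-empty lines; grep -c prints 0 (and exits 1) when none match.
    entries=$(grep -c . "$file" || true)
    echo "expected $expected slots, found $entries entries in $file"
    # Show how many slots each distinct host contributes.
    sort "$file" | uniq -c
    # Succeed only if the file provides exactly the expected number of slots.
    [ "$entries" -eq "$expected" ]
}
```

In the job script it could be called just before mpirun as check_machinefile "$TMPDIR/machines" "$NSLOTS" || exit 1; with -j y its output lands in the same job output file as the program's.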
Friday, November 7, 2008
To find out the disk usage on Linux
To find the total disk usage of the current directory (including all of its subdirectories):
du -sh
To see where the space is going, list the size of each subdirectory one level down, find the largest one, then cd into it and repeat:
du -h --max-depth=1
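Another useful command lists every directory under your home directory with its size in kilobytes, largest first, one screen at a time:
cd; du -k | sort -nr | more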
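The du and sort commands above can be combined into a small function for the "find the largest directory, then dig deeper" loop. This is a sketch; largest_dirs is a hypothetical name, and --max-depth is a GNU du option, as in the command above.

```shell
#!/bin/bash
# Hypothetical helper: show the N largest entries one level below a directory.
# Usage: largest_dirs [directory] [count]
largest_dirs() {
    local dir="${1:-.}" count="${2:-10}"
    # --max-depth=1 (GNU du) reports each immediate subdirectory plus the
    # directory's own total; sizes are in kilobytes so sort -n can compare them.
    du -k --max-depth=1 "$dir" | sort -nr | head -n "$count"
}
```

The first line is normally the directory itself, since its total includes everything below it; the second line is the subdirectory to cd into next.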
How to remove ^M, the DOS line break character.
To remove the extra ^M (carriage return) characters that show up in text files transferred from Windows to a Unix system, open the file in vi and:
1) Press Esc and type ":"
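2) Type %s/, then press Ctrl-V followed by Ctrl-M (this inserts a literal ^M), then type //g and press Enter. The command line reads :%s/^M//g.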
%s is vi's search-and-replace command applied to the whole file (the % selects every line). The pattern between the first and second slashes (^M) is replaced by the text between the second and third slashes, which is empty here, so every ^M is deleted. The "g" at the end replaces all occurrences on each line rather than only the first one.
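The same clean-up can be done from the shell without opening the file. This is a minimal sketch; strip_cr is an illustrative name, not a standard command.

```shell
#!/bin/bash
# Shell alternative to the vi substitution: delete every carriage return
# (the ^M character) from file $1 and write the result to file $2.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, strip_cr windows.txt unix.txt. Like the vi command, this removes every ^M in the file, not only the ones at line ends.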
