Tuesday, April 20, 2010

MPI_Init issue

An error like the following means that MPI_Init is expecting the program's real argc/argv; passing NULL for them is not sufficient on this system. The MPI-2 standard permits NULL arguments, but some older implementations still require the real values. Passing NULL to MPI_Init does work on the cluster at ASC: the same program that gives the following error on Olympus runs perfectly there.


bash: line 1: 17212 Segmentation fault /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-2-22.local MPIRUN_PORT=60739 MPIRUN_PROCESSES='compute-2-22:compute-2-22:compute-2-1:compute-2-1:compute-2-23:compute-2-23:compute-2-8:compute-2-8:compute-2-28:compute-2-28:compute-2-3:compute-2-3:compute-2-18:compute-2-18:compute-1-23:compute-1-23:compute-1-14:compute-1-14:compute-1-7:compute-1-7:compute-1-9:compute-1-9:compute-1-1:compute-1-1:compute-1-19:compute-1-19:compute-2-15:compute-2-15:compute-1-17:compute-1-17:compute-1-22:compute-1-22:compute-1-13:compute-1-13:compute-2-17:compute-2-17:compute-1-8:compute-1-8:compute-1-29:compute-1-29:compute-1-25:compute-1-25:compute-1-11:compute-1-11:compute-2-14:compute-2-14:compute-1-30:compute-1-30:compute-1-2:compute-1-2:' MPIRUN_RANK=49 MPIRUN_NPROCS=50 MPIRUN_ID=28494 /home/ritu/testcases/life/parallelgen/life 5000 5000 100000 10

Wednesday, March 24, 2010

Ant Script Problems

When stuck, check:

1) The version of the jar files being used. Make sure they come from a stable build. For example, ant-contrib-1.0b has bugs in it.


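2) The classpath. The classname attribute takes the name of the class, not the path to its file; the directory that contains the class belongs on the classpath. For example, { classname="FileCopy" } is correct but { classname="C:/AMMA/Eclipse3.3/FileCopy/bin/FileCopy" } is not.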











Wednesday, November 26, 2008

Copy & paste the program in the terminal window

Warnings & error messages that I was getting:

mypoisson.cc:1:2: warning: null character(s) ignored
mypoisson.cc:1:3: error: invalid preprocessing directive #i
.
.
.
mypoisson.cc:294:1: warning: null character(s) ignored
mypoisson.cc:294:3: warning: null character(s) ignored
mypoisson.cc:295:1: warning: null character(s) ignored
mypoisson.cc:296:1: warning: null character(s) ignored
mypoisson.cc:297:1: warning: null character(s) ignored
mypoisson.cc:298:1: warning: null character(s) ignored
mypoisson.cc:298:2: warning: no newline at end of file
mypoisson.cc:14: error: 'd' does not name a type
mypoisson.cc:16: error: 'd' does not name a type
mypoisson.cc:18: error: 'd' does not name a type
mypoisson.cc:22: error: 'v' does not name a type
mypoisson.cc:30: error: 'i' does not name a type


Solution:
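Delete the file and, instead of transferring it through ssh, copy and paste the program text into an editor in the terminal window. The "null character(s) ignored" warnings suggest that the transferred file had a null byte after every character (as in a UTF-16 encoded file), which would also explain why the compiler sees only single letters such as 'd', 'v' and 'i'.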
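As an alternative to retyping or pasting, the null bytes can be counted and stripped from the command line. The following is only a sketch under assumptions: the sample file is generated here for illustration (the original mypoisson.cc is not available), and deleting null bytes is only safe when the source contains plain ASCII text.

```shell
# Build a hypothetical sample that mimics the damaged file:
# every character is followed by a null byte.
workdir=$(mktemp -d)
printf '#\0i\0n\0c\0l\0u\0d\0e\0\n\0' > "$workdir/damaged.cc"

# Count the null bytes: keep only NULs, then count the remaining bytes.
nul_count=$(tr -cd '\000' < "$workdir/damaged.cc" | wc -c | tr -d ' ')
echo "null bytes found: $nul_count"

# Strip the null bytes (assumption: the real content is ASCII only).
tr -d '\000' < "$workdir/damaged.cc" > "$workdir/clean.cc"
cat "$workdir/clean.cc"
```

If the file contains non-ASCII characters, converting the encoding with a tool such as iconv is the safer route, since simply deleting bytes would corrupt those characters.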

Saturday, November 8, 2008

OpenMPI issues

For decoding the cryptic MPI error messages:

http://www.hpcc.nectec.or.th/wiki/index.php/MPI_Error_messsage

The error message that I get if I don't use the "-nolocal" option in the script:

rm_12833: p4_error: semget failed for setnum: 0
p0_14592: (4.875000) net_recv failed for fd = 8
p0_14592: p4_error: net_recv read, errno = : 104
Killed by signal 2.^M
Killed by signal 2.^M
Killed by signal 2.^M
Killed by signal 2.^M
p0_14592: (36.906250) net_send: could not write to fd=4, errno = 32

I tried running the following commands:
/opt/mpich/gnu/sbin/cleanipcs
cluster-fork /opt/mpich/gnu/sbin/cleanipcs

Both gave a "Permission Denied" error. By trial and error, I added the "-nolocal" flag, and the error message above went away. However, I still don't see the desired output. With the "-nolocal" option, the script looks like this:

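#!/bin/bash
#
# Run the job from the current working directory
#$ -cwd
# Merge standard error into standard output
#$ -j y
# Use bash as the job shell
#$ -S /bin/bash
# Request the "mpi" parallel environment with 10 slots
#$ -pe mpi 10
MPI_DIR=/opt/mpich/gnu/bin
EXECUTABLE="/home/ritu/pgaNov/pgaNov"
# NSLOTS and TMPDIR/machines are provided by the scheduler for this job
$MPI_DIR/mpirun -nolocal -np $NSLOTS -machinefile $TMPDIR/machines $EXECUTABLE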

Friday, November 7, 2008

To find out the disk usage on Linux

For finding out the total disk usage:

du -sh

For a breakdown of the disk usage, try the following command. It lists the size of each directory one level down, which helps in figuring out which directory takes up the most space so that you can then dig deeper into it:

du -h --max-depth=1

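Another useful command, which lists every directory under your home directory sorted by size in kilobytes, largest first:

cd; du -k | sort -nr | more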
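To see what the sorted listing looks like without scanning a whole home directory, here is a small sketch that runs the same kind of pipeline on a throwaway directory. The directory names "big" and "small" and the file sizes are made up for illustration.

```shell
# Build a hypothetical directory tree with one large and one small subdirectory.
workdir=$(mktemp -d)
mkdir "$workdir/big" "$workdir/small"
head -c 1048576 /dev/urandom > "$workdir/big/data.bin"
head -c 1024 /dev/urandom > "$workdir/small/data.bin"

# Sizes in kilobytes, sorted numerically with the largest first.
report=$(du -k "$workdir" | sort -nr)
echo "$report"

# The first line is the total for the top directory;
# the second line is the largest subdirectory.
largest_subdir=$(echo "$report" | sed -n '2p' | cut -f2)
echo "largest subdirectory: $largest_subdir"
```

The exact sizes printed depend on the filesystem, so only the ordering is meaningful here.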

How to remove ^M, the DOS line break character.

To remove the extra ^M characters that show up in text files transferred from the Windows platform to a Unix system, open the file in vi and:

1) Press Esc and type ":"
2) Type %s/(ctrl-v)(ctrl-m)//g and press Enter

Here (ctrl-v)(ctrl-m) means pressing Ctrl-V followed by Ctrl-M, which inserts a literal ^M (carriage return) into the command. The "s" is vi's substitute (search and replace) command, and the "%" in front of it applies the command to every line of the file. The pattern between the first and the second slash (which is ^M) is replaced by the text between the second and third slash (which is empty here, so the ^M is simply deleted). The "g" at the end makes the replacement apply to every occurrence on a line rather than only the first.
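The same clean-up can also be done without opening an editor. The sketch below uses tr to delete every carriage return; the file names are hypothetical and the sample file is generated only for illustration.

```shell
# Create a small sample file with DOS (CRLF) line endings.
workdir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$workdir/dos.txt"

# Delete every carriage return (the character vi displays as ^M).
tr -d '\r' < "$workdir/dos.txt" > "$workdir/unix.txt"

# Count the lines that still contain a carriage return.
cr=$(printf '\r')
remaining=$(grep -c "$cr" "$workdir/unix.txt" || true)
echo "lines still containing a carriage return: $remaining"
```

Because tr writes to a new file, the original stays untouched until you decide to replace it.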