VASP 4.6 Parallel Hangs at Run Time

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.


GeorgetownARC

#1 Post by GeorgetownARC » Fri Dec 15, 2006 3:20 pm

I have successfully compiled VASP for serial and parallel use. I can run serial jobs without any problem, but parallel jobs launch, then immediately hang. I compiled the parallel version with the following components:

Red Hat Enterprise Linux AS 4.0, update 4 (64-bit)
VASP 4.6
Portland Group pgf90 6.0-8 (64-bit)
mpich2-1.0.4p1
fftw-3.1.2
GotoBLAS-1.09

I have a job that runs fine with the serial version. When I launch the same job with the parallel version, vasp starts and then hangs: the processes sit idle, using essentially no CPU and very little memory. There are no error messages in the output file or on the screen; in fact, there are no messages whatsoever, which makes this very hard to debug.

Here is the command that I use to start the MPICH2 mpd ring:

mpdboot -n 6 -f ../mpd.hosts

Here is the command that I use to launch parallel vasp:

mpiexec -machinefile freenodes -n 2 /home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp < /home/jess/SakuraVASP/POSCAR >/home/jess/SakuraVASP/jess_output

Here is the state of the MPICH2 and vasp programs:

jess 6851 0.0 0.3 87144 7692 ? S 10:14 0:00 python2.3 /opt/mpich2/bin/mpd.py --ncpus=1 -e -d
jess 6863 0.0 0.3 86156 6848 pts/3 S 10:15 0:00 python2.3 /opt/mpich2/bin/mpiexec -machinefile freenodes -n 2 /home/j
jess 6864 0.0 0.3 87148 7700 ? S 10:15 0:00 python2.3 /opt/mpich2/bin/mpd.py --ncpus=1 -e -d
jess 6865 0.0 0.3 87148 7700 ? S 10:15 0:00 python2.3 /opt/mpich2/bin/mpd.py --ncpus=1 -e -d
jess 6866 0.0 0.0 20504 1016 ? S 10:15 0:00 /home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp
jess 6867 0.0 0.0 20504 1092 ? S 10:15 0:00 /home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp
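The ps listing above is the main clue: both vasp ranks are in state S (sleeping) at 0.0 %CPU with roughly 1 MB resident, so they appear to be waiting on something rather than computing. A minimal Python sketch for picking such processes out of `ps aux`-style output, assuming the usual column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the helper names `parse_ps_aux` and `idle_processes` are hypothetical:

```python
import os
from typing import List, NamedTuple


class PsEntry(NamedTuple):
    pid: int
    cpu: float    # %CPU (ps averages this over the process lifetime)
    rss_kb: int   # resident set size in kB
    stat: str     # process state; "S" means interruptible sleep
    command: str


def parse_ps_aux(lines: List[str]) -> List[PsEntry]:
    """Parse `ps aux`-style lines: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND."""
    entries = []
    for line in lines:
        fields = line.split(None, 10)  # COMMAND may contain spaces, so keep it whole
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        entries.append(PsEntry(pid=int(fields[1]), cpu=float(fields[2]),
                               rss_kb=int(fields[5]), stat=fields[7],
                               command=fields[10]))
    return entries


def idle_processes(lines: List[str], exe_name: str) -> List[PsEntry]:
    """Processes whose executable basename is `exe_name`, sleeping at 0.0 %CPU."""
    return [e for e in parse_ps_aux(lines)
            if os.path.basename(e.command.split()[0]) == exe_name
            and e.cpu == 0.0 and e.stat.startswith("S")]


# Three of the lines from the listing above.
sample = [
    "jess 6863 0.0 0.3 86156 6848 pts/3 S 10:15 0:00 python2.3 /opt/mpich2/bin/mpiexec -machinefile freenodes -n 2 /home/j",
    "jess 6866 0.0 0.0 20504 1016 ? S 10:15 0:00 /home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp",
    "jess 6867 0.0 0.0 20504 1092 ? S 10:15 0:00 /home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp",
]
for e in idle_processes(sample, "vasp"):
    print(e.pid, e.rss_kb)  # prints "6866 1016" then "6867 1092"
```

Matching on the executable's basename, rather than on any occurrence of "vasp" in the command line, avoids false positives from the python2.3 mpd/mpiexec processes, whose argument lists can also contain the vasp path.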

The output file is empty, even after I kill the job:

-rw-r--r-- 1 jess users 0 Dec 15 10:15 jess_output

I have tried recompiling the parallel vasp with debugging options, but I still don't get any messages. Here are the debugging settings that I added to vasp's Makefile:

FFLAGS = -Mfree -tp k8-64 -i8 -C -g

# Under the MPI section
CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-Dkind8 -DNGZhalf -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=500 \
-DRPROMU_DGEMV -DRACCMU_DGEMV -Ddebug

Does anyone know of additional debugging/verbose options that I can set so vasp will display any type of message?

tjf
Full Member
Posts: 107
Joined: Wed Aug 10, 2005 1:30 pm
Location: Leiden, Netherlands

#2 Post by tjf » Fri Dec 15, 2006 4:15 pm

Firstly, you shouldn't try to stream POSCAR onto stdin. POSCAR, INCAR, etc. are read from the current working directory, just as in the serial case. I've no idea how mpich2 handles streamed input (stream handling has been an issue for me with various MPI implementations and codes).
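Following this advice, the launch amounts to running mpiexec from the directory that holds the input files, with no stdin redirect. A sketch in Python, assuming the four standard VASP input files (INCAR, POSCAR, POTCAR, KPOINTS) and the same mpiexec options as in the original command; `run_vasp` and its helpers are hypothetical names, not part of VASP or MPICH2:

```python
import os
import subprocess
from typing import List

# The four standard VASP input files; VASP looks for them in the current
# working directory, not on stdin.
REQUIRED_INPUTS = ("INCAR", "POSCAR", "POTCAR", "KPOINTS")


def missing_inputs(rundir: str) -> List[str]:
    """Names of required input files that are absent from `rundir`."""
    return [name for name in REQUIRED_INPUTS
            if not os.path.isfile(os.path.join(rundir, name))]


def mpiexec_argv(vasp_exe: str, nprocs: int, machinefile: str) -> List[str]:
    """Same options as the original command, minus the `< POSCAR` redirect."""
    return ["mpiexec", "-machinefile", machinefile, "-n", str(nprocs), vasp_exe]


def run_vasp(rundir: str, vasp_exe: str, nprocs: int, machinefile: str,
             logname: str = "jess_output") -> int:
    """Launch parallel vasp from `rundir`; returns mpiexec's exit status.

    `machinefile` should be an absolute path, because mpiexec now runs in `rundir`.
    """
    missing = missing_inputs(rundir)
    if missing:
        raise FileNotFoundError(f"missing in {rundir}: {', '.join(missing)}")
    with open(os.path.join(rundir, logname), "w") as log:
        result = subprocess.run(mpiexec_argv(vasp_exe, nprocs, machinefile),
                                cwd=rundir,                 # the `cd` step
                                stdin=subprocess.DEVNULL,   # nothing streamed in
                                stdout=log,
                                stderr=subprocess.STDOUT)   # keep MPI errors in the log too
    return result.returncode
```

In shell terms this is just `cd /home/jess/SakuraVASP` followed by the original mpiexec command without `< /home/jess/SakuraVASP/POSCAR`. Unlike the original command, the sketch also sends stderr to the log file.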

I assume you can run an MPI Hello World?

GeorgetownARC

#3 Post by GeorgetownARC » Sat Dec 16, 2006 3:07 am

Thank you for this tip. I now get an error message:

[cli_0]: aborting job:
Fatal error in MPI_Cart_sub: Invalid communicator, error stack:
MPI_Cart_sub(198): MPI_Cart_sub(MPI_COMM_NULL, remain_dims=0xa80b90, comm_new=0xcb04e0) failed
MPI_Cart_sub(80).: Null communicator
[cli_1]: aborting job:
Fatal error in MPI_Cart_sub: Invalid communicator, error stack:
MPI_Cart_sub(198): MPI_Cart_sub(MPI_COMM_NULL, remain_dims=0xa80b90, comm_new=0xcb04e0) failed
MPI_Cart_sub(80).: Null communicator

MPICH2 Hello World programs work, but I am double-checking that MPICH2 was compiled for 64-bit (or at least works with 64-bit code), since others seem to have hit the same errors. The default path for pgf90 points to the 64-bit version, but that doesn't guarantee that MPICH2 itself was built correctly for 64-bit.
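One way to do that check, sketched in Python: every ELF file records its word size in byte 4 of its header (EI_CLASS: 1 for 32-bit, 2 for 64-bit), which is what the `file` utility reports. Run it on the Hello World binary built with MPICH2's compiler wrappers, or on the vasp executable itself; a static `.a` library is an `ar` archive rather than an ELF file, so check an object extracted from it instead. The function name `elf_class` is hypothetical:

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS_NAMES = {1: "32-bit (ELFCLASS32)", 2: "64-bit (ELFCLASS64)"}


def elf_class(path: str) -> str:
    """Report the word size recorded in an ELF header (executable, .o or .so)."""
    with open(path, "rb") as f:
        ident = f.read(5)  # e_ident[0:4] is the magic number, e_ident[4] is EI_CLASS
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return "not an ELF file"  # e.g. an `ar` archive such as a static .a library
    return EI_CLASS_NAMES.get(ident[4], "unknown ELF class")
```

For example, `elf_class("/home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp")`. This only confirms that the object code is 64-bit; it says nothing about the Fortran INTEGER size that the MPI bindings expect.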

Thanks for your help. I'll post what I figure out.

job
Jr. Member
Posts: 55
Joined: Tue Aug 16, 2005 7:44 am

#4 Post by job » Mon Jan 22, 2007 7:02 am

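You need to remove "-i8" from FFLAGS, since MPI expects 32-bit integers: the MPICH2 Fortran bindings, as built here, use the default 4-byte INTEGER, so 8-byte integer arguments (arrays in particular) are misread. Also, if you want to use fftw without "-i8", you need to change the code so that the integers used to store the fftw plans are of a kind large enough to hold them, i.e. 64 bits (an FFTW 3 plan is a C pointer, which is 64 bits wide on x86-64). You can find a patch for that floating around in this forum, courtesy of yours truly.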
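The effect of both mismatches can be illustrated with plain byte packing. This Python sketch assumes a little-endian x86-64 layout; the helper names are hypothetical, and the plan value is a made-up address, not a real FFTW plan. With -i8, an INTEGER array is laid out as 8-byte values, while a library expecting 4-byte INTEGERs reads the same bytes four at a time; without -i8, a 64-bit plan pointer stored in a 4-byte INTEGER loses its upper half:

```python
import struct
from typing import List


def as_read_by_int32_callee(values: List[int]) -> List[int]:
    """Pack `values` as 8-byte integers (an INTEGER array compiled with -i8) and
    re-read the first len(values) 4-byte integers, as a library built for 4-byte
    INTEGERs would when handed the same array."""
    n = len(values)
    raw = struct.pack(f"<{n}q", *values)          # little-endian, as on x86-64
    return list(struct.unpack(f"<{n}i", raw[:4 * n]))


def stored_in_int32(handle: int) -> int:
    """Low 32 bits of a 64-bit value: what survives when a C pointer (such as an
    FFTW 3 plan) is stored in a 4-byte INTEGER. Sign handling is ignored here."""
    return handle & 0xFFFFFFFF


print(as_read_by_int32_callee([2, 1]))   # [2, 0]
plan = 0x00002AAAB0C0FFEE                # made-up 64-bit address, not a real plan
print(hex(stored_in_int32(plan)))        # 0xb0c0ffee
```

A two-element array [2, 1] arrives as [2, 0] on the receiving side, which is one plausible route to the invalid communicator in the MPI_Cart_sub error reported earlier in the thread.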