Very slow parallel version on multiple nodes

fanghz
Newbie
Posts: 28
Joined: Fri May 11, 2007 1:47 am

#1 Post by fanghz » Tue Jan 20, 2009 3:14 am

Dear Admin,

I have successfully compiled both the serial and parallel versions of vasp.4.6.35 on my cluster. The serial version works well, but the parallel version has a serious problem: it runs well on one node (8 cores), yet it slows down when run on two nodes (16 cores). It is maddening that the run gets slower as the number of compute nodes increases.

------------------------------------------------------------------------------------------

The hardware and software configuration on each node is listed below:

Intel Xeon E5420 2.5 GHz CPU (2 x 4 cores), 4 GB memory, 146 GB disk space, 1 Gbit network card and 1 Gbit switch

Suse Linux 10.0
Intel fortran compiler 10.1.021
Lam-mpi 7.1.4

BLAS: Supplied by Intel MKL 10.1.0.015
(setting in makefile: BLAS=-L/opt/intel/mkl/10.1.0.015/lib/32 -lmkl -lmkl_blacs -lmkl_core -lmkl_intel_thread -lsvml -liomp5 -lguide -lpthread)

LAPACK: Supplied by Intel MKL 10.1.0.015
(setting in makefile: LAPACK=-L/opt/intel/mkl/10.1.0.015/lib/32/libmkl_lapack.a)

------------------------------------------------------------------------------------------

For bench.Hg, the running times on one node and on two nodes are listed below:

One node (8 cores):

Total CPU time used (sec): 19.897
User time (sec): 19.717
System time (sec): 0.180
Elapsed time (sec): 19.916

Two nodes (16 cores):

Total CPU time used (sec): 15.069
User time (sec): 11.309
System time (sec): 3.760
Elapsed time (sec): 91.696

The total CPU time clearly decreases, but the elapsed time increases by a large factor. I have checked the CPU utilization of each core: when running on one node it is almost 100%, while on two nodes it is less than 20%.
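For reference, the utilization figures quoted above can be estimated directly from the timings as CPU time divided by elapsed time. The following is a minimal Python sketch, assuming the reported CPU time belongs to a single MPI process; the name `cpu_utilization` is only illustrative and is not part of VASP.

```python
def cpu_utilization(cpu_seconds, elapsed_seconds):
    """Fraction of the wall-clock time that the process spent on the CPU."""
    return cpu_seconds / elapsed_seconds


# (total CPU time, elapsed time) in seconds, copied from the timings above.
runs = {
    "one node (8 cores)": (19.897, 19.916),
    "two nodes (16 cores)": (15.069, 91.696),
}

for label, (cpu, elapsed) in runs.items():
    busy = cpu_utilization(cpu, elapsed)
    idle = elapsed - cpu  # time not spent computing, e.g. waiting on communication
    print(f"{label}: {busy:.1%} busy, {idle:.3f} s not on the CPU")
```

With these numbers the one-node run is busy for nearly the whole elapsed time, while the two-node run is busy for only about 16% of it, which matches the utilization observed on the nodes.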

------------------------------------------------------------------------------------------


This seems to be a VASP-related problem. I have tested the simple parallel examples supplied with LAM/MPI; they show no problem at all and run faster as I increase the number of compute nodes.

Can anyone suggest a solution? It is very important to me. Thanks very much in advance. I can supply more detailed information if required.

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#2 Post by admin » Fri Jan 23, 2009 10:28 am

As you can see, the difference is almost entirely due to the system time, which is not related to VASP but to your hardware / MPI setup.

jagladden
Newbie
Posts: 2
Joined: Tue Jan 27, 2009 4:09 am
License Nr.: 379 (Prezhdo group)

#3 Post by jagladden » Tue Jan 27, 2009 5:02 am

Fanghz,

I am curious what kind of scaling you are seeing within a single eight-core node. I have recently been benchmarking a Dell 1950 with similar CPUs (L5410s). In my case VASP is also compiled with the Intel compilers and MKL, but with Intel MPI.

For bench.Hg I get results roughly like this:

1 core  = 45 secs
2 cores = 29 secs
4 cores = 23 secs
8 cores = 21 secs

These are wall clock times. As you can see there is very little improvement when going from 4 to 8 cores on the same box.
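To put numbers on that, here is a minimal Python sketch that turns the wall-clock times above into speedup (time on 1 core divided by time on n cores) and parallel efficiency (speedup divided by n). The function name `speedup_and_efficiency` is illustrative only.

```python
def speedup_and_efficiency(t_one_core, t_n_cores, n_cores):
    """Return (speedup, parallel efficiency) relative to the 1-core run."""
    speedup = t_one_core / t_n_cores
    return speedup, speedup / n_cores


# Approximate wall-clock times in seconds, keyed by core count (from the list above).
wall_times = {1: 45.0, 2: 29.0, 4: 23.0, 8: 21.0}

for cores, seconds in wall_times.items():
    s, e = speedup_and_efficiency(wall_times[1], seconds, cores)
    print(f"{cores} cores: speedup {s:.2f}, efficiency {e:.0%}")
```

On these figures the speedup only rises from roughly 1.96 on 4 cores to roughly 2.14 on 8 cores, so the efficiency on 8 cores is about 27%.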

Is this consistent with your results?

Jim