Vasp Performance

Posted: Sat Nov 03, 2007 6:19 pm
by sangamesh
Hi All,

I'm benchmarking VASP on a cluster of three nodes with dual-processor, dual-core AMD64 CPUs (2.618) and 8 GB of RAM.

VASP has already been run on a high-end Intel machine using Intel compiler 9 and the GOTO BLAS library.

Now I have to compare the AMD64 system + PathScale compiler + ATLAS library against the above Intel system. So far only serial runs have been tried.

On both machines the same input file is used, with 30 iterations (NSW=30). The Intel machine finishes in 1 hour, but the AMD machine takes three hours. Why is this? Is there another BLAS library I should try?

The OUTCAR files from the two machines are not exactly identical; they differ slightly.

I don't know the science behind VASP, so I don't understand the processing or the content of the output.

Has anybody used the PathScale compiler on AMD to run VASP?
How was the performance?
How can it be optimized?

Please help me optimize performance on AMD with PathScale and ATLAS.



regards,
Sangamesh
Engineer - HPTC

Posted: Tue Nov 20, 2007 12:38 pm
by job
Current Intel processors (Core2 generation) can execute up to 4 DP FLOPS per clock cycle when running vectorized code, whereas current Opterons (K8) do only up to 2 DP FLOPS/cycle whether the code is vectorized or not. For applications that vectorize well (e.g. by spending a large fraction of their runtime in optimized BLAS libraries) this can make a big difference.
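
To put rough numbers on that, here is a minimal sketch of the peak-throughput arithmetic. The 4 and 2 DP FLOPs/cycle figures are the ones quoted above. The 2.6 GHz clock is only an assumed value for illustration (the Intel machine's clock is not given in this thread), and `peak_gflops` is a made-up helper name.

```python
def peak_gflops(clock_ghz, flops_per_cycle, cores=1):
    """Theoretical peak in double-precision GFLOP/s: clock * FLOPs/cycle * cores."""
    return clock_ghz * flops_per_cycle * cores


# ASSUMPTION: both CPUs run at the same 2.6 GHz clock; real clocks will differ.
ASSUMED_CLOCK_GHZ = 2.6

core2_peak = peak_gflops(ASSUMED_CLOCK_GHZ, flops_per_cycle=4)  # Core2, vectorized code
k8_peak = peak_gflops(ASSUMED_CLOCK_GHZ, flops_per_cycle=2)     # Opteron (K8)

print(f"Core2 peak per core: {core2_peak:.1f} GFLOP/s")
print(f"K8 peak per core:    {k8_peak:.1f} GFLOP/s")
print(f"Ratio at equal clock: {core2_peak / k8_peak:.1f}x")
```

At equal clocks the per-core peak differs by a factor of two. That is a theoretical ceiling for well-vectorized code, not a prediction of VASP wall time, so the rest of a gap like the reported 1 hour vs. 3 hours would have to come from other things such as the BLAS library, the compiler, or the actual clock speeds.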

In my experience GOTO BLAS is the fastest option for the Opteron (the Opteron-optimized version of course, not the same one you're using on the Intel computer!). You can also try FFTW and see if it makes a difference.