scaLAPACK runtime error on Cray XT/CNL
Posted: Tue Feb 03, 2009 5:00 pm
Hello. One of our users complained about VASP performance on our Cray XT4/CNL, and I noticed that the Makefile didn't include -DscaLAPACK. After a fresh build (4.6.34) with that flag, the user finds that VASP dies immediately (output below) after requesting ridiculous amounts of memory. Any suggestions are appreciated. Thank you.
Chris
> aprun -n 64 /work/cots/src-vasp.4.6.34-mpi/vasp
running on 64 nodes
distr: one band on 32 nodes, 2 groups
vasp.4.6.34 5Dec07 complex
POSCAR found : 1 types and 729 ions
scaLAPACK will be used
LDA part: xc-table for Ceperly-Alder, standard interpolation
-----------------------------------------------------------------------------
| |
| W W AA RRRRR N N II N N GGGG !!! |
| W W A A R R NN N II NN N G G !!! |
| W W A A R R N N N II N N N G !!! |
| W WW W AAAAAA RRRRR N N N II N N N G GGG ! |
| WW WW A A R R N NN II N NN G G |
| W W A A R R N N II N N GGGG !!! |
| |
| VASP found 2184 degrees of freedom |
| the temperature will equal 2*E(kin)/ (degrees of freedom) |
| this differs from previous releases, where T was 2*E(kin)/(3 NIONS). |
| The new definition is more consistent |
| |
-----------------------------------------------------------------------------
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
FFT: planning ... 3
reading WAVECAR
WARNING: random wavefunctions but no delay for mixing, default for NELMDL
prediction of wavefunctions initialized - no I/O
entering main loop
N E dE d eps ncg rms rms(c)
0: ALLOCATE: 18446744071837617344 bytes requested; not enough memory
0: ALLOCATE: 18446744071837737152 bytes requested; not enough memory
0: ALLOCATE: 18446744071837884608 bytes requested; not enough memory
0: ALLOCATE: 18446744071837520576 bytes requested; not enough memory
0: ALLOCATE: 18446744071837580480 bytes requested; not enough memory
0: ALLOCATE: 18446744071837460672 bytes requested; not enough memory
0: ALLOCATE: 18446744071837539008 bytes requested; not enough memory
0: ALLOCATE: 18446744071837465280 bytes requested; not enough memory
0: ALLOCATE: 18446744071837603520 bytes requested; not enough memory
0: ALLOCATE: 18446744071837880000 bytes requested; not enough memory
0: ALLOCATE: 18446744071837732544 bytes requested; not enough memory
0: ALLOCATE: 18446744071837755584 bytes requested; not enough memory
0: ALLOCATE: 18446744071837709504 bytes requested; not enough memory
0: ALLOCATE: 18446744071837442240 bytes requested; not enough memory
0: ALLOCATE: 18446744071837488320 bytes requested; not enough memory
0: ALLOCATE: 18446744071837539008 bytes requested; not enough memory
0: ALLOCATE: 18446744071837520576 bytes requested; not enough memory
0: ALLOCATE: 18446744071837585088 bytes requested; not enough memory
0: ALLOCATE: 18446744071837589696 bytes requested; not enough memory
0: ALLOCATE: 18446744071837741760 bytes requested; not enough memory
0: ALLOCATE: 18446744071837520576 bytes requested; not enough memory
0: ALLOCATE: 18446744071837820096 bytes requested; not enough memory
0: ALLOCATE: 18446744071837654208 bytes requested; not enough memory
0: ALLOCATE: 18446744071837672640 bytes requested; not enough memory
0: ALLOCATE: 18446744071837668032 bytes requested; not enough memory
0: ALLOCATE: 18446744071837783232 bytes requested; not enough memory
0: ALLOCATE: 18446744071837893824 bytes requested; not enough memory
0: ALLOCATE: 18446744071837460672 bytes requested; not enough memory
0: ALLOCATE: 18446744071837880000 bytes requested; not enough memory
0: ALLOCATE: 18446744071837847744 bytes requested; not enough memory
0: ALLOCATE: 18446744071837815488 bytes requested; not enough memory
0: ALLOCATE: 18446744071837446848 bytes requested; not enough memory
0: ALLOCATE: 18446744071837539008 bytes requested; not enough memory
0: ALLOCATE: 18446744071837617344 bytes requested; not enough memory
0: ALLOCATE: 18446744071837875392 bytes requested; not enough memory
0: ALLOCATE: 18446744071837474496 bytes requested; not enough memory
0: ALLOCATE: 18446744071837866176 bytes requested; not enough memory
0: ALLOCATE: 18446744071837442240 bytes requested; not enough memory
0: ALLOCATE: 18446744071837801664 bytes requested; not enough memory
0: ALLOCATE: 18446744071837709504 bytes requested; not enough memory
0: ALLOCATE: 18446744071837359296 bytes requested; not enough memory
0: ALLOCATE: 18446744071837714112 bytes requested; not enough memory
0: ALLOCATE: 18446744071837497536 bytes requested; not enough memory
0: ALLOCATE: 18446744071837340864 bytes requested; not enough memory
0: ALLOCATE: 18446744071837382336 bytes requested; not enough memory
0: ALLOCATE: 18446744071837861568 bytes requested; not enough memory
0: ALLOCATE: 18446744071837700288 bytes requested; not enough memory
0: ALLOCATE: 18446744071837433024 bytes requested; not enough memory
[NID 2369]Apid 533807: initiated application termination
Application 533807 exit codes: 127
Application 533807 exit signals: Killed
Application 533807 resources: utime 76710, stime 4370
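For what it's worth, the requested sizes all sit just below 2^64, which is how a negative signed 64-bit value looks when printed as unsigned. A quick sketch to decode one of them follows; the helper names are made up for illustration, and the idea that the original size wrapped a signed 32-bit integer is only a guess based on the arithmetic, not something the log confirms.

```python
# Decode one of the ALLOCATE sizes from the log above.
# Helper names are hypothetical; the 32-bit wraparound is an assumption.

def as_signed64(u):
    """Reinterpret an unsigned 64-bit value as signed two's complement."""
    return u - (1 << 64) if u >= (1 << 63) else u

def unwrap_int32(s):
    """If s is a negative value produced by signed 32-bit wraparound,
    return the non-negative size it would have had before wrapping."""
    return s + (1 << 32) if s < 0 else s

requested = 18446744071837617344       # first ALLOCATE line in the log
signed = as_signed64(requested)
print(signed)                          # -1871934272
print(unwrap_int32(signed))            # 2423033024 bytes, roughly 2.26 GiB
```

If that guess is right, the real request was a bit over 2 GiB per allocation and the size computation overflowed somewhere between VASP and the library, rather than VASP actually wanting exabytes.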