Segmentation fault when running VASP with ML_LMLFF = .TRUE.

#1 Post by dominika_melicherova1 » Sat Apr 16, 2022 9:03 am

Dear vasp developers,

I want to run an MD simulation with MLFF training switched on, but I get a segmentation fault.
I tried setting ulimit -s unlimited, but it didn't help.

All files related to the simulation are attached.

Thank you

#2 Post by ferenc_karsai » Tue Apr 19, 2022 7:24 am

It looks like your scalapack has problems setting up the processor grid, so I suspect the scalapack installation itself.
Please try compiling without "-DscaLAPACK" and run the code. Without scalapack you will most likely run out of memory, so run something small just for test purposes (say 8-16 atoms). If that runs properly, switch scalapack back on; if the error then returns, we have pinned it down to a faulty scalapack. Also run your tests with fewer cores, and test both on a single node and on several nodes.

Which toolchains are you using?

#3 Post by dominika_melicherova1 » Tue Apr 19, 2022 5:25 pm

Thank you for your reply.

It seems that scalapack is one part of the problem. I ran a simulation with a small cell (8 atoms): it worked with VASP compiled without "-DscaLAPACK", but not with VASP compiled with "-DscaLAPACK". I then ran the simulation with the large supercell (144 atoms) without scalapack, but there is some other problem. The file with the error output is attached.

I am using Intel-2021.4.0 and OpenMPI-4.1.2.

Thank you

#4 Post by ferenc_karsai » Thu Apr 21, 2022 7:45 am

The large calculation most likely ran out of memory. The ML_LOGFILE reports the predicted memory per core. In your case it writes:
"Total memory consumption : 16056.6". I guess you don't have 16 GB per core available.

In practice, scalapack is required to run the code for realistic systems (at least for the learning part), because the design matrix needs to be distributed. Without scalapack, each core holds the whole design matrix. With scalapack, the array is distributed across the cores, and this distribution scales almost perfectly linearly with the number of cores. We made it possible to run the code without scalapack in order to pin down scalapack errors like the one in your case.
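
To make the scaling argument concrete, here is a rough Python sketch of the design matrix share per core with and without scalapack. It assumes an ideal, perfectly even split and ignores every other array and any overhead; the function name and the 16000 MB matrix size are hypothetical and chosen only for illustration.

```python
def design_matrix_mb_per_core(total_mb, n_cores, use_scalapack):
    """Idealized per-core share of the design matrix, in MB.

    Assumption: with scalapack the matrix is split evenly over all cores;
    without it, every core holds a full copy.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if use_scalapack:
        return total_mb / n_cores
    return total_mb


# Hypothetical 16000 MB design matrix on different core counts.
for cores in (1, 16, 64):
    with_sl = design_matrix_mb_per_core(16000.0, cores, use_scalapack=True)
    without_sl = design_matrix_mb_per_core(16000.0, cores, use_scalapack=False)
    print(f"{cores:3d} cores: {with_sl:9.1f} MB with, {without_sl:9.1f} MB without")
```

In this idealized picture adding cores only helps when scalapack is active; without it the per-core requirement stays the same no matter how many cores are used.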

So this means you need to fix your scalapack installation to be able to run the ML code on your system.
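
Before rerunning the large cell, the memory prediction quoted above can be compared with what each core actually has. The Python sketch below is only an illustration: it assumes the log line looks exactly like the quoted one, that the value is in MB (consistent with the 16 GB remark), and that memory is shared evenly among the cores of a node. The helper names and the node sizes are hypothetical.

```python
import re

# Assumption: the line format matches the one quoted in this thread;
# the real ML_LOGFILE layout may differ between VASP versions.
_PATTERN = re.compile(r"Total memory consumption\s*:\s*([0-9]+(?:\.[0-9]+)?)")


def predicted_memory_mb(log_text):
    """Return the last 'Total memory consumption' value found, or None."""
    matches = _PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


def fits_per_core(predicted_mb, node_memory_gb, cores_per_node):
    """Check the prediction against an even share of the node memory."""
    available_mb = node_memory_gb * 1024 / cores_per_node
    return predicted_mb <= available_mb


sample = "Total memory consumption : 16056.6"
mb = predicted_memory_mb(sample)
# Hypothetical node: 256 GB shared by 64 cores, i.e. 4096 MB per core.
print(mb, fits_per_core(mb, node_memory_gb=256, cores_per_node=64))
```

To use it on a real run, read the ML_LOGFILE contents into a string and pass that to predicted_memory_mb instead of the sample line.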

#5 Post by dominika_melicherova1 » Thu Apr 21, 2022 8:19 am

Thank you for your help. I have already solved the problem with scalapack and it works very well now. It was only a trivial mistake in my makefile related to linking libraries.

#6 Post by ferenc_karsai » Fri Apr 22, 2022 4:47 am

Thank you for your reply, I am very glad that it works now.
I am going to close this topic now.
