Parallelization of MD Simulation
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Parallelization of MD Simulation
Hi,
I am wondering whether there is an efficient way to parallelize MD simulations (with an ML force field), or is VASP itself capable of parallelizing the dynamics simulations?
Regards,
Burak
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Parallelization of MD Simulation
I don't fully understand what you mean by the question, but I will try to answer.
The ionic degrees of freedom are never parallelized over processors in VASP, since updating them is computationally inexpensive compared to the MLFF or ab-initio part up to approximately 100000 atoms.
If you run fully ab-initio MD, the electronic part is parallelized as in usual single-point calculations (first over k-points via KPAR, then over bands and plane-wave coefficients via NCORE). Here you should set KPAR (wiki/index.php/KPAR) and NCORE (wiki/index.php/NCORE).
If a pure machine-learned force field is run, a completely different parallelization is used and you do not have to set anything extra (except the number of cores when you run the code from the command line).
If on-the-fly learning is executed, both are run simultaneously and you only have to set the parallelization for the ab-initio part.
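To make this concrete, here is a minimal sketch of both cases; the tag values, the rank count and the executable name (vasp_std) are only placeholders and have to be adapted to your own system:
INCAR for ab-initio MD or on-the-fly learning (example values only):
    KPAR  = 2    ! distribute the k-points over 2 groups of cores
    NCORE = 4    ! 4 cores work together on each band
Command line for a pure machine-learned force field (nothing extra in the INCAR, only the number of MPI ranks):
    mpirun -np 32 vasp_std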
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Re: Parallelization of MD Simulation
Thanks for the reply. I meant a pure machine-learned force field run. In this case I do see the effect of parallelization up to 30 cores, but then the speed saturates. Even if I add more nodes the scaling does not change. Do you expect such scaling? I have more than 15000 atoms.
Regards,
Burak
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Parallelization of MD Simulation
Strong scaling is not always easy to reach, but saturation at 30 cores seems a little low. However, the scaling depends on various things like the architecture, the number of local reference configurations, the number of atoms per type, the total number of atoms, etc.
So could you please upload your ML_AB, POSCAR, POTCAR, KPOINTS, OUTCAR and ML_LOGFILE files? I will create an ML_FF file myself and run the calculation on our computers.
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Re: Parallelization of MD Simulation
Sorry for the late response; here are the files:
https://www.dropbox.com/scl/fi/pawrq3l1 ... 04yy9&dl=0
-
- Hero Member
- Posts: 585
- Joined: Tue Nov 16, 2004 2:21 pm
- License Nr.: 5-67
- Location: Germany
Re: Parallelization of MD Simulation
Hi Burak,
your POSCAR says 144 atoms, hence the quick saturation with pure ML_FF.
Cheers,
alex
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: Parallelization of MD Simulation
I can't open your Dropbox link; the files have already been deleted.
Here are some remarks on what can completely kill your strong scaling:
1) Output frequency of the calculation: Already for medium to large systems, the time for the force-field prediction is on the same order of magnitude as writing the positions, forces, etc. to the files. So for production runs the output frequency has to be lowered using ML_OUTBLOCK (https://www.vasp.at/wiki/index.php/ML_OUTBLOCK); see the sketch after this list. For smaller systems it can happen that 99% of the measured time is I/O, in which case one will see no strong scaling at all.
2) Number of atoms: As Alex pointed out (thank you for that), with a small number of atoms the calculation is so fast that it will be dominated by communication.
3) Number of local reference configurations: Same as with the number of atoms, but even more pronounced for a small number of local reference configurations. Additionally, for multi-element systems, if the numbers of local reference configurations differ strongly between the types (some very large and some very small), the strong scaling can be worsened further.
4) Architecture: There are many factors which can strongly influence the strong scaling, and I can't list them all. I will mention only two that can really destroy it. First, if you run on nodes with multiple sockets, try to pin your processes to a single socket (see the sketch after this list); cross-socket memory access almost negates all the speedup you gain from increasing the number of cores. Second, for multi-node calculations you must have InfiniBand or something similarly fast.
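To illustrate points 1) and 4), a minimal sketch; the ML_OUTBLOCK value is only an example, and the exact pinning flags depend on your MPI implementation and scheduler (the command below assumes a Linux node where socket 0 corresponds to NUMA node 0, with vasp_std as a placeholder executable name):
INCAR:
    ML_OUTBLOCK = 100    ! write positions/forces etc. only every 100 MD steps
Command line, keeping all ranks on one socket:
    mpirun -np 64 numactl --cpunodebind=0 --membind=0 vasp_std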
I've tested the strong scaling on one of our machines with two sockets and 128 cores in total, on two medium-sized systems (2000 atoms and around 1000-2000 local reference configurations).
Up to 64 cores the strong scaling was really fine (almost linear) if I pinned all processes to one socket. Without pinning, the strong scaling was much worse than linear in the number of CPUs.
I also tested with all 128 cores, but as expected the scaling was no longer linear due to the cross-socket communication.
-
- Jr. Member
- Posts: 51
- Joined: Thu Apr 06, 2023 12:25 pm
Re: Parallelization of MD Simulation
Dear Ferenc and Alex,
Thanks for your valuable insights. I believe that in my case the number of atoms is the problem.