Definition of Error in MLFF

Posted: Thu Sep 07, 2023 1:30 pm
by burakgurlek
Hi,

According to the VASP manual, the training error in VASP is defined as in the attached image. I wonder what is meant by "element-wise over each training structure". Assuming the molecule is composed of C and H atoms only, is the error for each structure calculated separately for the C and H atoms against the reference data and then averaged to get a single number?

Best Regards,
Burak

Re: Definition of Error in MLFF

Posted: Mon Sep 11, 2023 8:25 am
by ferenc_karsai
The equation you wrote is correct.
The information for the energies is correct.
For the forces, N runs over atoms and Cartesian directions, rather than over elements times the number of atoms per element and Cartesian directions.
For the stresses, N runs over the 9 Cartesian components. No elements enter.
For both forces and stresses, N of course also runs over the training structures.
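
One way to write this normalization down, as a sketch of the description above rather than a quotation from the manual or the VASP source. The symbols are assumptions introduced here: N_s is the number of training structures, N_a^{(s)} the number of atoms in structure s, and "ML" and "ref" label the force-field and reference values.

```latex
\[
\mathrm{RMSE}_F = \sqrt{\frac{1}{3\sum_{s=1}^{N_s} N_a^{(s)}}
  \sum_{s=1}^{N_s}\sum_{i=1}^{N_a^{(s)}}\sum_{\alpha\in\{x,y,z\}}
  \left(F^{\mathrm{ML}}_{s,i,\alpha}-F^{\mathrm{ref}}_{s,i,\alpha}\right)^2}
\]
\[
\mathrm{RMSE}_\sigma = \sqrt{\frac{1}{9\,N_s}
  \sum_{s=1}^{N_s}\sum_{\alpha,\beta\in\{x,y,z\}}
  \left(\sigma^{\mathrm{ML}}_{s,\alpha\beta}-\sigma^{\mathrm{ref}}_{s,\alpha\beta}\right)^2}
\]
```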

Can you please point me to the place in the manual where this is written, so I can correct it?

Re: Definition of Error in MLFF

Posted: Thu Sep 14, 2023 12:54 pm
by burakgurlek
Thanks,

This is written here wiki/index.php/Best_practices_for_machi ... rce_fields under the section Monitoring.

One more additional question: I created a set of test configurations and calculated the RMSE on the forces using the above formula. However, the error written in the ERR line of the ML_LOG file is larger than this. Would you expect that, or am I comparing different definitions?

Re: Definition of Error in MLFF

Posted: Fri Sep 15, 2023 8:59 am
by ferenc_karsai
The formula is correct, it's easy to check in the code in the SUMMARY_REPORT subroutine.

Be careful with the source of the forces, since the units in the ML_AB file are eV/Angstrom. These are converted into atomic units in the code. But if you take the forces and energies from the ML_REG file, they should fit together and are again in eV/Angstrom.
Just be careful to divide the summed squared force errors by the number of training structures times the number of atoms in each training structure times three. If the number of atoms in your training structures changes, then you have to take that into account.
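
A minimal Python sketch of this normalization, assuming the predicted and reference forces are already available in the same units as one list of per-atom [fx, fy, fz] triples per structure. The function name `force_rmse` and the input layout are hypothetical and not part of VASP or py4vasp. Counting the components one by one handles structures with different numbers of atoms.

```python
import math

def force_rmse(pred_forces, ref_forces):
    """RMSE over all force components of all structures.

    pred_forces, ref_forces: one entry per structure, each entry a
    sequence of [fx, fy, fz] per atom, in the same units (e.g. eV/Angstrom).
    The number of atoms may differ from structure to structure.
    """
    if len(pred_forces) != len(ref_forces):
        raise ValueError("different number of structures")
    sq_sum = 0.0
    n_components = 0  # ends up as 3 * (total number of atoms)
    for pred, ref in zip(pred_forces, ref_forces):
        if len(pred) != len(ref):
            raise ValueError("atom count mismatch within a structure")
        for f_pred, f_ref in zip(pred, ref):
            for a, b in zip(f_pred, f_ref):
                sq_sum += (a - b) ** 2
                n_components += 1
    return math.sqrt(sq_sum / n_components)
```

Note that this is not the same as averaging per-structure RMSE values; the two only agree when every structure has the same number of atoms and the same per-structure error.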

Re: Definition of Error in MLFF

Posted: Sat Sep 16, 2023 10:38 am
by burakgurlek
Thanks. I currently do not have access to the code, as it is on the cluster.

In my comparison, I got the forces via py4vasp from the vaspout.h5 file. I thought the unit of force there is eV/Å. I do not use the ML_AB or ML_REG file. I took the ERR line in the ML_LOG file as a reference, which gave a larger error than the one I calculated from single-point MD calculations using ML_FF and py4vasp. I made a simple comparison for a naphthalene crystal, so the number of atoms is constant.

Do you think the source I am using has a problem?

Regards,
Burak

Re: Definition of Error in MLFF

Posted: Mon Sep 18, 2023 12:32 pm
by ferenc_karsai
Just a quick question for clarification: you did single-point calculations for all training structures and not only the last one? The "ERR" line gives the error on all training structures in the ML_AB file, but if you compute it only on the last structure, it will of course be smaller.

Re: Definition of Error in MLFF

Posted: Tue Sep 19, 2023 2:45 pm
by burakgurlek
I did not do it for all training structures, but I have an independent test set and I did it for those. So it is not a one-to-one correspondence, but if you think about the usual ML scheme, I do not see why the error on the training set would be larger than on an independent test set.

Regards,
Burak

Re: Definition of Error in MLFF

Posted: Thu Sep 21, 2023 12:09 pm
by ferenc_karsai
OK, thanks for the clarification. I just wanted to rule out that you tried to calculate the error on a subset of the training structures, because then it would of course be smaller.

I've checked that the forces in the HDF5 file should be in eV/Ang, so no issue there.


A smaller test error compared to the training error is most likely due to a test set that is too small. Imagine you trained on a given structure at 300 K and 1000 K. If you trained at 300 K first and then at 1000 K, you would see that the RMSE on the forces of the training structures goes up significantly after adding the high-temperature structures, since the higher the temperature, the more variance you get in your system. If you now picked a few structures at 300 K as test structures, your test error would be close to the error when you trained only at 300 K.

So you should try to include more structures in your test set, spanning the trained phase space. You should also see that if you include test structures beyond the trained conditions (like at 1200 K for the example above), your test error should strongly exceed your training error.
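
The 300 K / 1000 K argument can be illustrated with a little arithmetic. The sketch below pools per-group RMSE values weighted by group size. The helper `pooled_rmse` and all numbers are hypothetical, chosen only to show the effect, and do not come from any real calculation.

```python
import math

def pooled_rmse(group_rmses, group_counts):
    """Combine per-group RMSE values into one RMSE over all samples.

    group_counts holds the number of error samples (e.g. force
    components) in each group; squared errors are pooled, not the RMSEs.
    """
    total = sum(group_counts)
    mean_sq = sum(n * r ** 2 for r, n in zip(group_rmses, group_counts)) / total
    return math.sqrt(mean_sq)

# Hypothetical per-temperature force RMSEs (eV/Angstrom), illustration only.
rmse_300k = 0.05
rmse_1000k = 0.15

# Training set: equal numbers of 300 K and 1000 K samples.
training_rmse = pooled_rmse([rmse_300k, rmse_1000k], [100, 100])

# Test set drawn only from 300 K: it sees only the low-variance part.
test_rmse = pooled_rmse([rmse_300k], [20])

print(f"training RMSE: {training_rmse:.4f}")
print(f"test RMSE:     {test_rmse:.4f}")
```

With these made-up inputs the pooled training error is dominated by the high-temperature group, so a test set sampled only from the low-temperature part reports a smaller error than the training set.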

Re: Definition of Error in MLFF

Posted: Wed Sep 27, 2023 9:07 pm
by burakgurlek
Thanks for the response. You have a point, but I believe the effect is not that drastic in my case. I trained at 295 K and my test set is from 280, 290, 300, and 310 K. I selected an equal number of structures from each. However, the difference between the test and training errors is about 10 meV. I will give it a try with a test set at 295 K only.

Regards,
Burak