
Training becomes slower when copying ML_ABN to ML_AB to continue training

Posted: Sat Mar 29, 2025 9:53 am
by suojiang_zhang1

Dear all,
Running MLFF training on the same computer, I found that the training becomes slower after I copy ML_ABN from the first run to ML_AB and continue training.


Re: Training becomes slower when copying ML_ABN to ML_AB to continue training

Posted: Mon Mar 31, 2025 10:08 am
by marie-therese.huebsch

Hi,

Great that you do some testing. Could you clarify what exactly you are observing?

For reference, the ab-initio calculation should remain at the same computational cost in every MD step unless you change some settings. During training, more and more local reference configurations are collected, and adding, e.g., the 15th local reference configuration and applying the design matrix does indeed cost more computational effort than adding the 4th. However, entirely avoiding the addition of local reference configurations is not an option, since this is what improves the force field. Restarting a training calculation versus running a single training calculation for longer should not impact the computational cost significantly (apart from the overhead of writing and reading files, etc.).
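
In case it helps, the usual file handling for continuing on-the-fly training looks like this (a minimal sketch of the standard restart workflow; adjust to your setup):

cp ML_ABN ML_AB      # reference configurations collected in the previous run
cp CONTCAR POSCAR    # continue the MD from the last ionic configuration
# keep ML_MODE = train in the INCAR and restart VASP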

Do you have a question in connection with your observation?

Marie-Therese


Re: Training becomes slower when copying ML_ABN to ML_AB to continue training

Posted: Mon Apr 21, 2025 1:32 am
by suojiang_zhang1

Hi,
The continued training becomes much slower when I copy ML_ABN to ML_AB.

My INCAR looks like:
ISMEAR = 0
SIGMA = 0.5
ISPIN = 1
ISYM = 0
LREAL = Auto
### MD part
IBRION = 0
MDALGO = 3
LANGEVIN_GAMMA = 10.0 10.0 10.0 10.0 10.0 10.0
LANGEVIN_GAMMA_L = 10.0
NSW = 10000
POTIM = 1.5
ISIF = 3
TEBEG = 200
TEEND = 500
PSTRESS = 0.001
PMASS = 100
POMASS = 12 8 14 32 16 19
RANDOM_SEED = 486686595 0 0
### Output
LWAVE = .FALSE.
LCHARG = .FALSE.
#NBLOCK = 10
#KBLOCK = 10
##############################
### MACHINE-LEARNING ###
##############################
ML_LMLFF = .T.
ML_MODE = train
ML_DESC_TYPE = 1
ML_MCONF_NEW = 12
ML_CDOUB = 4
ML_CTIFOR = 0.02

I checked ML_ABN and found that the number of basis sets per atom type increased from 1500 (training from scratch) to 3000 after continuing training.
I guess this growth of the basis set leads to the very slow training.


Re: Training becomes slower when copying ML_ABN to ML_AB to continue training

Posted: Mon Apr 21, 2025 4:26 am
by suojiang_zhang1

In addition, I find that ML_FFN is frequently rewritten, which takes quite some time. How can I set the rewriting frequency for ML_FFN?


Re: Training becomes slower when copying ML_ABN to ML_AB to continue training

Posted: Tue Apr 22, 2025 6:00 am
by marie-therese.huebsch

Hi,

It seems that the training becomes slower because you are advancing in the training (and not because of the restart itself). In other words, running two jobs with NSW = 50000 time steps each or one job with NSW = 100000 will take approximately the same total time.

Controlling the output frequency of the various quantities computed at every step during an MD run is an excellent idea. The tag for this is ML_OUTBLOCK.
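
For example, in the INCAR (the value 100 is just an illustration; the default is 1, i.e., output at every step):

ML_OUTBLOCK = 100    # write the per-step ML/MD output only every 100 steps

Note that this only reduces how often output is written; it does not change the training itself.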

Does this solve the issue?

Marie-Therese