internal error in: mpi.F at line: 898
Posted: Thu Dec 07, 2023 8:56 am
We installed VASP 6.4.0 without any warnings or errors, using the NVIDIA HPC SDK 23.9 on our GPU machine.
We were able to run calculations for a few days, but we are now encountering an internal error in the mpi.F file.
The detailed error is given below for your reference.
Code:
Local host: scanmatdgx1
--------------------------------------------------------------------------
running 1 mpi-ranks, on 1 nodes
distrk: each k-point on 1 cores, 1 groups
distr: one band on 1 cores, 1 groups
OpenACC runtime initialized ... 1 GPUs detected
 -----------------------------------------------------------------------------
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: mpi.F at line: 898                                   |
|                                                                             |
|     M_init_nccl: Error in ncclCommInitRank                                  |
|                                                                             |
|     If you are not a developer, you should not encounter this problem.      |
|     Please submit a bug report.                                             |
|                                                                             |
 -----------------------------------------------------------------------------
Warning: ieee_inexact is signaling
1
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
Please help us to resolve the issue.
Thanks in advance.
SCANMAT.