MD on-the-fly: out of memory
Posted: Sat Nov 26, 2022 8:46 am
Dear everyone:
I'm running an MD simulation using the on-the-fly method, but it always runs out of memory after a few steps. The error messages are shown below:
slurmstepd: error: Detected 2 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: cpu04: task 0: Out Of Memory
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 3 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 2 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: First task exited 30s ago
srun: StepId=272349.0 tasks 1-38,40-90,92-112,114-156,158-182,184-191: running
srun: StepId=272349.0 tasks 0,39,91,113,157,183: exited abnormally
srun: launch/slurm: _step_signal: Terminating StepId=272349.0
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 272349.0 ON cpu04 CANCELLED AT 2022-11-26T13:18:22 ***
If you have any ideas, please let me know.
Thank you!