Performance Discrepancy between Slurm and Direct mpirun for VASP Jobs.
Posted: Fri May 24, 2024 1:49 pm
Dear VASP Team and Users,
I am experiencing a significant performance discrepancy when running the same VASP job through the Slurm scheduler compared with running it directly with mpirun: the Slurm run takes about 60% longer (LOOP+ real time of 176.1 s versus 110.1 s). Could you provide any insights or suggestions on what might be causing this? Are there specific Slurm configurations or settings I should check or adjust to bring the performance under Slurm closer to that of the direct mpirun execution?
System, Slurm and VASP information:
Slurm version: 21.08.5
Code:
$ inxi -CMmS
System:
Host: x13dai-t Kernel: 6.5.0-18-generic arch: x86_64 bits: 64 Desktop: GNOME
v: 42.9 Distro: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Machine:
Type: Unknown System: Supermicro product: Super Server v: 0123456789
serial: 0123456789
Mobo: Supermicro model: X13DAI-T v: 1.01 serial: WM23AS002622
UEFI: American Megatrends LLC. v: 2.1 date: 12/14/2023
Memory:
System RAM: total: 512 GiB available: 503.52 GiB used: 15.5 GiB (3.1%)
Array-1: capacity: 6 TiB note: check slots: 16 modules: 16
EC: Single-bit ECC
Device-1: P1-DIMMA1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-2: P1-DIMMB1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-3: P1-DIMMC1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-4: P1-DIMMD1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-5: P1-DIMME1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-6: P1-DIMMF1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-7: P1-DIMMG1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-8: P1-DIMMH1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-9: P2-DIMMA1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-10: P2-DIMMB1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-11: P2-DIMMC1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-12: P2-DIMMD1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-13: P2-DIMME1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-14: P2-DIMMF1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-15: P2-DIMMG1 type: DDR5 size: 32 GiB speed: 4800 MT/s
Device-16: P2-DIMMH1 type: DDR5 size: 32 GiB speed: 4800 MT/s
CPU:
Info: 2x 48-core model: Intel Xeon Platinum 8488C bits: 64 type: MT MCP SMP
cache: L2: 2x 96 MiB (192 MiB)
Speed (MHz): avg: 876 min/max: 800/3800 cores: 1: 800 2: 800 3: 800 4: 800
5: 800 6: 800 7: 800 8: 800 9: 800 10: 800 11: 800 12: 800 13: 800 14: 800
15: 800 16: 800 17: 800 18: 800 19: 800 20: 800 21: 800 22: 800 23: 800
24: 795 25: 800 26: 800 27: 800 28: 800 29: 800 30: 800 31: 800 32: 800
33: 800 34: 800 35: 800 36: 800 37: 887 38: 800 39: 800 40: 3100 41: 800
42: 800 43: 2222 44: 800 45: 2500 46: 800 47: 800 48: 800 49: 800 50: 800
51: 800 52: 800 53: 800 54: 800 55: 800 56: 800 57: 800 58: 800 59: 800
60: 800 61: 800 62: 800 63: 800 64: 800 65: 800 66: 800 67: 800 68: 800
69: 800 70: 800 71: 800 72: 800 73: 800 74: 800 75: 800 76: 800 77: 800
78: 800 79: 800 80: 800 81: 800 82: 800 83: 800 84: 800 85: 800 86: 800
87: 800 88: 800 89: 800 90: 800 91: 800 92: 800 93: 800 94: 800 95: 800
96: 800 97: 800 98: 800 99: 800 100: 800 101: 800 102: 800 103: 800
104: 800 105: 800 106: 800 107: 800 108: 800 109: 800 110: 800 111: 800
112: 800 113: 800 114: 800 115: 800 116: 2400 117: 800 118: 800 119: 800
120: 800 121: 800 122: 800 123: 800 124: 800 125: 800 126: 800 127: 800
128: 800 129: 800 130: 800 131: 3800 132: 2400 133: 1200 134: 800 135: 800
136: 800 137: 800 138: 800 139: 800 140: 800 141: 800 142: 2500 143: 801
144: 800 145: 800 146: 800 147: 800 148: 800 149: 800 150: 800 151: 800
152: 800 153: 800 154: 800 155: 800 156: 800 157: 800 158: 800 159: 800
160: 800 161: 800 162: 800 163: 800 164: 800 165: 800 166: 800 167: 800
168: 800 169: 800 170: 800 171: 800 172: 800 173: 800 174: 800 175: 1021
176: 800 177: 800 178: 800 179: 800 180: 800 181: 800 182: 1500 183: 800
184: 800 185: 800 186: 800 187: 800 188: 800 189: 800 190: 800 191: 800
192: 800
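inxi reports 2x 48-core CPUs of type MT and lists 192 per-CPU clock speeds, which suggests 96 physical cores with two hardware threads each. That layout can be cross-checked without inxi by reading the Linux sysfs topology files. The script below is an illustrative sketch: it assumes the usual `/sys/devices/system/cpu/cpuN/topology` interface and counts physical cores as the number of distinct `thread_siblings_list` entries.

```shell
#!/usr/bin/env bash
# Summarise the CPU layout from Linux sysfs: logical CPUs, physical cores and
# sockets. Every physical core has one distinct thread_siblings_list
# (for example "0,96"), so counting distinct lists counts physical cores.
set -euo pipefail

count_topology() {
  # Optional argument: sysfs CPU directory (defaults to the live system).
  local sysfs=${1:-/sys/devices/system/cpu}
  local logical cores sockets
  # Bail out if the topology files are not readable (e.g. in some containers).
  [[ -r $sysfs/cpu0/topology/thread_siblings_list ]] || return 1
  logical=$(cat "$sysfs"/cpu[0-9]*/topology/thread_siblings_list | wc -l)
  cores=$(cat "$sysfs"/cpu[0-9]*/topology/thread_siblings_list | sort -u | wc -l)
  sockets=$(cat "$sysfs"/cpu[0-9]*/topology/physical_package_id | sort -u | wc -l)
  # $(( )) strips any padding that some wc implementations add.
  echo "logical=$((logical)) cores=$((cores)) sockets=$((sockets))"
}

# When executed as a script, report the layout of the current machine.
if [[ ${BASH_SOURCE[0]} == "$0" ]]; then
  count_topology || echo "CPU topology not available in sysfs" >&2
fi
```

Run it as `bash cpu_topology.sh` (hypothetical file name). If SMT is enabled as inxi suggests, one would expect `logical=192 cores=96 sockets=2` on this machine.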
VASP version: 6.4.3
Job Submission Script:
Code:
#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -D .
#SBATCH --output=%j.out
#SBATCH --error=%j.err
##SBATCH --time=2-00:00:00
#SBATCH --ntasks=36
#SBATCH --mem=64G
echo '#######################################################'
echo "date = $(date)"
echo "hostname = $(hostname -s)"
echo "pwd = $(pwd)"
echo "sbatch = $(which sbatch | xargs realpath -e)"
echo ""
echo "WORK_DIR = $WORK_DIR"
echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
echo "SLURM_NTASKS = $SLURM_NTASKS"
echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
echo "SLURM_JOBID = $SLURM_JOBID"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo "SLURM_NNODES = $SLURM_NNODES"
echo "SLURMTMPDIR = $SLURMTMPDIR"
echo '#######################################################'
echo ""
module purge > /dev/null 2>&1
module load vasp
ulimit -s unlimited
mpirun vasp_std
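To see where the ranks actually run in each case, one option is to log the CPU affinity right before `mpirun`, both inside the job script and in the interactive shell. The sketch below reads `Cpus_allowed_list` from `/proc/self/status` and counts the listed CPUs. Setting `I_MPI_DEBUG=4` assumes that the `mpirun` provided by the oneAPI module is Intel MPI, which then prints its rank-to-CPU pinning table at startup. The function names are illustrative, not part of the original script.

```shell
#!/usr/bin/env bash
# Illustrative diagnostic lines (not part of the original script) to run just
# before "mpirun vasp_std" in the job script and before "mpirun -n 36 vasp_std"
# in the interactive shell, so that the CPU placement of both runs can be compared.

# Print the logical CPUs this shell may run on. awk reads its own
# /proc/self/status, but CPU affinity is inherited from the calling shell.
allowed_cpus() {
  awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status
}

# Count the CPUs in a Linux CPU list such as "0-3,8,10-11" (-> 7).
count_cpu_list() {
  local total=0 part lo hi
  local IFS=,
  for part in $1; do
    if [[ $part == *-* ]]; then
      lo=${part%-*}
      hi=${part#*-}
      total=$((total + hi - lo + 1))
    else
      total=$((total + 1))
    fi
  done
  echo "$total"
}

cpus=$(allowed_cpus)
echo "Cpus_allowed_list = $cpus ($(count_cpu_list "$cpus") logical CPUs)"

# Assumption: the oneAPI module provides Intel MPI, whose mpirun prints the
# rank-to-CPU pinning table at startup when I_MPI_DEBUG is 4 or higher.
export I_MPI_DEBUG=4
```

Comparing the two printed CPU lists and pinning tables would show whether the 36 ranks are placed on a different or smaller set of cores under Slurm than in the direct run.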
Performance Observation:
When running the job through Slurm:
Code:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP: cpu time 14.4893: real time 14.5049
LOOP: cpu time 14.3538: real time 14.3621
LOOP: cpu time 14.3870: real time 14.3568
LOOP: cpu time 15.9722: real time 15.9018
LOOP: cpu time 16.4527: real time 16.4370
LOOP: cpu time 16.7918: real time 16.7781
LOOP: cpu time 16.9797: real time 16.9961
LOOP: cpu time 15.9762: real time 16.0124
LOOP: cpu time 16.8835: real time 16.9008
LOOP: cpu time 15.2828: real time 15.2921
LOOP+: cpu time 176.0917: real time 176.0755
When running the job directly with mpirun:
Code:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ module load vasp
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ module list
Currently Loaded Modules:
1) lmod 3) hdf5/1.14.3-oneapi.2023.2.0 5) dftd4/main-oneapi.2023.2.0
2) oneapi/2023.2.0 4) wannier90/develop-serial-oneapi.2023.2.0 6) vasp/6.4.3-oneapi-oneapi.2023.2.0
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP: cpu time 9.0072: real time 9.0074
LOOP: cpu time 9.0515: real time 9.0524
LOOP: cpu time 9.1896: real time 9.1907
LOOP: cpu time 10.1467: real time 10.1479
LOOP: cpu time 10.2691: real time 10.2705
LOOP: cpu time 10.4330: real time 10.4340
LOOP: cpu time 10.9049: real time 10.9055
LOOP: cpu time 9.9718: real time 9.9714
LOOP: cpu time 10.4511: real time 10.4470
LOOP: cpu time 9.4621: real time 9.4584
LOOP+: cpu time 110.0790: real time 110.0739
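The two totals can be compared with a small script that pulls the `real time` value from the `LOOP+` line of each OUTCAR. This is a minimal sketch, assuming the two OUTCAR files were kept under separate (hypothetical) names such as `OUTCAR.slurm` and `OUTCAR.mpirun`:

```shell
#!/usr/bin/env bash
# Compare the total SCF wall time ("LOOP+ ... real time") of two VASP runs.
# Usage (file names are hypothetical):
#   bash compare_loop.sh OUTCAR.slurm OUTCAR.mpirun
set -euo pipefail

# Print the wall time from the last LOOP+ line of an OUTCAR. Such a line looks
# like "LOOP+: cpu time 176.0917: real time 176.0755", so the wall time is
# the last whitespace-separated field.
loop_plus_real() {
  awk '/LOOP[+]:/ { t = $NF } END { print t }' "$1"
}

# Print both wall times and their ratio (first run divided by second run).
compare_loop_times() {
  local a b
  a=$(loop_plus_real "$1")
  b=$(loop_plus_real "$2")
  awk -v a="$a" -v b="$b" \
    'BEGIN { printf "run 1: %.1f s, run 2: %.1f s, ratio: %.2f\n", a, b, a / b }'
}

# Only do the comparison when two OUTCAR paths are given.
if [[ $# -eq 2 ]]; then
  compare_loop_times "$1" "$2"
fi
```

For the totals shown above (176.0755 s and 110.0739 s) this reports a ratio of 1.60, i.e. the Slurm run takes roughly 60% longer.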
The attachment contains the test case used above.
Thank you for your time and assistance.
Best regards,
Zhao