
Cannot successfully compile VASP for GPU

Posted: Sun Apr 28, 2024 7:03 am
by bhargabkakati
Dear experts,

I was trying to compile VASP and ran into some errors. I tried two different modes (MPI + OpenMP, and OpenMPI + OpenMP + MKL), but neither was successful. I have attached my log file and the makefile.include for each, as well as the .bashrc of my system. If you could look at them and share some insight into the errors, that would be really great.

Thank You.

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 6:10 am
by bhargabkakati
Hello, my system configuration is:

OS: Ubuntu 22.04
CPU: 36 Core
GPU: Nvidia RTX A6000
CUDA Version: 12.3

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 8:36 am
by fabien_tran1
Hi,

Compared to the makefiles provided in the arch directory, your makefiles were modified. In particular, the -mp flag (for FC and FCL) that enables OpenMP is missing, while -D_OPENMP is present. So for the moment the suggestion is to add -mp to FC and FCL and recompile (but run "make veryclean" first). Please let me know if this helps.
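For illustration only, the compile lines in the OpenMP-enabled templates typically look like the sketch below; these lines are not copied from the attached makefile.include, and the -gpu targets and CUDA version have to be adapted to the actual card and toolkit:

FC  = mpif90 -acc -gpu=cc86,cuda12.3 -mp
FCL = mpif90 -acc -gpu=cc86,cuda12.3 -mp -c++libs

followed by a clean rebuild:

make veryclean
make all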

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 10:33 am
by bhargabkakati
Hello,

Thank you for pointing out that mistake. As per your suggestion, adding -mp to FC and FCL fixed that error, but now I am getting different errors. At first I thought the wannier90 link was causing them, so I tried to compile without wannier90, but this time I got different errors for both MPI + OpenMP and OpenMPI + OpenMP + MKL. I have attached the errors for each compilation (with w90 and without w90).

Thank you

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 11:46 am
by fabien_tran1
Considering first the compilation without MKL (error.nvhpc_omp_acc.without.w90.txt), the crash is due to a problem related to the FFT library. Did you solve the problem with the FFT that you recently had ("VASP executable is still linked to the gnu versions"): https://www.vasp.at/forum/viewtopic.php ... =15#p26115

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 12:24 pm
by bhargabkakati
Hello,

No, I was not able to solve it. I wiped my hard drive after that, installed Ubuntu again, and started a fresh compilation. I was able to compile VASP with the "makefile.include.nvhpc_acc" makefile, but I thought it would be better to compile with OpenMP support. I have attached the output of "ldd vasp_ncl" and the makefile.include of that successful compilation. I don't actually have any idea how to solve the FFTW issue. I have sourced the MKL libraries in my .bashrc (export PATH=/opt/intel/oneapi/mkl/2024.0/include/fftw:$PATH) and I have also installed FFTW separately (export LD_LIBRARY_PATH=/home/cmsgpu/softwares/fftw-install/lib:$LD_LIBRARY_PATH).

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 1:37 pm
by bhargabkakati
Hello,

I was able to compile the MPI + OpenMP mode successfully (but still no luck with OpenMPI + OpenMP + MKL). I noticed that -lfftw3_omp was missing from LLIBS, and adding it solved the compilation issue, but I am not able to do a test run with these executables.
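For reference, the FFTW section in the OpenMP-enabled templates usually looks like the lines below; FFTW_ROOT here reuses, purely as an example, the separately installed FFTW prefix mentioned earlier in the thread:

FFTW_ROOT  ?= /home/cmsgpu/softwares/fftw-install
LLIBS      += -L$(FFTW_ROOT)/lib -lfftw3 -lfftw3_omp
INCS       += -I$(FFTW_ROOT)/include

The -lfftw3_omp entry provides the threaded FFTW routines that an OpenMP build needs.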

When I do mpirun -np 1 /home/cmsgpu/softwares/vasp.6.4.2-mpi_omp/bin/vasp_ncl, I get the following error:
running 1 mpi-ranks, with 1 threads/rank, on 1 nodes
distrk: each k-point on 1 cores, 1 groups

libgomp: TODO
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[26727,1],0]
Exit code: 1

And this is the output of "ldd vasp_ncl":

linux-vdso.so.1 (0x00007fff10dea000)
libqdmod.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/extras/qd/lib/libqdmod.so.0 (0x000071a689000000)
libqd.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/extras/qd/lib/libqd.so.0 (0x000071a688c00000)
liblapack_lp64.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/liblapack_lp64.so.0 (0x000071a687e00000)
libblas_lp64.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libblas_lp64.so.0 (0x000071a685e00000)
libfftw3.so.3 => /lib/x86_64-linux-gnu/libfftw3.so.3 (0x000071a685a00000)
libfftw3_omp.so.3 => /lib/x86_64-linux-gnu/libfftw3_omp.so.3 (0x000071a6892af000)
libhdf5_fortran.so.310 => /home/cmsgpu/softwares/HDF5_nvc_compiler/myhdfstuff/build/HDF_Group/HDF5/1.14.3/lib/libhdf5_fortran.so.310 (0x000071a689261000)
libmpi_usempif08.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libmpi_usempif08.so.40 (0x000071a685600000)
libmpi_usempi_ignore_tkr.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libmpi_usempi_ignore_tkr.so.40 (0x000071a685200000)
libmpi_mpifh.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libmpi_mpifh.so.40 (0x000071a684e00000)
libmpi.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libmpi.so.40 (0x000071a684800000)
libscalapack_lp64.so.2 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libscalapack_lp64.so.2 (0x000071a684000000)
libnvhpcwrapcufft.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libnvhpcwrapcufft.so (0x000071a683c00000)
libcufft.so.11 => /usr/local/cuda-12.3/lib64/libcufft.so.11 (0x000071a678e00000)
libcusolver.so.11 => /usr/local/cuda-12.3/lib64/libcusolver.so.11 (0x000071a671c00000)
libcudaforwrapnccl.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libcudaforwrapnccl.so (0x000071a671800000)
libnccl.so.2 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nccl/lib/libnccl.so.2 (0x000071a660800000)
libcublas.so.12 => /usr/local/cuda-12.3/lib64/libcublas.so.12 (0x000071a65a000000)
libcublasLt.so.12 => /usr/local/cuda-12.3/lib64/libcublasLt.so.12 (0x000071a637000000)
libcudaforwrapblas.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libcudaforwrapblas.so (0x000071a636c00000)
libcudaforwrapblas117.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libcudaforwrapblas117.so (0x000071a636800000)
libcudart.so.12 => /usr/local/cuda-12.3/lib64/libcudart.so.12 (0x000071a636400000)
libcudafor_120.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libcudafor_120.so (0x000071a630400000)
libcudafor.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libcudafor.so (0x000071a630000000)
libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libacchost.so (0x000071a62fc00000)
libaccdevaux.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libaccdevaux.so (0x000071a62f800000)
libacccuda.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libacccuda.so (0x000071a62f400000)
libcudadevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libcudadevice.so (0x000071a62f000000)
libcudafor2.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libcudafor2.so (0x000071a62ec00000)
libnvf.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libnvf.so (0x000071a62e400000)
libnvhpcatm.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libnvhpcatm.so (0x000071a62e000000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x000071a62dc00000)
libnvomp.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libnvomp.so (0x000071a62ca00000)
libnvcpumath.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libnvcpumath.so (0x000071a62c400000)
libnvc.so => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib/libnvc.so (0x000071a62c000000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000071a62bc00000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000071a689237000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000071a688f19000)
libatomic.so.1 => /lib/x86_64-linux-gnu/libatomic.so.1 (0x000071a68922b000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000071a689226000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000071a688f14000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x000071a688f0f000)
libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x000071a688ec5000)
libhdf5_f90cstub.so.310 => /home/cmsgpu/softwares/HDF5_nvc_compiler/myhdfstuff/build/HDF_Group/HDF5/1.14.3/lib/libhdf5_f90cstub.so.310 (0x000071a688ea3000)
libhdf5.so.310 => /home/cmsgpu/softwares/HDF5_nvc_compiler/myhdfstuff/build/HDF_Group/HDF5/1.14.3/lib/libhdf5.so.310 (0x000071a62b400000)
libopen-rte.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libopen-rte.so.40 (0x000071a62b000000)
libopen-pal.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libopen-pal.so.40 (0x000071a62aa00000)
libucp.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libucp.so.0 (0x000071a62a600000)
libuct.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libuct.so.0 (0x000071a62a200000)
libucs.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libucs.so.0 (0x000071a629e00000)
libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x000071a688e94000)
libucm.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/lib/libucm.so.0 (0x000071a629a00000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x000071a688e8d000)
libz.so.1 => /usr/local/lib/libz.so.1 (0x000071a688e68000)
/lib64/ld-linux-x86-64.so.2 (0x000071a6892d1000)
libnvJitLink.so.12 => /usr/local/cuda-12.3/lib64/libnvJitLink.so.12 (0x000071a626400000)
libcusparse.so.12 => /usr/local/cuda-12.3/lib64/libcusparse.so.12 (0x000071a616400000)
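A side note on the output above: it shows both libnvomp.so (NVHPC's OpenMP runtime) and libgomp.so.1 (the GNU OpenMP runtime), the latter most likely pulled in through the system libfftw3_omp.so.3 from /lib/x86_64-linux-gnu. Mixing the two runtimes in one executable is a plausible cause of the "libgomp: TODO" abort. A quick check (a sketch; adjust the binary path as needed):

ldd /home/cmsgpu/softwares/vasp.6.4.2-mpi_omp/bin/vasp_ncl | grep -E 'gomp|nvomp'

Linking against an FFTW built with the NVHPC compilers, or against MKL's FFTW interface, should avoid dragging in the GNU runtime.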

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 1:51 pm
by fabien_tran1
What does the command "which mpirun" return?

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 1:53 pm
by bhargabkakati
It returns:
/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/bin/mpirun
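For completeness, a quick way to confirm that the launcher and the compilers come from the same NVHPC installation (a sketch using standard commands; the output is of course system-specific):

which mpirun mpif90 nvfortran
mpirun --version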

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 3:01 pm
by fabien_tran1
I will ask for help from colleagues. Meanwhile you could consider watching the video that is mentioned in another topic (https://www.vasp.at/forum/viewtopic.php?t=19472).

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 3:06 pm
by bhargabkakati
Thank you for the suggestion, but that is exactly the video I followed when compiling VASP. Still, I am not able to compile successfully, unlike in the video.

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 3:26 pm
by fabien_tran1
Are you sure that there is no mistake in the paths that are in your makefile.include? For instance, do the directories
/opt/intel/oneapi/mkl/2024.1/lib/
/opt/intel/oneapi/mkl/2024.1/include/
really exist?
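A trivial way to check (a sketch reusing the paths quoted above):

ls -d /opt/intel/oneapi/mkl/2024.1/lib /opt/intel/oneapi/mkl/2024.1/include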

Another question: is there an environment module (https://modules.readthedocs.io/en/latest/#) installed on your machines?

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 3:52 pm
by bhargabkakati
Hello,

Yes, those paths do exist, and there is no environment module system installed. I installed everything mentioned in https://implant.fs.cvut.cz/vasp-gpu-compilation/ .

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 3:55 pm
by bhargabkakati
I'd also like to add that mpirun works fine with Quantum ESPRESSO, wannier90, and VAMPIRE, and also with the makefile.include.nvhpc_acc build of VASP.

Re: Cannot successfully compile VASP for GPU

Posted: Mon Apr 29, 2024 4:03 pm
by fabien_tran1
Do you have by chance access to an older version of the Nvidia compiler (i.e., older than 24.3)?