VASP on IA64 (itanium2) compiled with INTEL Compilers, mkl, fftw and mpi

fish
Newbie
Posts: 12
Joined: Tue Jun 14, 2005 1:13 pm
License Nr.: 198
Location: Argonne National Lab

#1 Post by fish » Thu Jun 11, 2009 6:35 pm

Please update this makefile and post it in this forum after successfully compiling VASP on newer versions of the INTEL compilers. You will be saving me and others a lot of time.

Thanks.
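
Before running make with the makefile below, it may help to confirm the prerequisites that its comments call out: the compiler on the PATH (after sourcing its environment script), fftw3.f copied into the VASP source directory, and the MKL FFTW wrapper library built. The following is only a minimal sketch and is not part of the original makefile. The function name `check_vasp_prereqs` and its three arguments are hypothetical, and the path in the usage comment is the one quoted in the makefile, which may differ at your site.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of VASP or the Intel tools.
# usage: check_vasp_prereqs <compiler> <vasp_src_dir> <fftw_wrapper_lib>
check_vasp_prereqs() {
    compiler=$1
    src_dir=$2
    wrapper_lib=$3
    status=0

    # The compiler is normally on the PATH only after its environment
    # script (e.g. ifortvars.sh) has been sourced.
    if ! command -v "$compiler" >/dev/null 2>&1; then
        echo "missing: compiler '$compiler' not found on PATH"
        status=1
    fi

    # fftw3.f has to be copied from the MKL include directory.
    if [ ! -f "$src_dir/fftw3.f" ]; then
        echo "missing: $src_dir/fftw3.f"
        status=1
    fi

    # The FFTW3 wrapper library has to be built separately from MKL.
    if [ ! -f "$wrapper_lib" ]; then
        echo "missing: $wrapper_lib"
        status=1
    fi

    [ "$status" -eq 0 ] && echo "all prerequisites found"
    return "$status"
}

# Example call (paths are assumptions taken from the makefile below):
# check_vasp_prereqs ifort . /opt/intel/Compiler/11.0/081/mkl/lib/64/libfftw3xf_intel.a
```

The function only reports what is missing; it does not copy files or build anything, so it should be safe to run repeatedly, for example after a "make clean".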

*****************makefile for VASP is below***********
.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler with MKL LAPACK, BLAS and FFTW
# for IA64 systems (itanium2) - John.Low@uop.com 06/11/09
#
# This makefile was tested only under RedHat Linux AS 5.3 on a HP RX5670
# INTEL fortran compiler version 11.0 update 083 has been tested.
# The INTEL mkl libraries 10.1 update 03 and INTEL MPI library version 3.2
# update1 were used.
#
# Note that the "sequential" (not multithreaded) version of the
# mkl LAPACK libraries is used in this makefile. Using the multithreaded
# version will automatically generate a parallel version of VASP with
# OpenMP threading, which is not efficient. Although you can turn off the
# multithreading by setting the environment variable OMP_NUM_THREADS to 1,
# using the sequential LAPACK libraries is more elegant and will not
# conflict with other software.
#
# The -O3 compiler option is very dangerous in the INTEL compilers.
# There appears to be a relatively small performance gain (~10%) with the -O3
# option in a few objects such as nonlr.o, nonl.o and main.o. I recommend the
# use of -O3 for only the most CPU intensive routines. Be sure to extensively
# test any other routines compiled at -O3 before using them in production mode.
# You will save a lot of time and effort by compiling most of the objects
# in VASP with -O2 with the INTEL compiler.
#
# It might be required to change some of the library paths, since
# Linux installations vary a lot
# Hence check ***ALL**** options in this makefile very carefully
#-----------------------------------------------------------------------
#
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
# retrieve the LAPACK package from ftp.netlib.org
# and compile the blas routines (BLAS/SRC directory)
# please use g77 or f77 for the compilation. When I tried to
# use pgf77 or pgf90 for BLAS, VASP hung when calling
# ZHEEV (however this was with lapack 1.1, now I use lapack 2.0)
# 2) most desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 3a) Intels own optimised BLAS (PIII, P4, Itanium)
# http://developer.intel.com/software/products/mkl/
# this is really excellent when you use Intel CPU's
#
# 3b) or obtain the atlas based BLAS routines
# http://math-atlas.sourceforge.net/
# you certainly need atlas on the Athlon, since the mkl
# routines are not optimal on the Athlon.
# If you want to use atlas based BLAS, check the lines around LIB=
#
# 3c) mindblowing fast SSE2 (4 GFlops on P4, 2.53 GHz)
# Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
# be sure to source the script to define the environmental variables for
# the fortran compiler.
#(i.e. source /opt/intel/Compiler/11.0/083/bin/ifortvars.sh ia64)
# This script defined the environmental variables required for the fortran
# compiler, the c++ compiler and the mkl libraries. It came from a bundled
# version of these compilers and libraries.
# The script (or scripts!) could be different at your site.
#-----------------------------------------------------------------------
FC=ifort
# fortran linker
FCL=$(FC)


#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# IFC work around some IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4
# 32768 for Intel Itanium2 processor
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DHOST=\"UOP_RX5670_ifort_mkl_fttw_090610\" \
-Dkind8 -DNGXhalf -DCACHE_SIZE=32768 -DPGF90 -Davoidalloc \
-DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags (there must be a trailing blank on this line)
#-----------------------------------------------------------------------

#FFLAGS = -FR -lowercase -assume byterecl
FFLAGS= -FR -lower_case

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK SSE1 optimization, but also generate code executable on all mach.
# xK improves performance somewhat on XP, and a is required in order
# to run the code on older Athlons as well
# -xW SSE2 optimization
# -axW SSE2 optimization, but also generate code executable on all mach.
# -mtune=itanium2 optimize for itanium2
#-----------------------------------------------------------------------

# O2 optimization appears to be the safest optimization level with INTEL compiler with good performance.
# There appears to be a problem with the routines involved with the conjugate gradient
# geometry (ionic) optimization at -O3.
OFLAG=-O2 -mtune=itanium2

OFLAG_HIGH = -O3 -mtune=itanium2
# These routines benefit from higher levels of optimization, according to the VASP manual.
OBJ_HIGH = nonlr.o nonl.o

OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)


#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
# on P4, VASP works fastest with the libgoto library
# so that's what I recommend
#-----------------------------------------------------------------------

# Atlas based libraries
#ATLASHOME= $(HOME)/archives/BLAS_OPT/ATLAS/lib/Linux_P4SSE2/
#BLAS= -L$(ATLASHOME) -lf77blas -latlas

# use specific libraries (default library path might point to other libraries)
#BLAS= $(ATLASHOME)/libf77blas.a $(ATLASHOME)/libatlas.a

# use the mkl Intel libraries for p4 (www.intel.com)
# mkl.5.1
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4 -lpthread

# mkl.5.2 also requires the -lguide library
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4 -lguide -lpthread

# even faster Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
#BLAS= /opt/libs/libgoto/libgoto_p4_512-r0.6.so0

# LAPACK, simplest use vasp.4.lib/lapack_double
#LAPACK= ../vasp.4.lib/lapack_double.o

# use atlas optimized part of lapack
#LAPACK= ../vasp.4.lib/lapack_atlas.o -llapack -lcblas

# use the mkl Intel lapack and blas
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
# the BLAS line is left blank because the blas library is included
# in the LAPACK variable.
BLAS=
LAPACK=-L/opt/intel/Compiler/11.0/081/mkl/lib/64 -lmkl_intel_lp64 -lmkl_blacs_lp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread
#

#-----------------------------------------------------------------------

LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o $(LAPACK) \
$(BLAS)

# options for linking (for compiler version 6.X, 7.1) nothing is required
LINK =
# compiler version 7.0 generates some vector statements which are located
# in the svml library, add the LIBPATH and the library (just in case)
#LINK = -L/opt/intel/compiler70/ia32/lib/ -lsvml

#-----------------------------------------------------------------------
# fft libraries:
# VASP.4.6 can use fftw.3.0.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------

#FFT3D = fft3dfurth.o fft3dlib.o
#This makefile uses the fftw interface in the Intel mkl libraries.
#You will need to build this interface after you install your mkl libraries.
#Please read the "fine" mkl manuals for more details.
#Be sure to copy fftw3.f from your mkl installation to the vasp source directory.
#Remember to copy it again if you ever do a "make clean".
#For the Intel professional fortran compiler version 11.0 update 81
#(which comes bundled with mkl), the fftw3.f file was located in
#/opt/intel/Compiler/11.0/081/mkl/include/fftw/fftw3.f on my installation.
FFT3D = fftw3d.o fft3dlib.o /opt/intel/Compiler/11.0/081/mkl/lib/64/libfftw3xf_intel.a


#=======================================================================
# MPI section, uncomment the following lines
#
# one comment for users of mpich or lam:
# You must *not* compile mpi with g77/f77, because f77/g77
# appends *two* underscores to symbols that contain already an
# underscore (i.e. MPI_SEND becomes mpi_send__). The pgf90/ifc
# compilers however append only one underscore.
# Precompiled mpi version will also not work !!!
#
# We found that mpich.1.2.1 and lam-6.5.X to lam-7.0.4 are stable
# mpich.1.2.1 was configured with
# ./configure -prefix=/usr/local/mpich_nodvdbg -fc="pgf77 -Mx,119,0x200000" \
# -f90="pgf90 " \
# --without-romio --without-mpe -opt=-O \
#
# lam was configured with the line
# ./configure -prefix /opt/libs/lam-7.0.4 --with-cflags=-O -with-fc=ifc \
# --with-f77flags=-O --without-romio
#
# please note that you might be able to use a lam or mpich version
# compiled with f77/g77, but then you need to add the following
# options: -Msecond_underscore (compilation) and -g77libs (linking)
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for INTEL mpi: After sourcing the environment script
# for INTEL mpi (i.e. /opt/intel/impi/3.2/bin/mpivars.sh) you can use the following lines.
#-----------------------------------------------------------------------

#FC=mpiifort
#FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------

#CPP = $(CPP_) -DMPI -DHOST=\"UOP_RX5670_mpiifort_mkl_fttw_090610\" -DIFC \
# -Dkind8 -DNGZhalf -DCACHE_SIZE=16000 -DPGF90 -Davoidalloc \
# -DMPI_BLOCK=500 \
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------

BLACS=$(HOME)/archives/SCALAPACK/BLACS/
SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

SCA= $(SCA_)/libscalapack.a \
$(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

#LIB = -L../vasp.4.lib -ldmy \
# ../vasp.4.lib/linpack_double.o $(LAPACK) \
# $(SCA) $(BLAS)

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
#FFT3D = fftmpi.o fftmpi_map.o fft3dlib.o

# fftw.3.0.1 is slightly faster and should be used if available
#FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o /opt/libs/fftw-3.0.1/lib/libfftw3.a

#fftw3 wrapper from INTEL mkl
#FFT3D= fftmpiw.o fftmpi_map.o fft3dlib.o /opt/intel/Compiler/11.0/081/mkl/lib/64/libfftw3xf_intel.a

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o setex.o radial.o \
pseudo.o mgrid.o mkpoints.o wave.o wave_mpi.o $(BASIC) \
nonl.o nonlr.o dfast.o choleski2.o \
mix.o charge.o xcgrad.o xcspin.o potex1.o potex2.o \
metagga.o constrmag.o pot.o cl_shift.o force.o dos.o elf.o \
tet.o hamil.o steep.o \
chain.o dyna.o relativistic.o LDApU.o sphpro.o paw.o us.o \
ebs.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
dipol.o xclib.o chgloc.o subrot.o optreal.o davidson.o \
edtest.o electron.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o \
elpol.o setlocalpp.o aedens.o

INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp $(LINK) main.o $(SOURCE) $(FFT3D) $(LIB)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
	-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
	$(FC) $(FFLAGS) $(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS) $(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
	$(CPP)
	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
	$(CPP)
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
	$(CPP)
$(SUFFIX).o:
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules are cumulative (that is once failed
# in one compiler version, stays in the list forever)
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay off on PII, since fft3d uses double prec)
# all other options do not affect the code performance since -O1 is used
#-----------------------------------------------------------------------

fft3dlib.o : fft3dlib.F
	$(CPP)
# $(FC) -FR -lowercase -O1 -tpp7 -xW -unroll0 -e95 -vec_report3 -c $*$(SUFFIX)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

us.o : us.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
	$(CPP)
	$(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

LDApU.o : LDApU.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

choleski2.o : choleski2.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
#-----------------------------------------------------------------------
# dependency rules to allow parallel build
#-----------------------------------------------------------------------
#include make_dep.h
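
The header comments of the makefile above advise testing any routine compiled at -O3 before production use. One possible way to do that, sketched here under stated assumptions, is to run the same input with an -O2 build and an -O3 build and compare the final free energies. The function names, the tolerance argument and the directory names in the usage comment are hypothetical. The sketch also assumes that each run directory contains an OSZICAR file whose ionic-step lines carry the energy in the field that follows `F=`.

```shell
#!/bin/sh
# Hypothetical helpers; not part of VASP.

# Print the last energy that follows an "F=" token in an OSZICAR-style file.
# Fails (non-zero status) if no such token is found.
final_energy() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "F=") e = $(i + 1) }
         END { if (e == "") exit 1; print e }' "$1"
}

# Succeed if the final energies of two runs agree within an absolute tolerance.
# Returns 1 if they differ by more than the tolerance, 2 if an energy
# could not be read.
# usage: energies_agree <oszicar_a> <oszicar_b> <tolerance>
energies_agree() {
    ea=$(final_energy "$1") || return 2
    eb=$(final_energy "$2") || return 2
    awk -v a="$ea" -v b="$eb" -v tol="$3" 'BEGIN {
        d = a - b; if (d < 0) d = -d
        exit (d <= tol) ? 0 : 1
    }'
}

# Example call (directory names and tolerance are assumptions):
# energies_agree run_O2/OSZICAR run_O3/OSZICAR 1e-6 && echo "energies agree"
```

A single energy comparison is a coarse check. Forces and ionic relaxation paths can still differ, which matters here because the makefile comments report a problem with conjugate gradient geometry optimization at -O3.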
Last edited by fish on Thu Jun 11, 2009 6:35 pm, edited 1 time in total.

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#2 Post by admin » Tue Jan 19, 2010 11:50 am

Appropriate makefiles are delivered with each new (sub-)release of VASP. We usually adjust the makefiles to the current release of the INTEL compiler at the time the code version is issued.
Last edited by admin on Tue Jan 19, 2010 11:50 am, edited 1 time in total.
