Scaling behavior of ACFDT-RPA and low-scaling version
Posted: Tue Feb 28, 2017 7:33 pm
I am curious what the memory requirements for ACFDT-RPA calculations are, and what they will be for the low-scaling methods that have not yet been released. My understanding is that the difficult part of an ACFDT-RPA calculation is evaluating Tr(Log(I - χ0(iω)ν)), which involves diagonalizing an (NBANDS * NKPTS) x (NBANDS * NKPTS) array at NOMEGA different frequency points. If this is correct, it would mean the memory requirements scale as O(NBANDS^2 * NKPTS^2) (assuming only one frequency point is stored at a time) and the computational complexity is O(NBANDS^3 * NKPTS^3 * NOMEGA). However, it is commonly said that ACFDT-RPA scales as O(N^4), where N is usually left undefined but presumably refers to the number of electrons. Have I misidentified the step that determines the overall scaling, or have I made a mistake in estimating the computational complexity? What are the actual memory requirements of a traditional ACFDT-RPA calculation?
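To make the estimate I am working from concrete, here is the back-of-the-envelope reasoning written out as a short Python snippet. All of the input numbers (the hypothetical NBANDS and NOMEGA, 16 bytes per double-precision complex entry, and the cubic cost of diagonalization) are my own assumptions for illustration, not values taken from VASP:

```python
# Rough estimate for the step I believe is the bottleneck in conventional
# ACFDT-RPA: diagonalizing an (NBANDS*NKPTS) x (NBANDS*NKPTS) complex matrix
# at each of NOMEGA frequency points. All inputs are illustrative assumptions.

NBANDS = 2000            # hypothetical number of bands
NKPTS = 5                # irreducible k-points
NOMEGA = 12              # frequency points (illustrative)
BYTES_PER_COMPLEX = 16   # double-precision complex

dim = NBANDS * NKPTS
memory = dim**2 * BYTES_PER_COMPLEX   # one frequency point held at a time
ops = dim**3 * NOMEGA                 # O(dim^3) diagonalization per frequency

print(f"matrix dimension       : {dim}")
print(f"memory (one frequency) : {memory / 1e9:.1f} GB")
print(f"diagonalization ops    : ~{ops:.1e}")
```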
Furthermore, I am concerned about the memory requirements of the as-yet unreleased low-scaling ACFDT-RPA algorithm described in J. Chem. Theory Comput. 2014, 10, 2498-2507. I understand that the computational scaling of this algorithm is dictated by the calculation of the real-space, imaginary-time single-particle Green's function (eq. 32), but it appears to me that the memory requirements are dictated by the cosine transform used to obtain the reciprocal-space, imaginary-frequency RPA response function (eq. 12). To evaluate this cosine transform, the response function must be stored simultaneously at all time points and all frequency points, giving a memory requirement of O(NBANDS^2 * NKPTS^2 * (NTAU + NOMEGA)). The paper suggests that "more than 20 frequency and time points will be hardly ever required", even for large systems.
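To spell out why I think all of these matrices have to coexist in memory, here is a minimal sketch of the discretized cosine transform as I picture it. The array shapes, the grids, and the weight matrix gamma are placeholders of my own choosing, not the optimized quadrature or the actual data layout described in the paper:

```python
import numpy as np

# Deliberately tiny placeholder dimensions; a real calculation would have
# dim on the order of 10^4-10^5 (see the Pd(111) example below).
NPW, NKPTS, NTAU, NOMEGA = 64, 2, 20, 20
dim = NPW * NKPTS

# Response function on the imaginary-time grid: NTAU full matrices in memory.
chi_tau = np.zeros((NTAU, dim, dim), dtype=complex)

# gamma[k, j] here is simply cos(omega_k * tau_j); it stands in for whatever
# quadrature weights the actual implementation uses.
tau = np.linspace(0.1, 10.0, NTAU)
omega = np.linspace(0.1, 10.0, NOMEGA)
gamma = np.cos(np.outer(omega, tau))

# Transform to the imaginary-frequency grid: NOMEGA more full matrices, so
# roughly (NTAU + NOMEGA) * dim^2 complex numbers are live at the same time.
chi_omega = np.einsum('kj,jab->kab', gamma, chi_tau)

live_entries = (NTAU + NOMEGA) * dim**2
print(f"complex entries held simultaneously: {live_entries:.2e}")
```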
Large metallic systems are likely the worst-case scenario, requiring the largest NOMEGA for good convergence. For example, I am interested in using RPA to study the adsorption of organic molecules on the Pd(111) surface, which I model as a 3x3 slab with 4 layers and a vacuum gap of 15 Angstrom. If I simulate this system without an adsorbate, using a 450 eV plane-wave cutoff (needed because of the C and O atoms in the adsorbates) and a 3x3x1 k-point mesh (5 irreducible k-points), I find a maximum of around 28800 plane waves. With 20 omega points and 20 tau points, this suggests that around 12 TB of memory would be required to evaluate the cosine transform (a 28800 x 28800 x 5 x 5 complex-valued array held at each of the 40 time and frequency points; the arithmetic is spelled out at the end of this post). These memory requirements can of course be reduced by lowering the plane-wave cutoff, shrinking the vacuum gap, using a smaller unit cell, or using fewer omega and tau points. Still, it suggests that despite its lower formal scaling, the unreleased ACFDT-RPA algorithm may not be able to treat systems substantially larger than those accessible to the traditional algorithm, because of its increased memory requirements. Could you shed some light on the memory requirements of the current and unreleased ACFDT-RPA algorithms? Will it be possible to study my large metal-adsorbate systems with the low-scaling ACFDT-RPA algorithm?
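For completeness, the ~12 TB figure quoted above is just the following arithmetic; the only assumption of mine is the 16 bytes per double-precision complex entry:

```python
# Arithmetic behind the ~12 TB estimate for the bare Pd(111) 3x3, 4-layer slab.
NPW = 28800              # maximum number of plane waves found for this cell
NKPTS = 5                # irreducible k-points of the 3x3x1 mesh
NTAU = 20                # imaginary-time points
NOMEGA = 20              # imaginary-frequency points
BYTES_PER_COMPLEX = 16   # assumed double-precision complex storage

total = NPW**2 * NKPTS**2 * (NTAU + NOMEGA) * BYTES_PER_COMPLEX
print(f"{total / 2**40:.1f} TiB")   # prints 12.1 TiB
```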