Wednesday, September 9, 2020

VASP Code Performance Scaling: Testing with Au/graphene

Quick Description: When applying for a supercomputer allocation, you need to justify the amount of resources you're requesting. In the case of XSEDE, you can apply for a startup allocation, whose main purpose is to let you run baseline calculations from which you can project how much to ask for in your next application.

The Point: Here's how to estimate the supercomputer resources you will require.


Prerequisites: An XSEDE allocation with access to Stampede (now Stampede2).


Notes:
 

  1. For any sort of experiment, you need a control experiment to compare against. In DFT, the control calculation can usually serve as a lower-bound estimate for the resource consumption of your subsequent (likely more complicated) calculations. (NOTE: Here, I'm assuming that you are working with similar systems of interest and tweaking the POSCAR only slightly between calculations. That is to say, most of your atoms, their elemental composition, and their geometric coordinates should stay largely the same. At the very least, your cell size should not be changing.)
    • In my case, I'm studying metal clusters on graphene. Without going into too much detail, this system is used as a test case for exploring other metals and their interaction with graphene surfaces, where we assume the time necessary to complete calculations on similar systems will roughly match that of this control. 
  2. Unfortunately, I decided to remove a picture of our scaling performance plot, because my PI was uncomfortable with me sharing it online. Essentially, the plot shows how the calculation's run time scales with the number of cores used. Up to a certain point, increasing the number of cores decreases the calculation time; eventually, this speed-up factor levels off, at which point you have determined the number of cores you will likely use for your subsequent calculations. (A small script for computing speedup and parallel efficiency from your own timings appears after this list.)
  3. To get a rough idea of how many cores you need: VASP parallelizes via MPI, assigning one MPI rank to one core, and Larsson generally suggests assigning one core per atom in a VASP job. Since our baseline calculation contained 72 atoms, we guessed that 64 cores (using powers of 2 is recommended) would work best, which is what our performance scaling calculations confirmed. In our case, all simulations were performed on the Stampede2 computing cluster's Knights Landing (KNL) nodes, which each have 68 cores. This means that if you are doubling your core count as you should, a single node should run 64 cores at most; per TACC's Best Known Practices for KNL Nodes, 4 cores should be left unutilized anyway. (The second sketch after this list writes this heuristic out.)
  4. Again, in our case, we tested up to 128 cores (2 nodes with 64 cores each). This resulted in a worse run time than 1 node with 64 cores; some of that poor performance comes from communication overhead between the nodes. Obviously, using one node is also easier to keep track of. :)
  5. For our work, we intended to model variations of our control calculation with at most 76 atoms total, making 1 node and 64 cores an appropriate configuration to ensure fast run times and efficient SU estimation on Stampede2. For a completely optimized job, the total run time was around 12 hours. We then counted the number of variations we hoped to try and padded on a few extra hours for good measure (the last sketch after this list works through this arithmetic). This calculator from XSEDE may also prove useful to you.
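
For reference, here's a minimal Python sketch of the speedup and efficiency math behind a scaling plot like the one described in note 2. The timing numbers are made-up placeholders (not our measured data); swap in your own wall-clock times.

    # Compute parallel speedup and efficiency from wall-clock times.
    # The timings below are illustrative placeholders, NOT our measured data.
    times = {8: 3600.0, 16: 1900.0, 32: 1050.0, 64: 640.0, 128: 700.0}  # cores -> seconds

    base_cores = min(times)
    base_time = times[base_cores]
    for cores in sorted(times):
        speedup = base_time / times[cores]           # relative to the smallest run
        efficiency = speedup / (cores / base_cores)  # 1.0 = perfect linear scaling
        print(f"{cores:>4} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")

When the efficiency column drops well below 1, you've passed the knee of the curve.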
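Here's the core-count heuristic from note 3 written out, assuming the one-core-per-atom rule of thumb, powers of 2, and the 64-core-per-KNL-node cap. Treat it as a starting guess to verify with your own scaling tests, not a rule.

    # Rough core-count guess: ~1 MPI rank per atom, rounded down to a
    # power of 2 and capped at 64 per KNL node (68 cores per node, with
    # 4 left idle per TACC's Best Known Practices).
    def suggested_cores(n_atoms, per_node_cap=64):
        target = min(n_atoms, per_node_cap)
        cores = 1
        while cores * 2 <= target:
            cores *= 2
        return cores

    print(suggested_cores(72))  # -> 64, the configuration our tests favored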
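Finally, the SU arithmetic from note 5 as a sketch. It assumes node-hour-style accounting and a hypothetical count of planned runs; check your system's current charge rates and plug in your own numbers before writing your request.

    # Back-of-the-envelope SU request. Assumes 1 SU per node-hour
    # (Stampede2 KNL-style accounting -- verify your system's rates).
    nodes_per_job = 1
    hours_per_job = 12      # wall time of our optimized control run
    n_variations = 20       # hypothetical number of planned variations
    padding = 1.25          # safety margin for restarts and mistakes

    total_sus = nodes_per_job * hours_per_job * n_variations * padding
    print(f"Requested SUs: {total_sus:.0f} node-hours")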
