Tuesday, November 5, 2013

NERSC Errors

Is your job acting funny? Check the OSZICAR, OUTCAR, executionOutput, and any .o<some numbers> or .e<some numbers> files for one of the error messages below.
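
If there are many output files, a short script can do the first pass. The following is only a minimal sketch, assuming the job directory holds files named OSZICAR and OUTCAR plus scheduler files ending in .o<numbers> or .e<numbers>. The search strings are taken from the error messages in this post; the function names (find_known_errors, scan_directory) are made up for illustration.

```python
import re
from pathlib import Path

# Substrings taken from the error messages listed in this post.
KNOWN_ERRORS = [
    "double free or corruption",
    "Stale NFS file handle",
    "claim exceeds reservation's node-count",
    "triple product of the basis vectors is negative",
    "OOM killer terminated this process",
    "Error reading item",
]

# Assumption: scheduler output files look like name.o12345 or name.e12345.
SCHEDULER_FILE = re.compile(r"\.[oe]\d+$")


def find_known_errors(text):
    """Return the known error substrings that occur in text."""
    return [err for err in KNOWN_ERRORS if err in text]


def scan_directory(directory="."):
    """Map each relevant file name to the known errors found in it."""
    hits = {}
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        if path.name in ("OSZICAR", "OUTCAR") or SCHEDULER_FILE.search(path.name):
            found = find_known_errors(path.read_text(errors="replace"))
            if found:
                hits[path.name] = found
    return hits


if __name__ == "__main__":
    # Print one line per (file, error) pair found in the current directory.
    for name, errors in sorted(scan_directory().items()):
        for err in errors:
            print(f"{name}: {err}")
```

Run it from inside the job directory. It only reports which known message showed up in which file, so you still need to read the matching entry below for the fix.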

Error: *** glibc detected *** gvasp: double free or corruption (out): 0x0000000008b72010 ***
Info: "... indicative of an error when trying to free up memory I think (as in they want to allocate an array or something).  Usually these kinds of errors are programming errors (bugs), though they could also have to do with what compiler was used I suppose.  Here are some people discussing this kind of problem relating to vasp: http://cms.mpi.univie.ac.at/vasp-forum/forum_viewtopic.php?2.5588" - Jon Wyrick
Fix: Modified gqscript to include "module load vasp/5.3.3"

Error: Stale NFS file handle
Info: "I believe the stale NFS handle is a problem on their end (as in you didn't do anything wrong).  These happen when you try to write data to a shared folder (e.g. our project folder) and for one reason or another the connection doesn't quite go through.  NFS is the file sharing system that is used.  So probably when VASP was trying to write one of its output files (such as OUTCAR or OSZICAR, etc.) which it updates each electronic step, something must have gone haywire." - Jon Wyrick
Fix: Rerun the job. If the error comes back, try running the job in the scratch folder, which takes out the transfer step between NFS and non-NFS folders. Remember to retrieve your results afterwards, as files in scratch get deleted after a period of inactivity.

Error: apsched: claim exceeds reservation's node-count
Info: Error in gscript. The job is asking for more nodes than it reserved.
Fix: See NERSC website on using fewer cores per node

Error: the triple product of the basis vectors is negative exchange two basis vectors
Info: Your unit cell is defined with a left-handed set of basis vectors. VASP only works with a right-handed set.
Fix: Change your unit vectors so that the triple product a . (b x c) is positive. Exchanging any two of the basis vectors flips the sign.
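
The handedness check is easy to do before submitting. Here is a minimal sketch in plain Python, assuming the three lattice vectors are given as 3-component tuples in the order they appear in your structure file. The function names and the example vectors are hypothetical.

```python
def triple_product(a, b, c):
    """Return a . (b x c) for three 3-component vectors."""
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]


def make_right_handed(a, b, c):
    """Swap the first two vectors if the basis is left-handed."""
    if triple_product(a, b, c) < 0:
        return b, a, c
    return a, b, c


if __name__ == "__main__":
    # Hypothetical left-handed cell: first two vectors are in the wrong order.
    a, b, c = (0.0, 4.0, 0.0), (4.0, 0.0, 0.0), (0.0, 0.0, 10.0)
    print("triple product:", triple_product(a, b, c))
    print("fixed basis:", make_right_handed(a, b, c))
```

Note that if your atom positions are given in direct (fractional) coordinates, exchanging two basis vectors means the matching two coordinate columns have to be exchanged as well, otherwise the atoms end up in the wrong places.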

Error: OOM killer terminated this process
Info: You are "out of memory".
Fix: Try a gscript that runs on nodes with more memory. (gscript_high_mem > gscript_med_mem > gscript_long)

Error: Error reading item 'NPAR' from file INCAR.
Info: "...there are most likely "hidden" characters in your INCAR file causing it to fail being read... it happens as a result of translating from windows text files to unix text files - our installation of linux on our cluster doesn't seem to care about it, but whatever unix they have at nersc does care."
Fix: "dos2unix <filename>" for all relevant files in file directory
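
To find out which files actually need dos2unix, you can look for Windows line endings (a carriage return before each newline). This is only a rough sketch with made-up function names, assuming the "hidden" characters are carriage returns. dos2unix itself is still the tool to use for the conversion.

```python
import sys


def has_dos_line_endings(data):
    """Return True if the raw bytes contain Windows (CRLF) line endings."""
    return b"\r\n" in data


def to_unix_line_endings(data):
    """Return the bytes with every CRLF replaced by a plain LF."""
    return data.replace(b"\r\n", b"\n")


def files_needing_conversion(paths):
    """Return the subset of paths whose contents contain CRLF line endings."""
    flagged = []
    for path in paths:
        with open(path, "rb") as handle:
            if has_dos_line_endings(handle.read()):
                flagged.append(path)
    return flagged


if __name__ == "__main__":
    # Pass the files to check on the command line, e.g. INCAR and your job script.
    for name in files_needing_conversion(sys.argv[1:]):
        print(f"{name}: has Windows line endings, run dos2unix on it")
```

The files are opened in binary mode on purpose, since text mode would quietly translate the line endings and hide the problem.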

Other useful commands:
pwd (print working directory)
qstat -u bartels  (see only the jobs from bartels)
qstat -f [job number]   (see full output of a job)
qdel [job number]  (stop a job)
