Reliable high-resolution CPU TIME retrieval under MVS 3.8J

The s370_perf instruction time benchmark (see project and posting) currently uses the elapsed time retrieved with STCK for all timing measurements. Mark L. Gaubatz rightly commented that CPU time should be the basis of all performance measurements (see message, see topic). Implementing this requires a low-overhead method to access the task CPU time with high resolution, ideally with microsecond precision.

s370_perf is designed to run on MVS 3.8J, as freely available with the Tur(n)key 4- System distribution. The methods used for CPU time retrieval on modern MVS systems are not available, most notably

Tom Armstrong pointed me to a function CPUTIM which is included in the Algol F IVP code IEXSAMP4 and used for algorithm timing. It essentially returns the sum of the ASCB fields ASCBEJST and ASCBSRBT. The main drawback is that these fields are only updated after each dispatch, and a test quickly showed that the CPU time retrieved this way has errors of up to 170 msec (seen on a 2-CPU system with only one user job active).

When searching for other 'cpu time from ASCBEJST' methods I stumbled across the TIMED function included in the 1998 edition of the KERNLIB section of the CERN PACKLIB (see sourcecode). This function calls the SVC CALLDISP to force an update of ASCBEJST. Turns out that this works under MVS/SE, but not under MVS 3.8J. A WAIT on a fall-through ECB (see posting) also doesn't work under MVS 3.8J.

A comment in the TIMED function finally pointed me to the proper solution. The comment states effectively:

on non-SE systems use ASCBEJST + (TOD - LCCADTOD).
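
As a rough illustration of why the correction term matters, the formula can be modelled in a few lines of Python. This is only a sketch: the function names and sample values are made up for illustration, and it assumes all three quantities are in TOD clock format (bit 51 ticks once per microsecond, so a right shift by 12 yields microseconds), as the direct addition in the formula implies.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
TOD_SHIFT_USEC = 12  # TOD clock bit 51 ticks once per microsecond


def cpu_time_tod(ascbejst, tod, lccadtod):
    """Accumulated job step time plus the time since the last dispatch."""
    return (ascbejst + (tod - lccadtod)) & MASK64


def tod_to_usec(tod_units):
    """Convert TOD clock units to whole microseconds."""
    return tod_units >> TOD_SHIFT_USEC


# Hypothetical sample values, not measurements:
# 2.5 s already accumulated, task last dispatched 170 ms ago.
ascbejst = 2_500_000 << TOD_SHIFT_USEC
lccadtod = 1_000_000_000 << TOD_SHIFT_USEC
tod_now = lccadtod + (170_000 << TOD_SHIFT_USEC)

stale = tod_to_usec(ascbejst)                                   # ASCBEJST alone
fresh = tod_to_usec(cpu_time_tod(ascbejst, tod_now, lccadtod))  # with correction
print(stale, fresh)  # 2500000 2670000
```

With these sample numbers the uncorrected value lags by the full 170 msec that has elapsed since the last dispatch.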

LCCADTOD is an LCCA field which holds the TOD of the last dispatch. So I searched the whole MVS 3.8J code base for code using LCCADTOD. This is easy to do after a full export with the hercexport tool (see posting on hercexport and tk4- takeout) and a simple

  find . -name "*.mac" -type f -print0 | xargs -0 grep -l ",LCCADTOD"
lists all PDS members with code that accesses LCCADTOD directly. The essential hits are
which contain two versions of a CPUTIM function from a CBT volume, one for FORTRAN and one for PL/I. The essential core of both implementations is
  LM    R4,R5,TOD
where $ALD and $SLD are macros for double word integer add/sub (available on the same CBT volume in PDS cbtcov.file188). A quick test immediately shows that this works fine and returns a high resolution CPU time.
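
The $ALD and $SLD macro sources are not reproduced here. As a hedged sketch of what a double word add or subtract on 32-bit register pairs has to do, the Python model below propagates a carry or borrow from the low word into the high word. The helper names (add64, sub64, split, join) are hypothetical and are not taken from the CBT volume.

```python
MASK32 = 0xFFFFFFFF


def split(value):
    """Split a 64-bit value into a (high, low) pair of 32-bit words."""
    return (value >> 32) & MASK32, value & MASK32


def join(hi, lo):
    """Combine a (high, low) pair of 32-bit words into one 64-bit value."""
    return (hi << 32) | lo


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two register pairs modulo 2**64, carrying from low to high word."""
    lo_sum = a_lo + b_lo
    carry = 1 if lo_sum > MASK32 else 0
    return (a_hi + b_hi + carry) & MASK32, lo_sum & MASK32


def sub64(a_hi, a_lo, b_hi, b_lo):
    """Subtract two register pairs modulo 2**64, borrowing from the high word."""
    borrow = 1 if a_lo < b_lo else 0
    return (a_hi - b_hi - borrow) & MASK32, (a_lo - b_lo) & MASK32


# Low word overflows, so the carry must reach the high word.
print(add64(0, 0xFFFFFFFF, 0, 1))   # (1, 0)
# Low word underflows, so the high word must give up a borrow.
print(sub64(1, 0, 0, 1))            # (0, 4294967295)
```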

However, the key trick of this method, adding the time since the last dispatch, also introduces a vulnerability. If a dispatch occurs between the subtract of LCCADTOD and the add of ASCBEJST, the CPU time spent during the previous dispatch is counted twice. This is not very likely to happen, but it is possible, especially on a single CPU system with some I/O load. To protect against this vulnerability, I store the LCCADTOD value at the beginning of the function, and check after all arithmetic whether the LCCADTOD value changed. If it did, a dispatch happened, and the whole procedure is simply retried. Taken together, the core of my CPUTIM method looks like

         L     R6,PSALCCAV        get LCCA ptr
         L     R7,PSAAOLD         get ASCB ptr
         LA    R10,9              init retry loop count
CPUTIMR  LM    R8,R9,LCCADTOD     get initial LCCADTOD
         STM   R8,R9,SAVDTOD      and save it
         STCK  CKBUF              store TOD
         LM    R0,R1,CKBUF
         SLR   R1,R9              low order:  sum=TOD-LCCADTOD
         BC    3,*+4+4            check for borrow
         SL    R0,=F'1'           and correct if needed
         SLR   R0,R8              high order: sum=TOD-LCCADTOD
         LM    R8,R9,ASCBEJST     load ASCBEJST
         ALR   R1,R9              low order:  sum+=ASCBEJST
         BC    12,*+4+4           check for carry
         AL    R0,=F'1'           and correct if needed
         ALR   R0,R8              high order: sum+=ASCBEJST
         LM    R8,R9,ASCBSRBT     load ASCBSRBT
         ALR   R1,R9              low order:  sum+=ASCBSRBT
         BC    12,*+4+4           check for carry
         AL    R0,=F'1'           and correct if needed
         ALR   R0,R8              high order: sum+=ASCBSRBT
         LM    R8,R9,LCCADTOD     get final LCCADTOD
         C     R9,SAVDTOD+4       check low order
         BNE   CPUTIMN            if ne, dispatch detected
         C     R8,SAVDTOD         check high order
         BE    CPUTIME            if eq, all fine
CPUTIMN  BCT   R10,CPUTIMR        retry in case dispatch detected
CPUTIME  <return handling>        CPUTIM in register pair R0,R1
CKBUF    DS    1D                 buffer for STCK result
SAVDTOD  DS    1D                 saved initial LCCADTOD
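
The double counting and the retry check can be reproduced in a small Python model. It is a sketch under simplifying assumptions, not a replacement for the assembler code: FakeSystem, dispatch and the after_subtract hook are hypothetical, times are arbitrary TOD ticks, and a dispatch is modelled as crediting the elapsed slice to ASCBEJST and then resetting LCCADTOD.

```python
class FakeSystem:
    """Hypothetical stand-in for the TOD clock and the LCCA/ASCB fields."""

    def __init__(self, tod, lccadtod, ascbejst, ascbsrbt=0):
        self.tod = tod
        self.lccadtod = lccadtod
        self.ascbejst = ascbejst
        self.ascbsrbt = ascbsrbt

    def dispatch(self, gap):
        """Credit the finished slice to ASCBEJST, then redispatch after gap."""
        self.ascbejst += self.tod - self.lccadtod
        self.tod += gap
        self.lccadtod = self.tod


def cputim(system, after_subtract=None, max_tries=9):
    """Model of the retry loop; max_tries=9 mirrors the BCT loop count."""
    total = 0
    for attempt in range(max_tries):
        saved = system.lccadtod            # initial LCCADTOD
        total = system.tod - saved         # TOD - LCCADTOD
        if after_subtract is not None:
            after_subtract(attempt)        # test hook: may force a dispatch
        total += system.ascbejst           # sum += ASCBEJST
        total += system.ascbsrbt           # sum += ASCBSRBT
        if system.lccadtod == saved:       # unchanged: no dispatch in between
            break
    return total


def interrupt_once(system, gap):
    """Build a hook that forces one dispatch during the first attempt only."""
    def hook(attempt):
        if attempt == 0:
            system.dispatch(gap)
    return hook


s1 = FakeSystem(tod=1000, lccadtod=900, ascbejst=5000)
unprotected = cputim(s1, interrupt_once(s1, 50), max_tries=1)

s2 = FakeSystem(tod=1000, lccadtod=900, ascbejst=5000)
protected = cputim(s2, interrupt_once(s2, 50))

print(unprotected, protected)  # 5200 5100
```

In this model the true CPU time is 5100 ticks. The single-pass variant reports 5200 because the 100 ticks of the interrupted slice appear both in TOD - LCCADTOD and in the updated ASCBEJST, while the retried pass returns 5100.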

Any comments on the function logic are very much appreciated, especially on

Test code with the logic described above is available as part of the s370_perf project, see source file and JCL template, to be used with the hercjis preprocessor.

For the original posting to Yahoo! Group H390-MVS, see topic 18217.