Reliable high-resolution CPU TIME retrieval under MVS 3.8J
s370_perf instruction time benchmark (see
uses currently the elapsed time retrieved with
STCK for all
timing measurements. Mark L. Gaubatz commented rightfully that CPU time
should be used as the basis of all performance measurements (see
To implement this one needs a low overhead method to access the task CPU time with high resolution, best with microsecond precision.
s370_perf is designed to run on
MVS 3.8J, as freely
available with the
Tur(n)key 4- System
distribution. The methods used for CPU time retrieval on modern MVS systems
are not available, most notably
TIMEUSEDmacro not available (was introduced later)
TCBnot available (was introduced later)
Tom Armstrong pointed me to a function
CPUTIM which is included
in the Algol F IVP code
IEXSAMP4 and used for algorithm timing.
It essentially returns the sum of the
ASCBSRBT. The essential drawback is
that these fields are only updated after each dispatch, and a test quickly
showed that the CPU time retrieved this way has errors of up to 170 msec
(seen on a 2 CPU systems with only one user job active).
When searching for other 'cpu time from ASCBEJST' methods I stumbled
TIMED function included in the 1998 edition of the
KERNLIB section of the
CERN PACKLIB (see
This function calls the
SVC CALLDISP to force an update of
ASCBEJST. Turns out that this works under
but not under
MVS 3.8J. A
WAIT on a fall-through
also doesn't work under
A comment in the
TIMED function finally pointed me to the proper solution. The comment states effectively:
ASCBEJST + (TOD - LCCADTOD).
LCCADTOD is a
LCCA field which holds the TOD
of the last dispatch. So I searched the whole
MVS 3.8J code base
for code using
LCCADTOD. Easy to do after a full export with the
hercexport tool (see posting on
hercexport and tk4- takeout) and a simple
find -name "*.mac" -type f -print0 | xargs -0 grep -l ",LCCADTOD"lists all PDS members with code accessing directly
LCCADTOD. The essential hits are
CPUTIMfunction from a CBT volume, one for FORTRAN and one for PL/I. The essential core of both implementations is
STCK TOD LM R4,R5,TOD $SLD R4,LCCADTOD $ALD R4,ASCBEJSTwhere
$SLDare macros for double word integer add/sub (available on the same CBT volume in PDS
cbtcov.file188). A quick test immediately shows that this works fine and returns a high resolution CPU time.
However, the key trick of this method, that the time since last dispatch
is added, causes also a vulnerability. If a dispatch occurs between the
LCCADTOD and the add of
CPU time spend during the previous dispatch is double counted. Not very likely
to happen, but possible, especially on a single CPU system with some I/O load.
To protect against this vulnerability, I store the
at the beginning of the function, and check after all arithmetic whether the
LCCADTOD value changed. If yes, a dispatch happened, and the
whole procedure is simply re-tried. Taken all together the core of my
CPUTIM method looks like
L R6,PSALCCAV get LCCA ptr L R7,PSAAOLD get ASCB ptr LA R10,9 init retry loop count * CPUTIMR LM R8,R9,LCCADTOD get initial LCCADTOD STM R8,R9,SAVDTOD and save it * STCK CKBUF store TOD LM R0,R1,CKBUF SLR R1,R9 low order: sum=TOD-LCCADTOD BC 3,*+4+4 check for borrow SL R0,=F'1' and correct if needed SLR R0,R8 high order: sum=TOD-LCCADTOD * LM R8,R9,ASCBEJST load ASCBEJST ALR R1,R9 low order: sum+=ASCBEJST BC 12,*+4+4 check for carry AL R0,=F'1' and correct if needed ALR R0,R8 high order: sum+=ASCBEJST * LM R8,R9,ASCBSRBT load ASCBSRBT ALR R1,R9 low order: sum+=ASCBSRBT BC 12,*+4+4 check for carry AL R0,=F'1' and correct if needed ALR R0,R8 high order: sum+=ASCBSRBT * LM R8,R9,LCCADTOD get final LCCADTOD C R9,SAVDTOD+4 check low order BNE CPUTIMN if ne, dispatch detected C R8,SAVDTOD check high order BE CPUTIME if eq, all fine * CPUTIMN BCT R10,CPUTIMR retry in case dispatch detected * CPUTIME <return handling> CPUTIM in register pair R0,R1 ... CKBUF DS 1D SAVDTOD DS 1D
Any comments on the function logic are very much appreciated, especially on
- is this solution portable also to current MVS versions ?
- is there still any hidden vulnerability which affects timing precision ?
- should the
SRBtime be included (like in the ALGOL example), or not (like in CERNLIB and CBT) ?