Reliable high-resolution CPU TIME retrieval under MVS 3.8J
The s370_perf instruction time benchmark (see project and posting) currently uses the elapsed time retrieved with STCK for all timing measurements. Mark L. Gaubatz rightly commented that CPU time should be used as the basis of all performance measurements (see message, see topic).
To implement this, one needs a low-overhead method to access the task CPU time with high resolution, ideally with microsecond precision.
s370_perf is designed to run on MVS 3.8J, as freely available with the Tur(n)key 4- System distribution. The methods used for CPU time retrieval on modern MVS systems are not available, most notably
- the TIMEUSED macro is not available (was introduced later)
- the TCBTTIME field in the TCB is not available (was introduced later)
Tom Armstrong pointed me to a function CPUTIM which is included in the Algol F IVP code IEXSAMP4 and used for algorithm timing. It essentially returns the sum of the ASCB fields ASCBEJST and ASCBSRBT. The essential drawback is that these fields are only updated after each dispatch, and a test quickly showed that the CPU time retrieved this way has errors of up to 170 ms (seen on a 2 CPU system with only one user job active).
When searching for other 'cpu time from ASCBEJST' methods I stumbled across the TIMED function included in the 1998 edition of the KERNLIB section of the CERN PACKLIB (see sourcecode). This function calls the SVC CALLDISP to force an update of ASCBEJST. It turns out that this works under MVS/SE, but not under MVS 3.8J. A WAIT on a fall-through ECB (see posting) also doesn't work under MVS 3.8J.
A comment in the TIMED function finally pointed me to the proper solution. The comment states effectively:

    ASCBEJST + (TOD - LCCADTOD)

LCCADTOD is an LCCA field that holds the TOD of the last dispatch. So I searched the whole MVS 3.8J code base for code using LCCADTOD. This is easy to do after a full export with the hercexport tool (see posting on hercexport and tk4- takeout), and a simple

    find -name "*.mac" -type f -print0 | xargs -0 grep -l ",LCCADTOD"

lists all PDS members with code that directly accesses LCCADTOD.
The essential hits are

    ./cbt001.341/cbtcov.file185/cputime.mac
    ./cbt001.341/cbtcov.file185/cputim.mac

These are two versions of a CPUTIM function from a CBT volume, one for FORTRAN and one for PL/I. The essential core of both implementations is

    STCK  TOD
    LM    R4,R5,TOD
    $SLD  R4,LCCADTOD
    $ALD  R4,ASCBEJST

where $ALD and $SLD are macros for double-word integer add/sub (available on the same CBT volume in PDS cbtcov.file188). A quick test immediately shows that this works fine and returns a high-resolution CPU time.
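To make the arithmetic explicit: the four instructions above evaluate ASCBEJST + (TOD - LCCADTOD) on 64-bit values, which on S/370 have to be handled as pairs of 32-bit words. The following is a minimal Python model of that calculation, not a transcription of the CBT macros. The helper names are hypothetical, and it assumes that all three values are in TOD clock format, where bit 51 ticks once per microsecond.

```python
M32 = 0xFFFF_FFFF  # mask for one 32-bit register


def split(value):
    """Split a 64-bit value into a (high word, low word) pair."""
    return (value >> 32) & M32, value & M32


def sub_dw(a, b):
    """Double-word subtract a - b on (high, low) pairs, with borrow."""
    borrow = 1 if a[1] < b[1] else 0
    return (a[0] - b[0] - borrow) & M32, (a[1] - b[1]) & M32


def add_dw(a, b):
    """Double-word add a + b on (high, low) pairs, with carry."""
    low = a[1] + b[1]
    return (a[0] + b[0] + (low >> 32)) & M32, low & M32


def cpu_time_usec(tod, lccadtod, ascbejst):
    """ASCBEJST + (TOD - LCCADTOD), converted to microseconds."""
    hi, lo = add_dw(sub_dw(split(tod), split(lccadtod)), split(ascbejst))
    return ((hi << 32) | lo) >> 12  # drop the 12 sub-microsecond bits


# Made-up sample values: 2 s accumulated, last dispatch 1500 us ago.
lccadtod = 0x8000_0000_FFFF_F000
tod = lccadtod + (1500 << 12)
ascbejst = 2_000_000 << 12
usec = cpu_time_usec(tod, lccadtod, ascbejst)  # 2_001_500
```

The sample values are chosen so that the subtraction needs a borrow from the high word, which is the case the double-word macros exist to handle.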
However, the key trick of this method, adding the time since the last dispatch, also creates a vulnerability. If a dispatch occurs between the subtraction of LCCADTOD and the addition of ASCBEJST, the CPU time spent during the previous dispatch is double-counted. This is not very likely to happen, but it is possible, especially on a single CPU system with some I/O load. To protect against this vulnerability, I store the LCCADTOD value at the beginning of the function and check after all arithmetic whether the LCCADTOD value has changed. If yes, a dispatch happened, and the whole procedure is simply retried. Taken all together, the core of my CPUTIM method looks like
             L     R6,PSALCCAV        get LCCA ptr
             L     R7,PSAAOLD         get ASCB ptr
             LA    R10,9              init retry loop count
    *
    CPUTIMR  LM    R8,R9,LCCADTOD     get initial LCCADTOD
             STM   R8,R9,SAVDTOD      and save it
    *
             STCK  CKBUF              store TOD
             LM    R0,R1,CKBUF
             SLR   R1,R9              low order: sum=TOD-LCCADTOD
             BC    3,*+4+4            check for borrow
             SL    R0,=F'1'           and correct if needed
             SLR   R0,R8              high order: sum=TOD-LCCADTOD
    *
             LM    R8,R9,ASCBEJST     load ASCBEJST
             ALR   R1,R9              low order: sum+=ASCBEJST
             BC    12,*+4+4           check for carry
             AL    R0,=F'1'           and correct if needed
             ALR   R0,R8              high order: sum+=ASCBEJST
    *
             LM    R8,R9,ASCBSRBT     load ASCBSRBT
             ALR   R1,R9              low order: sum+=ASCBSRBT
             BC    12,*+4+4           check for carry
             AL    R0,=F'1'           and correct if needed
             ALR   R0,R8              high order: sum+=ASCBSRBT
    *
             LM    R8,R9,LCCADTOD     get final LCCADTOD
             C     R9,SAVDTOD+4       check low order
             BNE   CPUTIMN            if ne, dispatch detected
             C     R8,SAVDTOD         check high order
             BE    CPUTIME            if eq, all fine
    *
    CPUTIMN  BCT   R10,CPUTIMR        retry in case dispatch detected
    *
    CPUTIME  <return handling>        CPUTIM in register pair R0,R1
             ...
    CKBUF    DS    1D
    SAVDTOD  DS    1D
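The effect of a dispatch in the critical window, and of the retry guard, can be reproduced with a toy model. The sketch below is Python, not MVS code: the FakeSystem class and both read functions are hypothetical, times are plain microsecond integers, SRB time is left out, and the clock is assumed to stand still while the routine itself runs.

```python
class FakeSystem:
    """Toy model of the counters involved; all values in microseconds."""

    def __init__(self):
        self.tod = 1_000       # current clock value
        self.lccadtod = 900    # clock value at the last dispatch
        self.ascbejst = 5_000  # CPU time accumulated up to the last dispatch

    def redispatch(self, away):
        """The task loses the CPU now and gets it back `away` us later."""
        self.ascbejst += self.tod - self.lccadtod
        self.tod += away
        self.lccadtod = self.tod


def read_naive(system, interrupt=None):
    """Unprotected ASCBEJST + (TOD - LCCADTOD)."""
    delta = system.tod - system.lccadtod
    if interrupt:
        interrupt()                  # dispatch hits between the two steps
    return delta + system.ascbejst


def read_guarded(system, interrupt=None, passes=9):
    """Same calculation, repeated if LCCADTOD changed underneath it."""
    result = 0
    for _ in range(passes):
        saved = system.lccadtod      # snapshot, like SAVDTOD
        delta = system.tod - saved
        if interrupt:
            interrupt()
            interrupt = None         # the disturbance happens only once
        result = delta + system.ascbejst
        if system.lccadtod == saved:
            break                    # no dispatch seen, result is consistent
    return result


s1 = FakeSystem()
naive = read_naive(s1, lambda: s1.redispatch(50))      # 5200
s2 = FakeSystem()
guarded = read_guarded(s2, lambda: s2.redispatch(50))  # 5100
```

With these numbers the true CPU time is 5100 us. The unguarded read returns 5200 us because the 100 us of the previous dispatch are counted twice, while the guarded read notices the changed LCCADTOD and recomputes. As in the assembler routine, the number of passes is bounded, so after the last pass the most recent value is returned even if it was disturbed.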
Any comments on the function logic are very much appreciated, especially on
- is this solution also portable to current MVS versions?
- is there still any hidden vulnerability that affects timing precision?
- should the SRB time be included (like in the ALGOL example), or not (like in CERNLIB and CBT)?
Test code with the logic described above is available as part of the s370_perf project, see source file and JCL template, to be used with the hercjis preprocessor.