Hercules instruction times from s370_perf - a first analysis and observations
A first fully analyzed instruction timing dataset for my Intel Xeon E5-1620 reference system is now available under the case id 2018-03-31_sys2 in the GitHub project wfjm/s370-perf. The page contains a list of findings.
Upfront a proviso: there are significant deviations from a simple additive instruction timing model. See section additivity of instruction times.
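To make the proviso concrete: a simple additive model predicts the time of an instruction sequence as the sum of the individual instruction times. The sketch below is only an illustration of that idea. The function names are hypothetical, and the numbers in the usage example are made-up placeholders, not s370_perf measurements.

```python
def predicted_time_ns(sequence, inst_time_ns):
    """Additive model: total time is the sum of the per-instruction times."""
    return sum(inst_time_ns[op] for op in sequence)


def additivity_deviation(measured_ns, sequence, inst_time_ns):
    """Relative deviation of a measured time from the additive prediction."""
    predicted = predicted_time_ns(sequence, inst_time_ns)
    return (measured_ns - predicted) / predicted


# Placeholder values for illustration only, NOT measured data.
times = {"LR": 2.0, "AR": 3.0}
sequence = ["LR", "AR", "AR"]

print(predicted_time_ns(sequence, times))
print(additivity_deviation(10.0, sequence, times))
```

A deviation of zero would mean the sequence behaves exactly as the additive model predicts; the proviso above says that this is often not the case.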
Some findings simply show nicely how an emulator like Hercules works, e.g.
- branch to the same page is faster than to a different page, see section branch timing.
- ALR is faster than AR, see section ALR timing.
- CS, CDS, and TS are slow in the lock missed case for multi-CPU setups, see section CS, CDS, TS performance.
Other key findings are
- MVCIN is quite slow, a factor of 6 slower than MVN or MVZ, see section MVCIN performance.
- CLCL is a factor of 12 slower than CLC, see section CLCL performance.
- TRT is a factor of 12 slower than TR, see section TRT performance.
- speed of decimal arithmetic seems independent of digit count, except for DP, see section decimal performance.
The poor CLCL performance, when compared to CLC, is a bit surprising because MVCL shows roughly the same performance as MVC, so the overhead of an interruptible instruction can't be the culprit.
Any remarks and comments are very welcome.
Data for many other systems are available now, see list of cases, but the full analysis will take some time.