Hercules instruction times from s370_perf - a first analysis and observations
A first fully analyzed instruction timing dataset for my Intel Xeon E5-1620 reference system is now available in the GitHub project. The page contains a list of findings.
Up front, a proviso: there are significant deviations from a simple additive instruction timing model, see section additivity of instruction times.
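To make the proviso concrete: a simple additive model assumes that the run time of an instruction sequence is just the sum of the instruction times measured in isolation. The formula below is a generic formulation of that assumption, not necessarily the exact procedure used by s370_perf. Here n_i counts how often instruction i occurs in the sequence, t_i is its isolated time, and δ is the relative deviation of the measured time from the prediction:

```latex
T_{\text{pred}} = \sum_{i} n_i \, t_i ,
\qquad
\delta = \frac{T_{\text{meas}}}{T_{\text{pred}}} - 1
```

A δ far from zero means the instruction times do not simply add up, for example when the cost of an instruction depends on the instructions around it.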
Some findings nicely illustrate how an emulator like Hercules works, for example:
- a branch to the same page is faster than a branch to a different page, see section branch timing.
- ALR is faster than AR, see section ALR timing.
- TS is slow in the lock-missed case for multi-CPU setups, see section CS, CDS, TS performance.
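The following C sketch shows what these three instructions have to compute, which hints at where the cost differences can come from. It is not Hercules code; all function names are hypothetical and the 4 KiB page size is an assumption. same_page() is the kind of check an emulator can use to keep a cached host address of the current instruction page. ALR only needs a carry for its condition code and can never raise a program interruption, while AR must detect signed overflow, which may lead to a fixed-point overflow interruption depending on the PSW program mask; this is one plausible reason for the ALR advantage. For TS the atomic read-and-set is modeled with a C11 atomic exchange; Hercules' real locking and multi-CPU handling are not modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size of 4 KiB, for illustration only. */
#define PAGE_SHIFT 12

/* True if both addresses fall into the same (assumed 4 KiB) page; in that
   case a cached translation of the instruction page can be reused. */
static bool same_page(uint32_t from, uint32_t to)
{
    return (from >> PAGE_SHIFT) == (to >> PAGE_SHIFT);
}

/* AR: signed 32-bit add. CC 0 zero, 1 negative, 2 positive, 3 overflow.
   *overflow tells the caller whether a fixed-point overflow interruption
   may be due (the PSW program mask decides). */
static int ar_cc(int32_t *r1, int32_t r2, bool *overflow)
{
    uint32_t a = (uint32_t)*r1, b = (uint32_t)r2;
    uint32_t sum = a + b;                 /* wraps modulo 2^32, no UB */
    /* overflow: operands have equal signs, result has the other sign */
    *overflow = ((~(a ^ b) & (a ^ sum)) >> 31) != 0;
    *r1 = (int32_t)sum;                   /* two's complement host assumed */
    if (*overflow)
        return 3;
    if (sum == 0)
        return 0;
    return (sum >> 31) ? 1 : 2;
}

/* ALR: unsigned 32-bit add, never interrupts.
   CC 0 zero/no carry, 1 nonzero/no carry, 2 zero/carry, 3 nonzero/carry. */
static int alr_cc(uint32_t *r1, uint32_t r2)
{
    uint32_t sum = *r1 + r2;
    bool carry = sum < r2;                /* wrap-around means carry out */
    *r1 = sum;
    return (carry ? 2 : 0) + (sum != 0 ? 1 : 0);
}

/* TS: atomically fetch the byte and set it to all ones.
   CC 0 if the leftmost bit was 0 (lock obtained), CC 1 if it was 1
   (lock missed; guest code typically loops and retries). */
static int ts_cc(_Atomic uint8_t *byte)
{
    uint8_t old = atomic_exchange(byte, (uint8_t)0xFF);
    return (old & 0x80) ? 1 : 0;
}
```

Note that a lock-missed TS in a guest spin loop executes over and over, so any extra cost in that path is multiplied by the number of retries.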
Other key findings are:
- MVCIN is quite slow, a factor of 6 slower than MVZ, see section MVCIN performance.
- CLCL is a factor of 12 slower than CLC, see section CLCL performance.
- TRT is a factor of 12 slower than TR, see section TRT performance.
- the speed of decimal arithmetic seems independent of the digit count, except for DP, see section decimal performance.
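For readers less familiar with these instructions, here is a simplified C sketch of what TR, TRT, MVZ and MVCIN compute, following the Principles of Operation descriptions. It is not Hercules code: it works on plain host buffers, ignores storage keys, address translation and page crossing, and on its own does not explain the measured speed ratios. The function names and the byte-count parameter n are hypothetical; the real instructions encode n - 1 in their length field.

```c
#include <stddef.h>
#include <stdint.h>

/* TR: replace each byte of op1 by its entry in the 256-byte table. */
static void tr(uint8_t *op1, size_t n, const uint8_t table[256])
{
    for (size_t i = 0; i < n; i++)
        op1[i] = table[op1[i]];
}

/* TRT: scan op1 until a byte with a nonzero table entry is found.
   Returns CC 0 if all entries are zero, 1 if the hit is before the last
   byte, 2 if it is on the last byte. On a hit, *pos and *fbyte receive
   what the real instruction puts into GR1 (address of the byte) and the
   rightmost byte of GR2 (function byte). */
static int trt(const uint8_t *op1, size_t n, const uint8_t table[256],
               size_t *pos, uint8_t *fbyte)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t f = table[op1[i]];
        if (f != 0) {
            *pos = i;
            *fbyte = f;
            return (i == n - 1) ? 2 : 1;
        }
    }
    return 0;
}

/* MVZ: move only the zone (left) nibbles from op2 to op1. */
static void mvz(uint8_t *op1, const uint8_t *op2, size_t n)
{
    for (size_t i = 0; i < n; i++)
        op1[i] = (uint8_t)((op1[i] & 0x0F) | (op2[i] & 0xF0));
}

/* MVCIN: the second operand address designates the rightmost source
   byte; bytes are moved in reverse order. */
static void mvcin(uint8_t *op1, const uint8_t *op2_last, size_t n)
{
    for (size_t i = 0; i < n; i++)
        op1[i] = *(op2_last - i);
}
```

Per byte, MVCIN does no more work than MVZ in this sketch, which is why a factor of 6 points at implementation details rather than at the instruction semantics.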
The CLCL performance is a bit surprising because MVCL shows roughly the same speed as MVC, so the overhead of an interruptible instruction can't be the culprit.
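To illustrate the argument, here is a simplified C sketch of CLC and CLCL semantics. It is not Hercules code; the function names are hypothetical, and the register updates and interruption points of the real CLCL are not modeled. Apart from the separate operand lengths and the pad byte that logically extends the shorter operand, the core comparison loop is the same as for CLC.

```c
#include <stddef.h>
#include <stdint.h>

/* CLC: compare n bytes as unsigned values.
   CC 0 equal, 1 first operand low, 2 first operand high. */
static int clc(const uint8_t *op1, const uint8_t *op2, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (op1[i] != op2[i])
            return op1[i] < op2[i] ? 1 : 2;
    return 0;
}

/* CLCL: like CLC, but each operand has its own length and the shorter
   one is logically extended with the pad byte. Same CC meaning as CLC
   (CC 0 also when both lengths are zero). *pos receives the offset of
   the first unequal byte, or the longer length if all bytes are equal;
   the real instruction reports this via updated addresses and lengths
   in its register pairs. */
static int clcl(const uint8_t *op1, size_t len1,
                const uint8_t *op2, size_t len2,
                uint8_t pad, size_t *pos)
{
    size_t n = len1 > len2 ? len1 : len2;
    for (size_t i = 0; i < n; i++) {
        uint8_t b1 = i < len1 ? op1[i] : pad;
        uint8_t b2 = i < len2 ? op2[i] : pad;
        if (b1 != b2) {
            *pos = i;
            return b1 < b2 ? 1 : 2;
        }
    }
    *pos = n;
    return 0;
}
```

For example, comparing "ABC  " (length 5) with "ABC" (length 3) and a blank pad byte yields CC 0.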
Any remarks and comments are very welcome.
Data for many other systems is already available, see the list of cases, but the full analysis will take some time.