Performance

Microarchitecture

Because the w11a microarchitecture is very similar to that of the original 11/70 processor, the KB11-C CPU, the instruction timing in clock cycles is also very similar. A register-register operation takes two clock cycles, while a more involved case such as "add @r1,a(r2)" takes 12 cycles. Notable exceptions are the MUL (5 instead of 22 cycles) and DIV (23 instead of 46 cycles) instructions.
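To put the two exceptions into numbers, the following minimal sketch computes the cycle-count ratios for MUL and DIV. The cycle counts are the ones quoted above; the names `CYCLES` and `cycle_ratio` are made up for this illustration.

```python
# Cycle counts quoted in the text: (KB11-C cycles, w11a cycles).
CYCLES = {
    "MUL": (22, 5),
    "DIV": (46, 23),
}

def cycle_ratio(instr):
    """Return how many times fewer clock cycles the w11a needs."""
    kb11c_cycles, w11a_cycles = CYCLES[instr]
    return kb11c_cycles / w11a_cycles

for name in CYCLES:
    print(f"{name}: {cycle_ratio(name):.1f}x fewer cycles on the w11a")
```

This compares cycle counts only; the wall-clock gain also depends on the clock rates discussed below.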

Clock Rate

On Spartan-3, Spartan-6, Spartan-7, or Artix-7 class FPGAs, the w11a systems run with a clock frequency of at least 50 MHz. Specifically:

  Family      FPGA          Board       System        Clock    Comment
   Spartan-7  xc7s50-1      Arty S7     w11a_as7      80 MHz   MMCM (Viv 2019.1)
   Artix-7    xc7a35t-1     Cmod A7     w11a_c7       80 MHz   MMCM (Viv 2019.1)
   Artix-7    xc7a35t-1l    Arty A7     w11a_arty     72 MHz   MMCM (Viv 2019.1)
   Artix-7    xc7a35t-1     Basys3      w11a_b3       80 MHz   MMCM (Viv 2019.1)
   Artix-7    xc7a100t-1    Nexys A7    w11a_n4d      80 MHz   MMCM (Viv 2019.1)
   Artix-7    xc7a100t-1    Nexys4      w11a_n4       80 MHz   MMCM (Viv 2019.1)
   Spartan-6  xc6slx16-2    Nexys3      w11a_n3       64 MHz   DCM  (ISE 14.7)
   Spartan-3  xc3s1200e-4   Nexys2      w11a_n2       52 MHz   DCM
   Spartan-3  xc3s1000-4    S3board     w11a_s3       50 MHz   no DCM

Expected Performance

Compared to KB11-C CPU: The KB11-C CPU had a 150 ns micro cycle time, corresponding to a rate of about 6.7 MHz. Both the w11a and the 11/70 have a cache, which greatly reduces the impact of memory latencies. Given the similar cycle counts per instruction, one expects that the w11a at 50 MHz is about a factor 50/6.7 or 7.5 faster than the original PDP-11/70.

Compared to J11 CPU: This later ASIC implementation of the 11/70 ran at a clock rate of up to 20 MHz. It needed 4 clocks per microcycle, resulting in a 200 ns micro cycle time. However, the J11 had a significantly improved microarchitecture, yielding up to a factor of two better CPI (cycles-per-instruction) value. So one expects that the w11a is at least a factor (50/20)*(4/1)*(1/2) or 5 faster than the fastest J11 based system, the PDP-11/93.
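The two estimates can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning above: the 50 MHz figure is the lower bound on the w11a clock, the factor of two in CPI is the upper bound quoted for the J11, and the variable and function names are illustrative.

```python
def mhz_from_cycle_ns(cycle_ns):
    """Convert a cycle time in nanoseconds to an effective rate in MHz."""
    return 1000.0 / cycle_ns

W11A_MHZ = 50.0  # lower bound on the w11a clock rate from the text

# KB11-C: 150 ns micro cycle, similar cycles per instruction assumed.
kb11c_mhz = mhz_from_cycle_ns(150.0)      # about 6.67 MHz
speedup_vs_kb11c = W11A_MHZ / kb11c_mhz   # about 7.5

# J11: 20 MHz chip clock, 4 clocks per microcycle, up to 2x better CPI.
j11_mhz = 20.0 / 4                        # 5 MHz effective microcycle rate
speedup_vs_j11 = (W11A_MHZ / j11_mhz) * (1 / 2)

print(f"w11a vs KB11-C: {speedup_vs_kb11c:.1f}x")
print(f"w11a vs J11:    {speedup_vs_j11:.1f}x")
```

Both numbers scale linearly with the w11a clock, so a board running at 80 MHz would shift the estimates up by a factor of 1.6.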

Benchmarks

The Dhrystone 2 and Tower of Hanoi benchmark codes taken from the 'BYTE UNIX Benchmark' were used to compare the w11a with real PDP-11s and other processors. The w11a values were measured on the boards listed below; the comparison values were obtained from Michael Schneider's benchmark collection:

  Type             OS        CPU/cache  (MHz)   Dhry2   Hanoi   Dhry Hanoi  Dhry
                                                [lps]   [lps]   /MHz  /MHz  /Han

  w11a_s3 V0.5     BSD 2.11  w11a   8k   (50)   11510   160.8    230   3.2  71.6
  w11a_n2 V0.5     BSD 2.11  w11a   8k   (50)   11519   160.4    230   3.2  71.8
  w11a_n2 V0.51    BSD 2.11  w11a   8k   (58)   13218   186.1    228   3.2  71.0
  w11a_n4 V0.73    BSD 2.11  w11a  64k   (80)   18095   250.7    226   3.1  72.2

  pdp-11/53+       BSD 2.11  KDJ11-SD   (4.5)*    828    12.2    184   2.7  67.8
  Mac SE/30        A/UX      68030       (16)    3042    81.8    190   5.1  37.2
  SUN 3/60         NetBSD    68020       (20)    6934   121.3    346   6.1  57.3
  DECstation 2100  NetBSD    R2000       (12)   13206   155.5   1100  13.0  85.2
  NeXT N1100       NetBSD    68040       (25)   26882   386.1   1075  15.4  69.6
  HP 9000/433t     NetBSD    68040       (40)   55763   960.3   1394  24.0  58.1
  NCR system 3230  NetBSD    i486DX/2    (66)   63464   993.1    961  15.0  63.9
  NCR system 3230  NetBSD    i486DX/4   (100)   75010  1022.3    750  10.2  73.4

  Power Mac G4     Gentoo    PPC7455   (1400)    3713k   46.6k  2652  33.3  79.6
  Lenovo TS S10    Gentoo    i686 E8400(3000)   16464k  262.6k  5488  87.5  62.7

Note that the J11 system is listed with an effective microcycle rate of 4.5 MHz rather than the chip clock rate of 18 MHz. This is also consistent with Bob Supnik's notes on the J11, where the J11 is classified as '4.5 MHz'. This gives a more meaningful value for the Dhry/MHz or 'Dhrystone per MHz' column. For a fair comparison, it is also important to remark that the PDP-11/53+ systems didn't have a cache and were therefore about a factor 2.3 slower than a PDP-11/93 with cache (see comparison), which explains the large factor between the w11a_s3 and the 11/53 benchmark results.
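The derived columns of the table are plain ratios, and the effect of the 4.5 MHz convention is easy to check. The sketch below recomputes them for three rows copied from the table; the function name `derived` and the final estimate against a PDP-11/93 (raw ratio divided by the quoted cache factor of 2.3) are illustrative assumptions, not measurements.

```python
# (label, clock in MHz, Dhry2 [lps], Hanoi [lps]) copied from the table above.
ROWS = [
    ("w11a_s3 V0.5", 50.0, 11510, 160.8),
    ("w11a_n4 V0.73", 80.0, 18095, 250.7),
    ("pdp-11/53+", 4.5, 828, 12.2),
]

def derived(mhz, dhry, hanoi):
    """Return (Dhry/MHz, Hanoi/MHz, Dhry/Hanoi) for one table row."""
    return dhry / mhz, hanoi / mhz, dhry / hanoi

for label, mhz, dhry, hanoi in ROWS:
    d_mhz, h_mhz, d_han = derived(mhz, dhry, hanoi)
    print(f"{label:14s} {d_mhz:6.0f} {h_mhz:5.1f} {d_han:5.1f}")

# With the 18 MHz chip clock the J11 would look far worse per MHz.
j11_dhry_per_chip_mhz = 828 / 18.0

# Raw w11a_s3 / 11/53+ ratio, then a rough estimate against a cached
# PDP-11/93 using the factor 2.3 quoted in the text (an assumption).
raw_factor = 11510 / 828
est_vs_1193 = raw_factor / 2.3
print(f"raw factor {raw_factor:.1f}, estimated vs 11/93 {est_vs_1193:.1f}")
```

The estimated factor of roughly 6 against a cached J11 system is in line with the "at least 5" expectation derived earlier.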

The Dhrystone, Tower of Hanoi, and 'syscall' benchmarks were also run on a simulated PDP-11 using SimH version V3.8-1 and natively on a Linux system. In both cases a Kubuntu 10.4 system with an Intel Core2 Duo E8400 CPU was used, with cpufreq fixed to 3 GHz.

  System       Platform              (MHz)  Dhry2    Hanoi  syscall  syscall  syscall
                                            [lps]    [lps]    [lps]   /Hanoi   /Dhry2

  2.11BSD      w11a_s3 V0.5          (50)   11510    160.8     7080    44.0    0.615
  2.11BSD      w11a_n2 V0.5          (50)   11519    160.4     6888    42.9    0.598
  2.11BSD      w11a_n2 V0.51         (58)   13218    186.1     7616    40.9    0.576
  2.11BSD      w11a_n4 V0.73         (80)   18095    250.7    10837    43.2    0.599

  2.11BSD      SimH on Intel E8400   (--)   17174    250.0    10713    42.9    0.623
  2.11BSD      SimH on Rasp Pi 2 B   (--)    4477     63.3     2651    41.9    0.592
  2.11BSD      SimH on Rasp Pi 3 B   (--)    7294    104.5     4217    41.3    0.578

  Ubuntu 10.4  Intel E8400         (3000)   10785k    74.1k    1020k   13.8    0.095

Some observations are:

The Nexys2 and Nexys3 boards have a significantly larger main memory latency than the S3BOARD. Because Dhry2 and Hanoi run almost completely in cache and rarely write, they execute with equal speed. The syscall benchmark is more sensitive to memory latencies, either due to more cache misses or delays from write-throughs.
The simulated PDP-11 on a vintage 2009 3 GHz PC is about as fast as the current FPGA implementation on an Artix-7. However, there is certainly room for improvement on the FPGA side, with faster devices (e.g. a Kintex-7 instead of an Artix-7) and/or an improved microarchitecture (like the J11 or even better).
Comparing the arithmetic benchmarks (Dhry2 and Hanoi) with the 'syscall' benchmark on SimH-simulated 2.11BSD and native Linux suggests that the system call overhead, normalized by processor speed, is larger for contemporary Linux than for 2.11BSD.
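The last observation follows from the syscall/Dhry2 column. As a small sketch, the ratio can be recomputed for the two runs on the same Intel E8400 host, using the values from the table with the 'k' suffix expanded to thousands; the names used here are illustrative.

```python
# (Dhry2 [lps], syscall [lps]) for the two runs on the same E8400 host.
SIMH_BSD = (17174, 10713)             # 2.11BSD under SimH
NATIVE_LINUX = (10_785_000, 1_020_000)  # native Linux, 'k' expanded

def syscall_per_dhry(dhry, syscall):
    """Syscall loops completed per Dhrystone loop."""
    return syscall / dhry

bsd_ratio = syscall_per_dhry(*SIMH_BSD)
linux_ratio = syscall_per_dhry(*NATIVE_LINUX)

# A lower ratio means a system call costs more relative to plain
# arithmetic code on that system.
relative_cost = bsd_ratio / linux_ratio

print(f"2.11BSD: {bsd_ratio:.3f}  Linux: {linux_ratio:.3f}")
print(f"relative syscall cost Linux vs 2.11BSD: {relative_cost:.1f}x")
```

Normalized to Dhrystone speed, a system call on the Linux system thus costs roughly six to seven times more than on 2.11BSD in this measurement.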

Updated 2022-06-06