Fig I-1: w11a main sequencer state flow. For a description of the symbols and color code, see the annotation. Also available as svg and pdf. (wfjm cc-by-4.0)

The w11a microarchitecture

The microarchitecture of the w11a core is similar to that of the original 11/70 processor, the KB11-C CPU. There is little parallelism or pipelining, just a prefetch of the next instruction, and even that only when the current instruction has a register destination. The goal was 'to get it working'. The w11a core (see pdp11_core.vhd) is composed of four main units:

The main difference from the KB11-C is that the w11a is not a microcoded design but is based on a large sequencer state machine with currently 113 states and 584 transitions. The number of states is significantly smaller than in the original 11/70 for two reasons: the FPP interface isn't implemented yet, and the bus interface and memory handling are factored out into a separate vmbox state machine. The w11a main sequencer state flow graph is shown in Figure I-1 (also as svg and pdf). The symbols and color code used in this flow chart are explained in the document

The strict separation of the main state machine and data path helped a lot to control the logic path length. A large number of control signals between these entities were packed into VHDL records (see definitions in pdp11.vhd) so the port lists stayed at a reasonable length.

The Internal Bus - ibus

A very simple bus structure called ibus is used to connect the CPU with control registers within the processor, e.g. in the MMU, and the peripherals. The ibus is a simple synchronous single-master, multiple-slave bus. The control signals support the bus cycle types of a UNIBUS. To support the I/O emulation and the debug interface, additional access modifiers were added to distinguish between 'CPU', 'console', and 'remote access' cycles. The ibus (see definitions in iblib.vhd) is implemented with two VHDL records, a master request and a slave response, giving very compact port maps. The slave responses are simply OR'ed. Interrupt request and acknowledge are handled with separate signals; the mapping from interrupt line to vector and priority is done by an arbiter module (see ib_intmap.vhd). A system has one ibus per processor core, so a multi-core system will have multiple ibuses, just as the PDP-11/70mP had a UNIBUS associated with each processor.
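The request/response record structure and the OR-combination of slave responses can be illustrated with a small behavioral model. This is only a sketch in Python, not the VHDL from iblib.vhd: the class names, the field names other than racc, and the example addresses are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IbMreq:
    """Master request record (field names are hypothetical)."""
    addr: int = 0
    din: int = 0         # data from master to slave
    re: bool = False     # read enable
    we: bool = False     # write enable
    racc: bool = False   # True marks a 'remote access' cycle


@dataclass(frozen=True)
class IbSres:
    """Slave response record; an unselected slave returns all zeros."""
    ack: bool = False
    dout: int = 0


class Register:
    """A one-register slave that responds only at its own address."""

    def __init__(self, addr, value=0):
        self.addr, self.value = addr, value

    def cycle(self, req: IbMreq) -> IbSres:
        if req.addr != self.addr or not (req.re or req.we):
            return IbSres()              # idle: contributes nothing to the OR
        if req.we:
            self.value = req.din & 0xFFFF
            return IbSres(ack=True)
        return IbSres(ack=True, dout=self.value)


def ibus_cycle(slaves, req):
    """OR all slave responses; idle slaves return zeros, so no mux is needed."""
    ack, dout = False, 0
    for slave in slaves:
        res = slave.cycle(req)
        ack |= res.ack
        dout |= res.dout
    return IbSres(ack, dout)
```

Because every unselected slave drives zeros, the master sees exactly the response of the addressed slave, and a cycle to an unused address simply returns no acknowledge.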

The Remote Register Interface - rbus and rlink

A second very simple bus structure called rbus is used to connect the main components of a system to an external control entity. The rbus is again a simple synchronous single-master, multiple-slave bus with an implementation quite similar to the ibus (see definitions in rblib.vhd). Via an rbus to w11a core 'control port' interface (see pdp11_core_rbus.vhd) the rbus master can control the CPU (start, stop, step etc.) and access all processor and device registers as well as the main memory. This interface also provides a 4096-word rbus to ibus or rb2ib window, which maps the ibus address space into the rbus address space. ibus accesses via this window are marked as 'remote' accesses (via the racc signal) and can be distinguished from CPU-originated accesses. This is used to implement the I/O system emulation.
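The idea of such a window can be sketched as a simple address translation. The sketch below is an assumption-laden illustration, not the logic of pdp11_core_rbus.vhd: the window base address in rbus space (WINDOW_BASE) is hypothetical, and it assumes the window covers the 8 KB PDP-11 I/O page (4096 16-bit words starting at 16-bit address 160000 octal) with one rbus word address per ibus word.

```python
WINDOW_BASE = 0x4000      # hypothetical start of the rb2ib window in rbus space
WINDOW_WORDS = 4096       # window size stated in the text
IOPAGE_BASE = 0o160000    # assumed: start of the I/O page (16-bit byte address)


def rb2ib(rb_addr):
    """Translate an rbus word address to an ibus byte address.

    Returns None when the address is outside the window. Accesses that go
    through the window would be flagged as 'remote' (racc) on the ibus.
    """
    offset = rb_addr - WINDOW_BASE
    if not 0 <= offset < WINDOW_WORDS:
        return None
    return IOPAGE_BASE + 2 * offset   # word offset to byte address
```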

A single rbus can handle up to four CPUs with their attached peripherals plus additional auxiliary units like the 'human I/O interceptor' used on the Digilent boards (see sn_humanio_rbus.vhd). The rbus has a simple interrupt mechanism. A slave can ask for 'attention' by asserting the LAM signal for a cycle (side note: LAM for 'look-at-me' is a retro pun on CAMAC, an instrumentation standard very often used with a PDP-11 in DAQ systems).

The rbus can be used to build a direct 16 bit memory-mapped interface to another processor. With the rlink protocol, the rbus can also be controlled via any bi-directional byte stream communication channel. In this case, the rlink-to-rbus bridge (see rlink_core.vhd) is the local rbus master, which can be remotely controlled from a backend server via the communication channel. The rlink protocol supports register reads and writes, block transfers, and simple interrupt handling via 'attention' messages. It is protected with a 16 bit CRC (see crc16.vhd) to detect transmission errors. The implementation is based on an extended symbol space with 261 symbols: 256 to represent the states of a data byte and 5 additional symbols, IDLE, SOP, EOP, NAK, and ATTN, used as packet delimiters and for other control purposes. This was inspired by the usage of D- and K-symbols in optical link protocols. The symbol space is mapped into an 8 bit data stream with a simple escaping mechanism (see cdata2byte.vhd and byte2cdata.vhd). These 9 bit to 8 bit interface adapters provide a FIFO buffered byte-wide rlink interface used in all systems (see also rlink_core8.vhd).

In the current systems, the rlink communication is done via

The I/O System Emulation

All device I/O is emulated, as already briefly described under Features. A simple example is the transmit part of the unbuffered DL11 console interface (see ibdr_dl11.vhd). Sending a character to the emulated console involves the following steps:

The DL11 receive logic is conceptually similar. The logic of the LP11 line printer interface (see ibdr_lp11.vhd) and the PC11 paper tape reader/puncher interface (see ibdr_pc11.vhd) is analogous.

For mass storage peripherals, the DMA transfers are emulated too. A simple example is disk access of the RK11 disk controller (see ibdr_rk11.vhd). Reading a disk block from a disk image handled by the backend server to the w11a memory involves the following steps:

Even though conceptually straightforward, the devil is in the details. Some of the device logic must be done in hardware to guarantee the timing behavior expected by the original drivers. The 'remote' semantics of some registers is therefore sometimes not just the read/write mirror image of the 'local' semantics, as in the DL11 case, but quite different. For example, in the RK11 controller, a write to the maintenance register rkmr is essentially a no-op when done from the CPU but is used for all kinds of control functions when done via rbus.
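The split between 'local' and 'remote' semantics can be pictured as a register whose write behavior depends on the access type. The following is a minimal sketch only: the class, its control_log attribute, and the treatment of the written value are hypothetical and do not reproduce the actual rkmr logic in ibdr_rk11.vhd.

```python
class MaintRegisterSketch:
    """Hypothetical stand-in for a register with different local/remote semantics."""

    def __init__(self):
        self.control_log = []   # assumed: record of remotely issued control values

    def write(self, value, racc):
        """racc=True marks a remote (rbus) access, racc=False a CPU access."""
        if racc:
            # Remote write: interpreted as a control function by the device logic.
            self.control_log.append(value & 0xFFFF)
        # Local CPU write: intentionally ignored, i.e. a no-op.
```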

Porting to other FPGA Families

So far the w11a has only been used on Xilinx Spartan-3, Spartan-6 and Series-7 FPGAs, but it is expected that porting to other Xilinx FPGAs is easy. The only vendor-specific constructs are the I/O buffers and the memories, encapsulated in xlib and memlib, respectively. The w11a uses distributed RAMs for the general-purpose register file (see pdp11_gpr.vhd), the SAR/SDR register file in the MMU (see pdp11_mmu_sadr.vhd), and small FIFOs (see fifo_1c_dram_raw.vhd). The cache implementation (see pdp11_cache.vhd) uses dual-ported block RAMs in "READ_FIRST" mode to implement a 'speculative write' which can be undone in the next cycle.
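The reason READ_FIRST mode makes a write undoable is that the port output delivers the old cell content in the same cycle in which the new data is stored. A behavioral sketch in Python follows; the class and function names are hypothetical, and how pdp11_cache.vhd decides when to undo a write is not modeled here.

```python
class ReadFirstRam:
    """Behavioral model of one synchronous RAM port in READ_FIRST mode."""

    def __init__(self, size):
        self.mem = [0] * size
        self.dout = 0            # registered port output

    def clock(self, addr, we=False, din=0):
        """One clock edge: the output takes the OLD content, then the write lands."""
        self.dout = self.mem[addr]
        if we:
            self.mem[addr] = din
        return self.dout


def speculative_write(ram, addr, data):
    """Write right away and keep what is needed to revert in the next cycle."""
    old = ram.clock(addr, we=True, din=data)
    return (addr, old)


def undo(ram, token):
    """Revert a speculative write by storing the saved old content again."""
    addr, old = token
    ram.clock(addr, we=True, din=old)
```

In WRITE_FIRST mode the output would show the new data instead, so the previous content would be lost and the write could not be reverted without an extra read cycle beforehand.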