The w11a microarchitecture
The microarchitecture of the w11a core is similar to that of the original 11/70 processor, the KB11-C CPU. There is little parallelism or pipelining, just a prefetch of the next instruction, and even that only when the current instruction has a register destination. The goal was 'to get it working'. The w11a core (see pdp11_core.vhd) is composed of four main units:
- instruction decoder (see pdp11_decode.vhd),
- sequencer (see pdp11_sequencer.vhd),
- data path (see pdp11_dpath.vhd),
- MMU and the external bus and memory interface control (see pdp11_vmbox.vhd).
The main difference from the KB11-C is that the w11a is not a microcoded design but is based on a large sequencer state machine with currently 113 states and 584 transitions. The number of states is significantly smaller than in the original 11/70 for two reasons: the FPP interface isn't implemented yet, and the bus interface and memory handling are factored out into a separate vmbox state machine. The w11a main sequencer state flow graph is shown in Figure I-1 (also as svg and pdf). The symbols and color code used in this flow chart are explained in the document w11a_seq_flow.md.
The strict separation of the main state machine and data path helped a lot to control the logic path length. A large number of control signals between these entities were packed into VHDL records (see definitions in pdp11.vhd) so the port lists stayed at a reasonable length.
The Internal Bus - ibus
A very simple bus structure called ibus connects the CPU with the control registers within the processor, e.g. in the MMU, and with the peripherals. The ibus is a synchronous single master - multiple slave bus. The control signals support the bus cycle types of a UNIBUS. To support the I/O emulation and the debug interface, additional access modifiers were added to distinguish between 'CPU', 'console', and 'remote access' cycles. The ibus (see definitions in iblib.vhd) is implemented with two VHDL records, a master request and a slave response, giving very compact port maps. The slave responses are simply or'ed. Interrupt request and acknowledge are handled with separate signals; the mapping of interrupt lines to vector and priority is done by an arbiter module (see ib_intmap.vhd). A system has one ibus per processor core, so a multi-core system will have multiple ibuses, just as the PDP-11/70mP had a UNIBUS associated with each processor.
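The request/response structure with or'ed slave responses can be illustrated with a small behavioral model. This is only a sketch in Python, not a transcription of iblib.vhd: the names `MasterRequest`, `SlaveResponse`, `Register`, and `bus_cycle`, as well as the field names, are assumptions made for illustration. The key point is that a slave which is not addressed returns all zeros, so the responses can be combined with a plain OR instead of a multiplexer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterRequest:
    addr: int           # register address on the bus
    we: bool = False    # write enable; False means a read cycle
    din: int = 0        # data driven by the master on a write


@dataclass(frozen=True)
class SlaveResponse:
    ack: bool = False   # slave claims the cycle
    dout: int = 0       # read data, all zero when not selected


class Register:
    """One 16 bit register acting as a bus slave (hypothetical model)."""

    def __init__(self, addr):
        self.addr = addr
        self.value = 0

    def cycle(self, req):
        if req.addr != self.addr:
            return SlaveResponse()            # idle slaves drive zeros
        if req.we:
            self.value = req.din & 0xFFFF
            return SlaveResponse(ack=True)
        return SlaveResponse(ack=True, dout=self.value)


def bus_cycle(slaves, req):
    """Combine all slave responses with a plain OR."""
    ack, dout = False, 0
    for slave in slaves:
        rsp = slave.cycle(req)
        ack = ack or rsp.ack
        dout |= rsp.dout
    return SlaveResponse(ack, dout)


slaves = [Register(0o177560), Register(0o177562)]
bus_cycle(slaves, MasterRequest(addr=0o177562, we=True, din=0o123))
reply = bus_cycle(slaves, MasterRequest(addr=0o177562))
print(reply.ack, oct(reply.dout))
```

In this model a cycle to an address that no slave decodes simply returns `ack=False`, which a master could treat as a bus timeout.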
The Remote Register Interface - rbus and rlink
A second very simple bus structure called rbus is used to connect the
main components of a system to an external control entity. The rbus is again a
simple synchronous single master - multiple slave bus with an implementation
quite similar to the ibus (see definitions in
Via an rbus to w11a core 'control port' interface (see
the rbus master can control the CPU (start, stop, step etc) and access all
processor and devices registers as well as the main memory.
This interface also provides a 4096 word rbus to ibus window, the rb2ib window,
which maps the ibus address space into the rbus address space. ibus accesses
via this window are marked as 'remote' accesses (via the
signal) and can be distinguished from CPU-originated accesses.
This is used to implement the I/O system emulation.
A single rbus can handle up to four CPUs with their attached peripherals
plus additional auxiliary units like the 'human I/O interceptor' used on
the Digilent boards.
The rbus has a simple interrupt mechanism. A slave can ask for 'attention' by
asserting the RB_LAM signal for a cycle (side note: LAM for
'look-at-me' is a retro pun on
an instrumentation standard very often used with a PDP-11 in DAQ systems).
The rbus can be used to build a direct 16 bit memory-mapped interface to
another processor. With the rlink protocol, the rbus can also be
controlled via any bi-directional byte stream communication channel.
In this case, the rlink-to-rbus bridge (see
is the local rbus master which can be remotely controlled from a
backend server via the communication channel. The rlink protocol
supports register reads and writes, block transfers, and a simple interrupt
handling via 'attention' messages. It is protected with a 16 bit CRC (see
to detect transmission errors. The implementation is based on an extended symbol
space with 261 symbols, 256 to represent the states of a data byte and 5
control symbols, such as ATTN, used as packet delimiters
and for other control purposes, and was inspired by the usage of D- and
K-symbols in optical link protocols. The symbol space is mapped into an
8 bit data stream with a simple escaping mechanism (see
These 9 bit to 8 bit interface adapters provide a FIFO buffered byte-wide
rlink interface used in all systems.
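The idea of mapping an extended symbol space onto a plain byte stream can be sketched as follows. The concrete escape byte, the control codes, and the numbering of the control symbols below are assumptions chosen for illustration and are not taken from the rlink sources; only the overall scheme (256 data symbols plus 5 control symbols, carried over 8 bit bytes with escaping) follows the description above.

```python
ESC = 0xCA          # hypothetical escape byte, not taken from the rlink sources
N_DATA = 256        # symbols 0..255 are plain data bytes
N_CTRL = 5          # symbols 256..260 are control symbols
ATTN = 260          # hypothetical number for the ATTN control symbol


def encode(symbols):
    """Map extended symbols (0..260) to a byte stream."""
    out = bytearray()
    for sym in symbols:
        if not 0 <= sym < N_DATA + N_CTRL:
            raise ValueError(f"symbol out of range: {sym}")
        if sym >= N_DATA:
            out += bytes([ESC, sym - N_DATA + 1])   # control: ESC + code 1..5
        elif sym == ESC:
            out += bytes([ESC, 0])                  # literal ESC data byte
        else:
            out.append(sym)
    return bytes(out)


def decode(stream):
    """Inverse of encode(); raises ValueError on a malformed escape."""
    symbols = []
    it = iter(stream)
    for byte in it:
        if byte != ESC:
            symbols.append(byte)
            continue
        code = next(it, None)                       # byte following the escape
        if code == 0:
            symbols.append(ESC)
        elif code is not None and 1 <= code <= N_CTRL:
            symbols.append(N_DATA + code - 1)
        else:
            raise ValueError("malformed escape sequence")
    return symbols


packet = [ATTN, 0x12, ESC, 0x34]
wire = encode(packet)
assert decode(wire) == packet
print(wire.hex())
```

With this kind of escaping, ordinary data bytes cost one byte on the wire, and only control symbols and the escape value itself cost two.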
In the current systems, the rlink communication is done via
- a real or USB-emulated serial port (see rlink_sp1c.vhd). The UART (see serport_uart_rx.vhd and serport_uart_tx.vhd) supports baud rates up to 12 MBaud. The baud rate is detected automatically from a BREAK + 0x80 sync character sequence (see serport_uart_autobaud.vhd). The FT2232H on the Nexys4 and subsequent Digilent boards allows up to 12 MBaud. With the FT232R on the Nexys3, baud rates of 2 MBaud are possible. With the onboard RS232 ports of the s3board and Nexys2 boards, the maximal baud rate is 460800 when a USB-RS232 adapter is directly connected; here the rate is limited by the RS232 transceivers.
- a USB FIFO interface using the onboard Cypress FX2 USB controller on Nexys2 and Nexys3 boards. The FPGA interface (see fx2_2fifoctl_ic.vhd) provides two endpoints, a data transfer speed of up to 30 MByte/sec, and a request-response rate of 4 kHz, limited by the USB 2.0 timing structure. Besides data transfers to a running FPGA design, the FX2 firmware also supports the configuration of the FPGA via JTAG (see sources and pre-built images).
The I/O System Emulation
All device I/O is emulated, as already briefly described under Features. A simple example is the transmit part of the unbuffered DL11 console interface (see ibdr_dl11.vhd). Sending a character to the emulated console involves the following steps:
- the CPU writes the character to the xbuf (transmit buffer) register of the DL11.
- the controller logic clears the xrdy (transmit ready) bit in the csr (control status register) and asserts the RB_LAM signal for one cycle.
- this will set a bit in the attention mask of the rbus controller and, in case the mask was clear before, trigger the sending of an ATTN symbol to the backend server.
- the backend server, upon reception of the ATTN symbol, will retrieve the attention mask, determine the source (or sources) of the attention requests, and start the appropriate handler (see Rw11CntlDL11.cpp).
- for a console transmit attention, the backend server will issue a remote read of the xbuf register via rbus and the rb2ib window.
- the remote read of the xbuf will return the character to the backend server, set the xrdy bit in the csr, and, in case interrupts were enabled, set the interrupt request flop in the controller.
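The transmit handshake described in the steps above can be modeled in a few lines. This is a toy simulation, not the logic of ibdr_dl11.vhd or Rw11CntlDL11.cpp: the class and method names are hypothetical, and the assumption that retrieving the attention mask also clears it is made only to keep the example short.

```python
class DL11Transmit:
    """Toy model of the transmit half of an emulated DL11 (hypothetical)."""

    def __init__(self):
        self.xbuf = 0          # transmit buffer
        self.xrdy = True       # transmit ready bit in csr
        self.ie = False        # interrupt enable bit in csr
        self.intreq = False    # interrupt request flop
        self.lam = False       # models the one-cycle RB_LAM pulse

    def cpu_write_xbuf(self, char):
        self.xbuf = char & 0xFF
        self.xrdy = False      # busy until the backend fetches xbuf
        self.lam = True

    def remote_read_xbuf(self):
        self.xrdy = True       # the remote read completes the transfer
        if self.ie:
            self.intreq = True
        return self.xbuf


class RbusController:
    """Collects LAM pulses in an attention mask and emits ATTN symbols."""

    def __init__(self, devices):
        self.devices = devices
        self.attn_mask = 0
        self.sent = []         # symbols sent to the backend server

    def clock(self):
        for bit, dev in enumerate(self.devices):
            if dev.lam:
                dev.lam = False
                if self.attn_mask == 0:
                    self.sent.append("ATTN")   # only when the mask was clear
                self.attn_mask |= 1 << bit

    def read_and_clear_attn(self):
        # assumption: retrieving the mask also clears it
        mask, self.attn_mask = self.attn_mask, 0
        return mask


def backend_service(ctl):
    """Backend side: find the attention sources and fetch their characters."""
    out = []
    mask = ctl.read_and_clear_attn()
    for bit, dev in enumerate(ctl.devices):
        if mask & (1 << bit):
            out.append(chr(dev.remote_read_xbuf()))
    return out


dl11 = DL11Transmit()
dl11.ie = True
ctl = RbusController([dl11])
dl11.cpu_write_xbuf(ord("A"))
ctl.clock()
console = backend_service(ctl) if ctl.sent else []
print(console, dl11.xrdy, dl11.intreq)
```

The model makes the division of labor visible: the ready bit and the interrupt request flop live in the device logic, while the character itself only moves when the backend performs the remote read.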
For mass storage peripherals, the DMA transfers are emulated too. A simple example is disk access of the RK11 disk controller (see ibdr_rk11.vhd). Reading a disk block from a disk image handled by the backend server to the w11a memory involves the following steps:
- the CPU writes memory and disk address and transfer size information
into the appropriate RK11 controller registers (
- the CPU writes the I/O function code into the 'control&status' register rkcs and sets the GO bit to start the operation.
- the setting of the GO bit will cause an RB_LAM and, as described above, the sending of an ATTN symbol to the backend server.
- the device handling routine in the backend server (see Rw11CntlRK11.cpp) will retrieve all relevant RK11 controller registers via rbus and the rb2ib window, read the disk block data from the disk image file, and transfer the data with rlink block transfer commands directly into the w11a memory.
- when all data is transferred, the backend server updates the RK11 registers via rbus and the rb2ib window. The last step is to clear the GO bit in the
- the cleared GO bit concludes the I/O transaction and, in case interrupts were enabled, the interrupt request flop is set in the controller.
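The emulated DMA read can be sketched from the backend's point of view. The register set below is deliberately simplified and hypothetical (the field names `mem_addr`, `disk_block`, and `word_count` are not the real RK11 registers, and the real word count encoding is not modeled); the block size of 256 words and the use of a Python list as w11a memory are likewise assumptions. The slice assignment stands in for the rlink block transfer.

```python
import struct

WORDS_PER_BLOCK = 256      # assumed block size in 16 bit words


class RK11Regs:
    """Hypothetical, simplified register set; not the real RK11 layout."""

    def __init__(self):
        self.mem_addr = 0      # target address in w11a memory (in words here)
        self.disk_block = 0    # block number on the disk image
        self.word_count = 0    # number of words still to transfer
        self.go = False        # set by the CPU to start the operation
        self.ie = False        # interrupt enable
        self.intreq = False    # interrupt request flop


def backend_read_handler(regs, disk_image, memory):
    """Serve one 'read' request that was signalled by an ATTN."""
    if not regs.go:
        return
    start = regs.disk_block * WORDS_PER_BLOCK * 2       # byte offset in image
    raw = disk_image[start:start + regs.word_count * 2]
    words = struct.unpack(f"<{len(raw) // 2}H", raw)    # little-endian words
    # stands in for an rlink block transfer into w11a memory
    memory[regs.mem_addr:regs.mem_addr + len(words)] = words
    regs.word_count -= len(words)                       # register update
    regs.go = False                                     # concludes the request
    if regs.ie:
        regs.intreq = True


image = struct.pack("<512H", *range(512))   # two blocks of test data
memory = [0] * 1024
regs = RK11Regs()
regs.mem_addr, regs.disk_block, regs.word_count = 100, 1, 256
regs.go = regs.ie = True
backend_read_handler(regs, image, memory)
print(memory[100], memory[355], regs.go, regs.intreq)
```

From the CPU's point of view the transfer looks like a normal DMA operation: the data appears in memory, the GO bit drops, and an interrupt is requested if enabled.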
Even though conceptually straightforward, the devil is in the details.
Some of the device logic must be done in hardware to guarantee the timing
behavior expected by the original drivers. The 'remote' semantics of
some registers are therefore sometimes not just the read/write mirror image
of the 'local' semantics, as in the DL11 case, but quite different.
For example, in the RK11 controller, a write to the maintenance register
rkmr is essentially a no-op when done from the CPU but is used
for all kinds of control functions when done via rbus.
Porting to other FPGA Families
So far the w11a has only been used on Xilinx Spartan-3, Spartan-6 and Series-7 FPGAs, but it is expected that porting to other Xilinx FPGAs is easy. The only vendor-specific constructs are the memories and the I/O buffers, encapsulated in memlib and xlib, respectively. The w11a uses distributed RAMs for the general-purpose register file (see pdp11_gpr.vhd), the SAR/SDR register file in the MMU (see pdp11_mmu_sadr.vhd), and small FIFOs (see fifo_1c_dram_raw.vhd). The cache implementation (see pdp11_cache.vhd) uses dual-ported block RAMs in "READ_FIRST" mode to implement a 'speculative write' which can be undone in the next cycle.
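Why "READ_FIRST" mode makes an undoable write possible can be shown with a small behavioral model. In this mode the read output of a port shows the old content of the addressed location in the same clock cycle in which the new value is written. The sketch below models only that property; the class name `ReadFirstRam` is hypothetical, and how pdp11_cache.vhd actually sequences the undo is not modeled here.

```python
class ReadFirstRam:
    """Single port of a block RAM in READ_FIRST mode (behavioral sketch)."""

    def __init__(self, size):
        self.mem = [0] * size
        self.dout = 0                      # registered read output

    def clock(self, addr, we=False, din=0):
        self.dout = self.mem[addr]         # old content is captured first
        if we:
            self.mem[addr] = din           # then the write takes effect
        return self.dout


ram = ReadFirstRam(16)
ram.clock(3, we=True, din=0x1111)          # committed content
old = ram.clock(3, we=True, din=0x2222)    # speculative write, old content kept
ram.clock(3, we=True, din=old)             # undo in the next cycle
print(hex(ram.mem[3]))
```

Because the overwritten value is available on the read output for free, no separate read cycle is needed before the speculative write.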