The w11a microarchitecture
The microarchitecture of the w11a core is similar to that of the original 11/70 processor, the KB11-C CPU. There is little parallelism or pipelining, just a prefetch of the next instruction, and even that only when the current instruction has a register destination. The goal was 'to get it working'. The w11a core (see pdp11_core.vhd) is composed of four main units:
- instruction decoder (see pdp11_decode.vhd),
- sequencer (see pdp11_sequencer.vhd),
- data path (see pdp11_dpath.vhd),
- MMU and the external bus and memory interface control (see pdp11_vmbox.vhd).
The main difference to the KB11-C is that the w11a is not a microcoded design but is based on a large sequencer state machine with currently 113 states and 584 transitions. The number of states is significantly smaller than in the original 11/70 for two reasons: the FPP interface isn't implemented yet, and the bus interface and memory handling are factored out into a separate vmbox state machine. The w11a main sequencer state flow graph is shown in Figure I-1 (also as svg and pdf). The symbols and color code used in this flow chart are explained in the document w11a_seq_flow.md.
The strict separation of main state machine and data path helped a lot to control the logic path length. The large number of control signals between these entities was packed into VHDL records (see definitions in pdp11.vhd) so the port lists stayed at a reasonable length.
The Internal Bus - ibus
A very simple bus structure called ibus is used to connect the CPU with control registers within the processor, e.g. in the MMU, and in the peripherals. The ibus is a simple synchronous single master - multiple slave bus. The control signals support the bus cycle types of a UNIBUS. To support the I/O emulation and the debug interface, additional access modifiers were added to distinguish between 'CPU', 'console' and 'remote access' cycles. The ibus (see definitions in iblib.vhd) is implemented with two VHDL records, a master request and a slave response, giving very compact port maps. The slave responses are simply OR'ed. Interrupt request and acknowledge are handled with separate signals; the interrupt line to vector/priority mapping is done by an arbiter module (see ib_intmap.vhd). A system has one ibus per processor core, so a multicore system will have multiple ibuses, just as the PDP-11/74 had a UNIBUS associated with each processor.
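The request/response structure described above can be modelled in a few lines. The following is only a behavioural sketch in Python, not a transcription of iblib.vhd: the record and field names (`IbMreq`, `IbSres`, `we`, `re`, `ack`, ...) and the `Register` slave are assumptions made for illustration. It shows why OR'ing the slave responses works: every slave that is not addressed drives an all-zero response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IbMreq:
    """Master request record (field names are illustrative assumptions)."""
    addr: int = 0
    we: bool = False     # write cycle
    re: bool = False     # read cycle
    din: int = 0         # data from master to slave
    racc: bool = False   # 'remote access' modifier
    cacc: bool = False   # 'console' modifier


@dataclass(frozen=True)
class IbSres:
    """Slave response record; the default is the idle, all-zero response."""
    ack: bool = False
    dout: int = 0


class Register:
    """Hypothetical slave: a single 16 bit read/write register."""

    def __init__(self, addr):
        self.addr = addr
        self.value = 0

    def cycle(self, req):
        if req.addr != self.addr:
            return IbSres()                  # not selected: drive zeros
        if req.we:
            self.value = req.din & 0xFFFF
            return IbSres(ack=True)
        if req.re:
            return IbSres(ack=True, dout=self.value)
        return IbSres()


def bus_cycle(slaves, req):
    """Broadcast one request and OR all slave responses together."""
    ack, dout = False, 0
    for slave in slaves:
        res = slave.cycle(req)
        ack = ack or res.ack
        dout |= res.dout
    return IbSres(ack=ack, dout=dout)
```

A cycle to an address that no slave decodes returns `ack=False`, which a master can treat as a bus timeout.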
The Remote Register Interface - rbus and rlink
A second very simple bus structure called rbus is used to connect the main components of a system to an external control entity. The rbus is again a simple synchronous single master - multiple slave bus with an implementation quite similar to the ibus (see definitions in rblib.vhd). Via an rbus to w11a core 'control port' interface (see pdp11_core_rbus.vhd) the rbus master can control the CPU (start, stop, step etc.) and access all processor and device registers as well as the main memory. This interface also provides a 32-word rbus to ibus or rb2ib window, which maps a fixed part of the rbus address space into a selectable part of the ibus address space. ibus accesses via this window are marked as 'remote' accesses (via the racc signal) and can be distinguished from CPU originated accesses. This is used to implement the I/O system emulation.
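The address translation done by such a window can be sketched as follows. This is a minimal Python model under stated assumptions, not the logic of pdp11_core_rbus.vhd: the rbus base address, the word addressing on the rbus side, the byte addressing on the ibus side and the alignment of the window to a 32-word block are all assumptions chosen for illustration. Only the window size of 32 words is taken from the text.

```python
WINDOW_WORDS = 32   # window size, from the description above


class Rb2IbWindow:
    """Maps a fixed 32-word rbus range onto a selectable ibus block."""

    def __init__(self, rbus_base):
        self.rbus_base = rbus_base   # fixed rbus address (hypothetical value)
        self.ibus_base = 0           # selectable ibus byte address

    def select(self, ibus_addr):
        """Point the window at the aligned block containing ibus_addr."""
        block_bytes = 2 * WINDOW_WORDS           # 16 bit words, assumed
        self.ibus_base = ibus_addr & ~(block_bytes - 1)

    def translate(self, rbus_addr):
        """Return the ibus byte address, or None if outside the window."""
        offset = rbus_addr - self.rbus_base
        if not 0 <= offset < WINDOW_WORDS:
            return None
        return self.ibus_base + 2 * offset
```

In this model the backend first calls `select()` with the address of the device register block and then reaches each register through the fixed rbus range.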
A single rbus can handle up to four CPUs with their attached peripherals plus additional auxiliary units like the 'human I/O interceptor' used on the Digilent boards (see sn_humanio_rbus.vhd). The rbus has a simple interrupt mechanism. A slave can ask for 'attention' by asserting the LAM signal for a cycle (side note: LAM for 'look-at-me' is a retro pun on CAMAC, an instrumentation standard very often used with a PDP-11 in DAQ systems).
The rbus can be used to build a direct 16 bit memory mapped interface to another processor. With the rlink protocol the rbus can also be controlled via any bi-directional byte stream communication channel. In this case the rlink-to-rbus bridge (see rlink_core.vhd) is the local rbus master, which can be remote controlled from a backend server via the communication channel. The rlink protocol supports register reads and writes, block transfers, and a simple interrupt handling via 'attention' messages. It is protected with a 16 bit CRC (see crc16.vhd) to detect transmission errors. The implementation is based on an extended symbol space with 261 symbols: 256 to represent the states of a data byte, and 5 additional symbols IDLE, SOP, EOP, NAK and ATTN used as packet delimiters and for other control purposes. It was inspired by the usage of D- and K-symbols in optical link protocols. The symbol space is mapped into an 8 bit data stream with a simple escaping mechanism (see cdata2byte.vhd and byte2cdata.vhd). These 9 bit to 8 bit interface adapters provide a FIFO buffered, byte wide rlink interface used in all systems (see also rlink_core8.vhd).
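To illustrate how 261 symbols can be carried over an 8 bit stream, here is a minimal escaping sketch in Python. It is not the encoding implemented in cdata2byte.vhd and byte2cdata.vhd: the escape byte value `ESC`, the follow-up codes and the numbering of the control symbols are hypothetical. The idea it shows is the general one: one byte value is reserved as an escape, and both the control symbols and a literal occurrence of that byte are sent as two-byte sequences.

```python
# 9 bit symbol space: 0..255 are data bytes, 256..260 are control symbols.
IDLE, SOP, EOP, NAK, ATTN = range(256, 261)

ESC = 0xCA           # hypothetical escape byte
ESC_LITERAL = 0x0F   # hypothetical code: "the data byte equal to ESC"


def encode(symbols):
    """Map a sequence of 9 bit symbols onto an 8 bit byte stream."""
    out = bytearray()
    for sym in symbols:
        if sym >= 256:                 # control symbol: ESC + index
            out += bytes([ESC, sym - 256])
        elif sym == ESC:               # data byte that collides with ESC
            out += bytes([ESC, ESC_LITERAL])
        else:                          # ordinary data byte, sent as is
            out.append(sym)
    return bytes(out)


def decode(data):
    """Inverse of encode(); assumes the stream is not truncated after ESC."""
    out = []
    stream = iter(data)
    for byte in stream:
        if byte != ESC:
            out.append(byte)
            continue
        code = next(stream)
        out.append(ESC if code == ESC_LITERAL else 256 + code)
    return out
```

With this scheme only escaped symbols cost a second byte, so the overhead on typical data is small.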
In the current systems the rlink communication is done via
- a USB FIFO interface using the onboard Cypress FX2 USB controller on Nexys2 and Nexys3 boards. The FPGA interface (see fx2_2fifoctl_ic.vhd) provides two end points, a data transfer speed of up to 30 MByte/sec, and a request-response rate of 4 kHz, limited by the USB 2.0 timing structure. In addition to data transfers to a running FPGA design, the FX2 firmware also supports the configuration of the FPGA via JTAG (see sources and pre-built images).
- a real or USB-emulated serial port (see rlink_sp1c.vhd). The UART (see serport_uart_rx.vhd and serport_uart_tx.vhd) supports baud rates up to 3 MBaud. The baud rate is detected automatically from a BREAK + 0x80 sync character sequence (see serport_uart_autobaud.vhd). With the onboard RS232 ports of the s3board and the Nexys2 boards the maximal baud rate is 460800 when a USB-RS232 adapter is directly connected. The rate is limited by the RS232 transceivers. With on-board USB-UARTs like the FT232R on the Nexys3, baud rates of 2 MBaud are possible. The FT2232 on Nexys4 allows up to 12 MBaud (in w11a designs 10 MBaud is possible).
The I/O System Emulation
All device I/O is emulated, as already described under Features. A simple example is the transmit part of the DL11 console interface (see ibdr_dl11.vhd). Sending a character to the emulated console involves the following steps:
- the CPU writes the character to the xbuf (transmit buffer) register of the DL11.
- the controller logic clears the xrdy (transmit ready) bit in the csr (control status register) and asserts the RB_LAM signal for one cycle.
- this will set a bit in the attention mask of the rbus controller, and in case the mask was clear before, trigger the sending of an 'attn' symbol to the backend server.
- the backend server, upon reception of the 'attn' symbol, will retrieve the attention mask, determine the source (or sources) of the attention requests, and start the appropriate handling.
- for a console transmit attention the backend server will issue a remote read of the xbuf register via rbus and the rb2ib window.
- the remote read of the xbuf will return the character to the backend server, set the xrdy bit in the csr, and in case interrupts were enabled, set the interrupt request flop in the controller.
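The handshake above can be traced with a small behavioural model. The Python sketch below is an illustration only and does not reproduce ibdr_dl11.vhd: the class names, the LAM bit number and the assumption that retrieving the attention mask also clears it are hypothetical. It shows the two sides of the xbuf register: a CPU write clears xrdy and raises an attention, while a remote read returns the character and sets xrdy again.

```python
class AttnController:
    """rbus-side attention logic with one mask bit per LAM source."""

    def __init__(self):
        self.mask = 0
        self.sent = []                 # 'attn' symbols sent to the backend

    def lam(self, bit):
        was_clear = self.mask == 0
        self.mask |= 1 << bit
        if was_clear:                  # only the first request sends 'attn'
            self.sent.append("attn")

    def read_and_clear(self):
        # Assumption: retrieving the mask also clears it.
        mask, self.mask = self.mask, 0
        return mask


class Dl11Tx:
    """Transmit half of an emulated DL11 (behavioural model)."""
    LAM_BIT = 1                        # hypothetical attention bit

    def __init__(self, attn):
        self.attn = attn
        self.xbuf = 0
        self.xrdy = True               # transmit ready bit in the csr
        self.ie = False                # interrupt enable bit in the csr
        self.intreq = False            # interrupt request flop

    def cpu_write_xbuf(self, char):
        self.xbuf = char & 0xFF
        self.xrdy = False
        self.attn.lam(self.LAM_BIT)    # RB_LAM for one cycle

    def remote_read_xbuf(self):
        self.xrdy = True
        if self.ie:
            self.intreq = True
        return self.xbuf


def backend_service(attn, dl11, console):
    """Backend reaction to an 'attn' symbol."""
    mask = attn.read_and_clear()
    if mask & (1 << Dl11Tx.LAM_BIT):
        console.append(chr(dl11.remote_read_xbuf()))
```

Note that the remote read has a side effect (setting xrdy) that a CPU read of the same register would not have in this model.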
For mass storage peripherals the DMA transfers are emulated too. A simple example is a disk access of the RK11 disk controller (see ibdr_rk11.vhd). Reading a disk block from a disk image handled by the backend server to the w11a memory involves the following steps:
- the CPU writes memory and disk address and transfer size information into the appropriate RK11 controller registers (rkwc, rkba, rkda).
- the CPU writes the I/O function code into 'control&status' register rkcs and sets the 'GO' bit to start the operation.
- the setting of the 'GO' bit will cause a RB_LAM and as described above the sending of an 'attn' symbol to the backend server.
- the device handling routine in the backend server will retrieve all relevant RK11 controller registers via rbus and the rb2ib window, read the disk block data from the disk image file, and transfer the data with rlink block transfer commands directly into the w11a memory.
- when all data is transferred, the backend server updates the RK11 registers via rbus and the rb2ib window. The last step is to clear the 'GO' bit in the rkcs.
- clearing the 'GO' bit concludes the I/O transaction, and in case interrupts were enabled, the interrupt request flop is set in the controller.
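The emulated DMA read can be sketched in the same style. This Python model is a simplification for illustration and is not derived from ibdr_rk11.vhd or the backend sources: the class layout is hypothetical, rkda is treated as a plain linear sector number instead of the real drive/cylinder/surface/sector fields, rkwc is assumed to hold the two's complement of the word count as on the original RK11, and a list slice assignment stands in for the rlink block transfer.

```python
SECTOR_WORDS = 256   # 16 bit words per disk sector (RK05 format)


class Rk11:
    """Register-level model of the emulated controller (simplified)."""

    def __init__(self):
        self.rkwc = 0            # word count, two's complement (assumed)
        self.rkba = 0            # memory byte address
        self.rkda = 0            # disk address, here a linear sector number
        self.go = False
        self.ie = False
        self.intreq = False
        self.lam_pending = False

    def cpu_write_rkcs(self, go, ie):
        self.ie = ie
        if go:
            self.go = True
            self.lam_pending = True   # RB_LAM, leads to an 'attn' symbol

    def remote_clear_go(self):
        self.go = False               # concludes the I/O transaction
        if self.ie:
            self.intreq = True


def backend_read(rk, image, memory):
    """Backend handler for a read: disk image -> w11a memory."""
    rk.lam_pending = False
    nwords = (-rk.rkwc) & 0xFFFF
    start = rk.rkda * SECTOR_WORDS
    base = rk.rkba // 2               # memory is modelled as a word list
    memory[base:base + nwords] = image[start:start + nwords]
    # Update the controller registers as the real hardware would have.
    rk.rkwc = 0
    rk.rkba = (rk.rkba + 2 * nwords) & 0xFFFF
    rk.remote_clear_go()
```

From the driver's point of view the sequence looks like a normal RK11 transfer: 'GO' is set, some time passes, then 'GO' clears and an interrupt arrives.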
Even though conceptually straightforward, the devil is in the details. Some of the device logic must be done in hardware to guarantee the timing behavior expected by the original drivers. The 'remote' semantics of some registers is therefore sometimes not just the read/write mirror image of the 'local' semantics, as in the DL11 case, but quite different. For example, in the RK11 controller a write to the maintenance register rkmr is essentially a no-op when done from the CPU but is used for all kinds of control functions when done via rbus.
Porting to other FPGA Families
So far the w11a has only been used on Xilinx Spartan-3 and Spartan-6 type FPGAs, but it is expected that porting to other Xilinx FPGAs like Virtex-6 or the Series-7 is easy. The only vendor specific constructs are the memories and the I/O buffers, encapsulated in memlib and xlib, respectively. The w11a uses distributed RAMs for the general purpose register file (see pdp11_gpr.vhd), the SAR/SDR register file in the MMU (see pdp11_mmu_sadr.vhd), and small FIFOs (see fifo_1c_dram_raw.vhd). The cache implementation (see pdp11_cache.vhd) uses dual ported block RAMs in "READ_FIRST" mode to implement a 'speculative write' which can be undone in the next cycle.
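The property that makes such an undo possible is that a "READ_FIRST" block RAM presents the old content of a location on its output port in the same cycle in which the location is overwritten. The Python sketch below models only this property; it is not a model of pdp11_cache.vhd, and the class name and single-port simplification are assumptions.

```python
class ReadFirstRam:
    """Single-port model of a block RAM in READ_FIRST write mode."""

    def __init__(self, size):
        self.mem = [0] * size
        self.dout = 0

    def clock(self, addr, we=False, din=0):
        """One clock edge: the old content is read before the write."""
        self.dout = self.mem[addr]
        if we:
            self.mem[addr] = din
        return self.dout


def speculative_write(ram, addr, data, commit):
    """Write in cycle 1; if not committed, restore the old value in cycle 2."""
    old = ram.clock(addr, we=True, din=data)
    if not commit:
        ram.clock(addr, we=True, din=old)
```

The write can thus be issued before it is known whether it should take effect, at the cost of one extra cycle when it has to be rolled back.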