Design

This section contains the design of the data path, the control unit, and other elements of the CPU.

The intention is to create a simple single-cycle design that conforms to the specification outlined above. References for the design include:

  • "2018 Patterson and Hennessy - Computer organisation and design: the hardware software interface (RISC-V edition)", which provides an introduction to single-cycle RISC-V CPU design in chapter 4

  • "2015 Li - Computer principles and design in verilog HDL", which provides a survey of practical techniques for programming general RISC CPUs in Verilog.

The design is intended for synthesis on Xilinx FPGAs. As a result, some design decisions are motivated by vendor guidance such as the Xilinx UltraFast Design Methodology.

Design Summary

The two main components of this single-cycle design are the data path and the control path:

  • The data path is responsible for the bulk of the calculations in each instruction cycle, and also stores the state of the processor (all the registers and memory). The data path is a sequential module, with all registers and memory updated on the rising clock edge. The data path will support all the RV32I and Zicsr instructions outlined in the specification, and has its behaviour controlled by a set of inputs from the control unit. It has the following interface:

    • Inputs: clock, control lines from the control unit (including exception information)

    • Outputs: fetched instruction, flags for when an instruction raises an exception

  • The control unit is a purely combinational module which takes a fetched instruction and decodes it into control lines for the data path. In addition, it reads any exception flags raised by the data path and modifies the control lines to trap the exception if necessary. It has the following interface:

    • Inputs: fetched instruction from the data path, exception flags from the data path

    • Outputs: the control lines for the data path.

In the normal execution of an instruction which does not raise an exception and is not interrupted, the order of operations is as follows:

  1. The data path combinationally fetches an instruction (based on the program counter which is a register in the data path)

  2. The fetched instruction is an input to the control unit, which combinationally decodes the instruction and configures the data path control lines

  3. The computations involved in executing the instruction in the data path are all combinational, so the result of the computation stabilises at the write inputs to all the registers and memory

  4. On the next rising clock edge, the results of the instruction are loaded into registers and memory in the data path

In an instruction that raises an exception, the order of operations is as follows:

  1. The data path combinationally fetches an instruction (based on the program counter which is a register in the data path)

  2. The fetched instruction is an input to the control unit, which combinationally decodes the instruction and configures the data path control lines

  3. The computations involved in executing the instruction lead to an exception flag being raised (an output from the data path)

  4. The control unit reads the exception flag, and sets control lines to raise an exception trap. In doing so, none of the control lines that caused the exception to be raised are modified (otherwise the exception flag would not persist; this requirement is due to having an all-combinational computation). However, all lines that involve writes to integer registers, data memory, or CSRs should be de-asserted so that the instruction raising the exception does not complete

  5. On the next rising clock edge, the CPU state is modified so as to raise the exception (program counter set to exception vector, CSRs modified, etc.)
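Step 4 can be illustrated with a small combinational gate. This is only a sketch: the module and signal names are hypothetical, and it assumes the data path derives its exception flags from the decoded (un-gated) control lines, so that masking the write enables does not clear the flag.

```systemverilog
// Hypothetical sketch: mask state-changing control lines when the
// data path raises an exception. All names are illustrative only.
module trap_gate_sketch(
    input  logic exception,        // exception flag from the data path
    input  logic dec_reg_write_en, // decoded register-file write enable
    input  logic dec_mem_write_en, // decoded data-memory write enable
    input  logic dec_csr_write_en, // decoded CSR-bus write enable
    output logic reg_write_en,     // gated enables sent to the data path
    output logic mem_write_en,
    output logic csr_write_en,
    output logic trap              // asks the data path to take the trap
    );

    // Only the write enables are masked. Operand-selection lines
    // (not shown) pass through unchanged, so the combinational
    // computation that raised the exception is left undisturbed.
    assign reg_write_en = dec_reg_write_en & ~exception;
    assign mem_write_en = dec_mem_write_en & ~exception;
    assign csr_write_en = dec_csr_write_en & ~exception;
    assign trap         = exception;
endmodule
```

The CSR updates that belong to the trap itself (step 5) would be driven separately from the `trap` line rather than through the instruction's own CSR write enable.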

Interrupts are always handled "first", before executing an instruction. An interrupt is handled as follows:

  1. The data path checks interrupt conditions in parallel with fetching the instruction, and sets an interrupt flag (output) if an interrupt is pending

  2. The control unit reads the interrupt flag, and sets control lines to raise an interrupt trap, instead of decoding the instruction.

  3. On the next rising clock edge, the CPU state is modified so as to raise the interrupt (program counter set to interrupt vector, CSRs modified, etc.)
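The priority between the three cases (interrupt, exception, normal execution) could be expressed as follows. This is a sketch under assumptions: the `trap_kind` output and its encoding are hypothetical and are not part of the interfaces described in this section.

```systemverilog
// Hypothetical sketch: an interrupt overrides instruction decoding,
// and an exception overrides normal completion of the instruction.
module trap_priority_sketch(
    input  logic       interrupt, // interrupt flag from the data path
    input  logic       exception, // exception flag from the data path
    output logic [1:0] trap_kind  // 00: none, 01: exception, 10: interrupt
    );

    always_comb begin
        if (interrupt)      trap_kind = 2'b10; // interrupts are handled first
        else if (exception) trap_kind = 2'b01;
        else                trap_kind = 2'b00;
    end
endmodule
```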

Data Path

Main ALU

The design will use a single ALU, which must support computational instructions, address calculations, and comparisons for branch operations. The structure of the RISC-V instructions means that it is possible to consistently route operands to the same input ports of the ALU. The computations required by the RV32I instructions are given below:

  • rs1_data OP rs2_data, for register-register and conditional branch instructions

  • rs1_data OP imm, for register-immediate, load/store, and jalr instructions

  • pc + imm, for jal and auipc

  • 0 + imm, for lui (alternatively, lui can bypass the ALU, which is the approach taken in this design)

The jalr instruction also requires masking the ALU result using 0xffff_fffe before writing it to the program counter (rd receives pc + 4 instead). This is dealt with outside the ALU module.

The main ALU module will take the 32-bit immediate imm from an external immediate generation module, which is assumed to supply the correct immediate for the instruction type.

For Zicsr instructions, the following operands are required:

  • rs1_data OR csr_rdata for csrrs

  • imm OR csr_rdata for csrrsi

  • ~rs1_data AND csr_rdata for csrrc

  • { {27{1'b1}}, ~imm[4:0] } AND csr_rdata for csrrci

The bitwise negation applies only to the bottom 5 bits of imm (the uimm field in CSR instructions), and the upper 27 bits of the mask are forced to 1; otherwise, csrrci could inadvertently clear high bits of the CSR (above bit 4).
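As an illustration of the csrrci operand, the mask could be built as in the sketch below. The module name is hypothetical, and imm is assumed to be the zero-extended uimm supplied by the immediate generation module.

```systemverilog
// Hypothetical sketch: mask routed to the ALU for csrrci.
module csrrci_mask_sketch(
    input  logic [31:0] imm,  // assumed: zero-extended 5-bit uimm
    output logic [31:0] mask  // ANDed with csr_rdata by the ALU
    );

    // Upper 27 bits are all ones so that CSR bits above bit 4 are
    // preserved; the low 5 bits clear the positions set in uimm.
    // Example: imm = 32'h0000_0005 gives mask = 32'hffff_fffa.
    assign mask = { {27{1'b1}}, ~imm[4:0] };
endmodule
```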

In the formulas above, the order of operands represents how they will be mapped to the input ports of the ALU. The following pointers motivate the choice of operand order:

  • The rs1_data field is routed to port 1 of the ALU, and immediates are typically routed to port 2. This implies the ALU shift operation must use the first port for the value to be shifted, and use the second port for the shift amount.

  • For Zicsr instructions, the order of operands was chosen to fix the position of csr_rdata, and keep rs1_data on port 1, at the expense of having an immediate on port 1 (inconsistent with RV32I).

The ALU module is described below.

ALU Module

The ALU should be able to perform the following operations on its operands a and b, to produce result r:

  • addition: r = a + b

  • subtraction: r = a - b

  • and: r = a & b

  • or: r = a | b

  • xor: r = a ^ b

  • shift left: r = a << b

  • shift right (logical): r = a >> b

  • shift right (arithmetic): r = a >>> b

  • set if less than (unsigned): r = a < b (unsigned)? 1 : 0

  • set if less than (signed): r = a < b (signed)? 1 : 0

The only required flag is zero, for use by beq and bne instructions. Other conditional branch instructions can use r[0] with the operation set-if-less-than (signed/unsigned).

The signature for the alu module is shown below:

/// Arithmetic Logic Unit
///
/// This is a purely combinational ALU implementation.
///
/// The operation depends on the 4-bit alu_op as
/// follows:
///
/// 0_000: r = a + b
/// 1_000: r = a - b
/// 0_001: r = a << b
/// x_010: r = signed(a) < signed(b) ? 1 : 0
/// x_011: r = a < b ? 1 : 0 (unsigned)
/// x_100: r = a ^ b
/// 0_101: r = a >> b
/// 1_101: r = signed(a) >>> b
/// x_110: r = a | b
/// x_111: r = a & b
///
/// The separation in alu_op indicates that the top bit
/// comes from bit 30 of the instruction, and the bottom
/// 3 bits come from funct3, in R-type register-register
/// instructions.
///
/// For I-type register-immediate instructions, ensure
/// that the top bit is 0 for addi, slti, sltiu, xori,
/// ori, and andi. For slli, srli, and srai, set the top
/// bit to bit 30 of the instruction, and set b to the
/// shift amount (shamt) field. Set the low three
/// bits to funct3 in all cases.
///
module alu(
    input [31:0] a, // First 32-bit operand
    input [31:0] b, // Second 32-bit operand
    input [3:0] alu_op, // ALU control signals (see comments above)
    output [31:0] r, // 32-bit result
    output zero // 1 if r is zero, 0 otherwise
    );
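
One possible body for this signature is sketched below. It is not a definitive implementation. It follows the RV32I funct3 encoding (010 for signed set-if-less-than, 011 for unsigned), and it assumes that only the low five bits of b are used as the shift amount.

```systemverilog
// Sketch of a purely combinational ALU matching the signature above.
module alu(
    input  logic [31:0] a,      // First 32-bit operand
    input  logic [31:0] b,      // Second 32-bit operand
    input  logic [3:0]  alu_op, // { instr[30], funct3 }
    output logic [31:0] r,      // 32-bit result
    output logic        zero    // 1 if r is zero, 0 otherwise
    );

    always_comb begin
        case (alu_op[2:0])
            3'b000: r = alu_op[3] ? (a - b) : (a + b);
            3'b001: r = a << b[4:0];
            // Concatenation operands are self-determined, so the
            // comparison keeps its signed or unsigned meaning.
            3'b010: r = { 31'b0, $signed(a) < $signed(b) };
            3'b011: r = { 31'b0, a < b };
            3'b100: r = a ^ b;
            3'b101: begin
                // Separate statements avoid mixing signed and unsigned
                // operands in one expression, which would turn the
                // arithmetic shift into a logical one.
                if (alu_op[3]) r = $signed(a) >>> b[4:0];
                else           r = a >> b[4:0];
            end
            3'b110: r = a | b;
            3'b111: r = a & b;
            default: r = 32'b0;
        endcase
    end

    assign zero = (r == 32'b0);
endmodule
```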

Main ALU Wrapper

A wrapper module is used to encapsulate the main ALU, and ensure inputs are mapped to the correct ports of the ALU consistently with the operation being implemented. The signature of the module is:

/// Main ALU Wrapper Module
///
/// This module routes input operands to the
/// main ALU depending on the instruction
/// being executed.
///
/// The arguments for the ALU are selected
/// by arg_sel as follows:
///
/// 000: rs1_data OP rs2_data
/// for register-register and conditional branch instructions
///
/// 001: rs1_data OP imm
/// for register-immediate, load/store, and jalr instructions
///
/// 010: pc + imm
/// for jal and auipc
///
/// 011: rs1_data OR csr_rdata
/// for csrrs
///
/// 100: imm OR csr_rdata
/// for csrrsi
///
/// 101: ~rs1_data AND csr_rdata
/// for csrrc
///
/// 110: { {27{1'b1}}, ~imm[4:0] } AND csr_rdata
/// for csrrci
///
/// Whenever OP is used above, alu_op is used to
/// select the ALU operation following the comments
/// in the alu module.
///
/// Ensure that the imm input is consistent with the
/// operation being implemented (depending on the
/// instruction format).
///
/// In this design, the lui instruction bypasses the ALU.
module main_alu_wrapper(
       input [2:0] arg_sel, // Select the ALU arguments
       input [3:0] alu_op, // Select the ALU operation (when required)
       input [31:0] rs1_data, // Value of rs1 register
       input [31:0] rs2_data, // Value of rs2 register
       input [31:0] imm, // 32-bit immediate
       input [31:0] pc, // Current program counter
       input [31:0] csr_rdata, // Read-data for CSR bus
       output [31:0] main_alu_result, // ALU output
       output main_alu_zero // ALU zero flag output
       );
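
The operand routing performed by the wrapper could look like the sketch below. To keep the example self-contained it only produces the ALU inputs and does not instantiate the alu module. The module name, the a, b, and op outputs, and the all-zero default case are assumptions made for illustration.

```systemverilog
// Hypothetical sketch: select ALU operands and operation from arg_sel.
module main_alu_operands_sketch(
    input  logic [2:0]  arg_sel,   // Select the ALU arguments
    input  logic [3:0]  alu_op,    // ALU operation (when OP is used)
    input  logic [31:0] rs1_data,
    input  logic [31:0] rs2_data,
    input  logic [31:0] imm,
    input  logic [31:0] pc,
    input  logic [31:0] csr_rdata,
    output logic [31:0] a,         // routed to ALU port 1
    output logic [31:0] b,         // routed to ALU port 2
    output logic [3:0]  op         // routed to the ALU alu_op input
    );

    // Fixed operations, using the alu_op encoding of the alu module.
    localparam logic [3:0] OP_ADD = 4'b0000;
    localparam logic [3:0] OP_OR  = 4'b0110;
    localparam logic [3:0] OP_AND = 4'b0111;

    always_comb begin
        case (arg_sel)
            3'b000: begin a = rs1_data;  b = rs2_data;  op = alu_op; end
            3'b001: begin a = rs1_data;  b = imm;       op = alu_op; end
            3'b010: begin a = pc;        b = imm;       op = OP_ADD; end
            3'b011: begin a = rs1_data;  b = csr_rdata; op = OP_OR;  end
            3'b100: begin a = imm;       b = csr_rdata; op = OP_OR;  end
            3'b101: begin a = ~rs1_data; b = csr_rdata; op = OP_AND; end
            3'b110: begin
                a  = { {27{1'b1}}, ~imm[4:0] };
                b  = csr_rdata;
                op = OP_AND;
            end
            default: begin a = 32'b0; b = 32'b0; op = OP_ADD; end
        endcase
    end
endmodule
```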

Immediate Generation

All immediates encoded in RISC-V instructions are extended to 32 bits (mostly sign-extended, but zero-extended for Zicsr instructions). In addition, each RV32I or Zicsr instruction uses at most one immediate (either imm or uimm), so one module can decode this single immediate and expose it as one output imm. The module signature is as follows:

/// Extract an immediate encoded in the instruction
///
/// Each RV32I or Zicsr instruction contains at most
/// one immediate, which is extracted and converted to
/// a 32-bit format by this module. For Zicsr instructions,
/// the uimm field is also zero-extended to 32 bits, and
/// output using the same imm output.
///
/// The reference for how immediates are decoded is
/// v1_f2.4. The sel input picks the output as follows:
///
/// 000: { {21{instr[31]}}, instr[30:20] }, I-type
/// 001: { {21{instr[31]}}, instr[30:25], instr[11:7] }, S-type
/// 010: { {20{instr[31]}}, instr[7], instr[30:25], instr[11:8], 1'b0 }, B-type
/// 011: { instr[31:12], {12{1'b0}} }, U-type
/// 100: { {12{instr[31]}}, instr[19:12], instr[20], instr[30:21], 1'b0 }, J-type
///
/// 101: { {27{1'b0}}, instr[19:15] }, Zicsr (uimm is encoded in the rs1 field)
///
module imm_gen(
       input [2:0] sel, // Set immediate to extract
       input [31:0] instr, // Current instruction
       output [31:0] imm // Output 32-bit immediate
       );
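
A body for this signature might be written as in the sketch below. It is an illustration only. The Zicsr case takes uimm from the rs1 field, instr[19:15], which is where the Zicsr encoding places it, and the all-zero default for unused sel values is an assumption.

```systemverilog
// Sketch of a combinational immediate generator for the signature above.
module imm_gen(
    input  logic [2:0]  sel,   // Set immediate to extract
    input  logic [31:0] instr, // Current instruction
    output logic [31:0] imm    // Output 32-bit immediate
    );

    always_comb begin
        case (sel)
            // I-type, e.g. addi x1, x0, -1 (32'hfff0_0093) gives
            // imm = 32'hffff_ffff
            3'b000: imm = { {21{instr[31]}}, instr[30:20] };
            // S-type
            3'b001: imm = { {21{instr[31]}}, instr[30:25], instr[11:7] };
            // B-type (bit 0 is always zero)
            3'b010: imm = { {20{instr[31]}}, instr[7], instr[30:25],
                            instr[11:8], 1'b0 };
            // U-type
            3'b011: imm = { instr[31:12], 12'b0 };
            // J-type (bit 0 is always zero)
            3'b100: imm = { {12{instr[31]}}, instr[19:12], instr[20],
                            instr[30:21], 1'b0 };
            // Zicsr uimm, zero-extended
            3'b101: imm = { 27'b0, instr[19:15] };
            default: imm = 32'b0;
        endcase
    end
endmodule
```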

Register File

The register file is combinational with respect to reads (rs1 determines rs1_data, and rs2 determines rs2_data), and sequential for writes (rd_data is written to rd on the rising clock edge if write_en is set). The signature for the register file is as follows:

/// 32-bit Register file
///
/// There are 32 32-bit registers x0-x31, with x0 hardwired
/// to zero. This module provides two combinational output
/// ports, controlled by the two addresses rs1 and rs2, and
/// a single registered write (on the rising edge of the clock
/// when the write enable signal is asserted).
///
/// There is no reset; on power-on, the register values are
/// set to zero.
///
module register_file(
    input clk, // clock
    input write_en, // write enable for rd
    input [31:0] rd_data, // data for write
    input [4:0] rs1, // source register index
    input [4:0] rs2, // source register index
    input [4:0] rd, // destination register index for write
    output [31:0] rs1_data, // read port for rs1
    output [31:0] rs2_data // read port for rs2
    );
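
A minimal sketch of a register file with this behaviour is given below. It assumes that power-on zeroing can be expressed with an initial block (which FPGA synthesis tools generally honour for registers and distributed RAM), and it enforces the hardwired x0 by ignoring writes to index 0 and forcing reads of index 0 to zero.

```systemverilog
// Sketch of a register file matching the signature above.
module register_file(
    input  logic        clk,      // clock
    input  logic        write_en, // write enable for rd
    input  logic [31:0] rd_data,  // data for write
    input  logic [4:0]  rs1,      // source register index
    input  logic [4:0]  rs2,      // source register index
    input  logic [4:0]  rd,       // destination register index
    output logic [31:0] rs1_data, // read port for rs1
    output logic [31:0] rs2_data  // read port for rs2
    );

    logic [31:0] regs [31:0];

    // No reset: registers start at zero on power-on.
    initial begin
        for (int i = 0; i < 32; i++) regs[i] = 32'b0;
    end

    // Registered write on the rising clock edge; x0 is never written.
    always @(posedge clk) begin
        if (write_en && rd != 5'd0) regs[rd] <= rd_data;
    end

    // Combinational reads; x0 always reads as zero.
    assign rs1_data = (rs1 == 5'd0) ? 32'b0 : regs[rs1];
    assign rs2_data = (rs2 == 5'd0) ? 32'b0 : regs[rs2];
endmodule
```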

The fields rs1, rs2, and rd are routed from fixed locations in instr. The source for rd_data is selected from one of the following options:

  • main_alu_result for register-register, register-immediate, and auipc instructions

  • data_mem_rdata for load instructions

  • csr_rdata for Zicsr instructions

  • pc_plus_4 for unconditional jump instructions

  • imm for lui (the U-type immediate, which the wrapper extracts directly from instr)

To simplify the data path, the register file is wrapped in a module that routes the register indices from the instruction, and selects the source for writing data:

/// Write data for rd in register file
///
/// The rd_data_sel argument selects between the inputs:
///
/// 000: main_alu_result,
/// for register-register, register-immediate, and auipc instructions
///
/// 001: data_mem_rdata
/// for load instructions
///
/// 010: csr_rdata
/// for Zicsr instructions
///
/// 011: pc_plus_4
/// for unconditional jump instructions
///
/// 100: { instr[31:12], {12{1'b0}} } (from instr input)
/// for lui instruction
///
module register_file_wrapper(
    input clk, // for writing
    input write_en, // 1 to write data to rd; 0 otherwise
    input [2:0] rd_data_sel, // pick what to write to rd
    input [31:0] main_alu_result, // the output from the main ALU
    input [31:0] data_mem_rdata, // data output from data memory bus
    input [31:0] csr_rdata, // data output from CSR bus
    input [31:0] pc_plus_4, // current pc + 4, from pc module
    input [31:0] instr, // current instruction
    output [31:0] rs1_data, // read port for rs1
    output [31:0] rs2_data // read port for rs2
    );
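
The selection logic inside the wrapper could be sketched as follows. The module name and its outputs are hypothetical. A three-bit select is used because five sources need to be encoded, and the register indices are taken from their fixed RISC-V positions in instr.

```systemverilog
// Hypothetical sketch: field extraction and rd_data selection.
module rd_data_select_sketch(
    input  logic [2:0]  rd_data_sel,
    input  logic [31:0] main_alu_result,
    input  logic [31:0] data_mem_rdata,
    input  logic [31:0] csr_rdata,
    input  logic [31:0] pc_plus_4,
    input  logic [31:0] instr,
    output logic [4:0]  rs1,     // to the register file
    output logic [4:0]  rs2,
    output logic [4:0]  rd,
    output logic [31:0] rd_data
    );

    // Fixed field positions in every RISC-V instruction format
    // that uses these fields.
    assign rs1 = instr[19:15];
    assign rs2 = instr[24:20];
    assign rd  = instr[11:7];

    always_comb begin
        case (rd_data_sel)
            3'b000: rd_data = main_alu_result;
            3'b001: rd_data = data_mem_rdata;
            3'b010: rd_data = csr_rdata;
            3'b011: rd_data = pc_plus_4;
            3'b100: rd_data = { instr[31:12], 12'b0 }; // lui
            default: rd_data = 32'b0; // assumption: unused encodings
        endcase
    end
endmodule
```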

Data Memory Bus

The design will use a simple bus for accesses to data memory (the target for loads and stores). Using a bus allows distinct devices (e.g. main memory and I/O devices) to be implemented in separate modules.

Instead of using an enable signal to pick which device is active on the bus, each device determines whether it should handle the read or write, and indicates this by asserting a "claim" signal. The read outputs from all the devices are ORed together, and devices not claiming the access set their read output to zero.

The claim signals from all the devices are ORed together. On a read or write, this signal can be used to check that at least one device will handle the request. If no device will handle the request, a load/store access fault can be raised.
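The OR-combination described above can be sketched as a small standalone module. The module name, the parameter N, the access input (asserted for load and store instructions), and the access_fault output are hypothetical names introduced for illustration.

```systemverilog
// Hypothetical sketch: combine per-device claim and rdata signals.
module bus_combine_sketch #(
    parameter int N = 2                    // number of devices on the bus
    )(
    input  logic                access,    // assumed: a load or store is active
    input  logic [N-1:0]        dev_claim, // one claim bit per device
    input  logic [N-1:0][31:0]  dev_rdata, // zero from non-claiming devices
    output logic                claim,     // at least one device claims
    output logic [31:0]         rdata,     // OR of all device read data
    output logic                access_fault
    );

    always_comb begin
        claim = |dev_claim;
        rdata = 32'b0;
        for (int i = 0; i < N; i++) rdata |= dev_rdata[i];
    end

    // No device handles the request: candidate load/store access fault.
    assign access_fault = access & ~claim;
endmodule
```

The scheme relies on every device driving zero on its read output when it does not claim the access; otherwise the ORed rdata would be corrupted.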

The (logical) bus interface is as follows:

interface data_mem_bus();
   bit        clk; // writes are performed on the rising clock edge
   bit [31:0] addr; // the read/write address
   bit [1:0]  width; // the width of the read/write (byte, halfword, word)
   bit [31:0] rdata; // read-data returned from device
   bit [31:0] wdata; // write-data passed to device
   bit        write_en; // whether to perform a write (or just a read)
   bit        claim; // devices will claim read/write based on address/width
endinterface

The OR logic for claim and rdata will be handled using modports, one per device (and one for the host).

The devices on the bus are:

  • The main memory module (RAM)

  • Any memory-mapped CSRs (in particular, the timer interrupt controller)

  • Any memory-mapped peripherals

The data memory bus is only used for loads and stores (instruction fetch does not use this bus in this design).

For both load and store instructions, the address is calculated by the ALU, so the addr line is hardwired to the main ALU output. The width field depends on the instruction, and is driven by the control unit. The write_en line is set only for store instructions. The wdata field is hardwired to rs2_data, which is the only source for writes to data memory.

The claim output is used by the control unit to potentially raise load/store access faults. The rdata output is hardwired to the register file wrapper, which is the only user of data memory bus data (load instructions).

CSR Bus

The CSR bus is similar to the data memory bus:

interface csr_bus();
   bit        clk; // writes are performed on the rising clock edge
   bit [11:0] addr; // the read/write address
   bit [31:0] rdata; // read-data returned from a CSR device
   bit [31:0] wdata; // data to be written to a CSR device
   bit        write_en; // whether to perform a write (or just a read)
   bit        claim; // devices will claim read/write based on address
endinterface

Exactly one CSR device attached to the bus is responsible for asserting the claim signal for a given address, and for either writing data or returning data. The other devices return zero on the rdata line. The rdata lines from all the devices are ORed together to generate the bus rdata signal (and likewise for the bus claim signal).
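
As an example of a device on this bus, the sketch below shows a single read/write CSR. The module name is hypothetical, and the address 12'h340 (mscratch in the RISC-V privileged specification) is used purely as an example; whether this CSR is part of the design is not stated here.

```systemverilog
// Hypothetical sketch: a single-register CSR device on the CSR bus.
module csr_scratch_sketch #(
    parameter logic [11:0] ADDR = 12'h340 // example address (mscratch)
    )(
    input  logic        clk,      // writes occur on the rising edge
    input  logic [11:0] addr,     // CSR address from the instruction
    input  logic [31:0] wdata,    // data to be written
    input  logic        write_en, // whether to perform a write
    output logic [31:0] rdata,    // zero unless this device claims
    output logic        claim     // 1 when addr selects this CSR
    );

    logic [31:0] value;
    initial value = 32'b0;

    // The device claims the access purely from the address.
    assign claim = (addr == ADDR);

    // Non-claiming devices must drive zero so the bus can OR outputs.
    assign rdata = claim ? value : 32'b0;

    always @(posedge clk) begin
        if (claim && write_en) value <= wdata;
    end
endmodule
```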

Only Zicsr instructions interact with the CSR bus. The addr input always comes from a fixed position in the instruction, and is hardwired there. The write_en input is set by the control unit. The data written back to the CSR comes from either rs1_data, the main ALU output, or the uimm field of the instruction (via the immediate generation module). The module selecting the resulting value for wdata is:

/// CSR write data source selection
///
/// Depending on the value of sel, the CSR write data
/// source is chosen as follows:
///
/// 00: rs1_data, for csrrw
/// 01: main_alu_result, for csrrs, csrrc, csrrsi, csrrci
/// 10: imm, for csrrwi
///
module csr_wdata_sel(
       input [1:0] sel, // select the CSR write-data source
       input [31:0] rs1_data, // from the register file
       input [31:0] main_alu_result, // from the main ALU
       input [31:0] imm, // uimm, from immediate generator
       output [31:0] csr_wdata // to the CSR bus
       );
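
A possible body for this module is sketched below. The select input is two bits wide because three sources are encoded, and the all-zero output for the unused encoding 11 is an assumption.

```systemverilog
// Sketch of the CSR write-data multiplexer described above.
module csr_wdata_sel(
    input  logic [1:0]  sel,             // write-data source select
    input  logic [31:0] rs1_data,        // from the register file
    input  logic [31:0] main_alu_result, // from the main ALU
    input  logic [31:0] imm,             // uimm, from immediate generator
    output logic [31:0] csr_wdata        // to the CSR bus
    );

    always_comb begin
        case (sel)
            2'b00:   csr_wdata = rs1_data;        // csrrw
            2'b01:   csr_wdata = main_alu_result; // csrrs, csrrc, csrrsi, csrrci
            2'b10:   csr_wdata = imm;             // csrrwi
            default: csr_wdata = 32'b0;           // unused encoding
        endcase
    end
endmodule
```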

The rdata output from the CSR bus is routed to the register file wrapper (csr_rdata), for writing to rd, and is also routed to the main ALU wrapper for use in computations that write back to the CSR. The csr_claim signal is returned to the control unit, which uses it to raise an illegal instruction exception when no device claims the CSR address.