Design
This section contains the design of the data path, the control unit, and other elements of the CPU.
The intention is to create a simple single-cycle design that conforms to the specification outlined above. References for the design include:
-
"2018 Patterson and Hennessy - Computer organisation and design: the hardware software interface (RISC-V edition)", which provides an introduction to single-cycle RISC-V CPU design in chapter 4
-
"2015 Li - Computer principles and design in verilog HDL", which provides a survey of practical techniques for implementing general RISC CPUs in Verilog.
The design is intended for synthesis on Xilinx FPGAs. As a result, some design decisions are motivated by guidance in, e.g., the Xilinx UltraFast Design Methodology.
Design Summary
The two main components of this single-cycle design are the data path and the control path:
-
The data path is responsible for the bulk of the calculations in each instruction cycle, and also stores the state of the processor (all the registers and memory). The data path is a sequential module, with all registers and memory updated on the rising clock edge. The data path will support all the RV32I and Zicsr instructions outlined in the specification, and has its behaviour controlled by a set of inputs from the control unit. It has the following interface:
-
Inputs: clock, control lines from the control unit (including exception information)
-
Outputs: fetched instruction, flags for when an instruction raises an exception
-
The control unit is a purely combinational module which takes a fetched instruction and decodes it into control lines for the data path. In addition, it reads any exception flags raised by the data path and modifies the control lines to trap the exception if necessary. It has the following interface:
-
Inputs: fetched instruction from the data path, exception flags from the data path
-
Outputs: the control lines for the data path.
In the normal execution of an instruction which does not raise an exception and is not interrupted, the order of operations is as follows:
-
The data path combinationally fetches an instruction (based on the program counter which is a register in the data path)
-
The fetched instruction is an input to the control unit, which combinationally decodes the instruction and configures the data path control lines
-
The computations involved in executing the instruction in the data path are all combinational, so the result of the computation stabilises at the write inputs to all the registers and memory
-
On the next rising clock edge, the results of the instruction are loaded into registers and memory in the data path
In an instruction that raises an exception, the order of operations is as follows:
-
The data path combinationally fetches an instruction (based on the program counter which is a register in the data path)
-
The fetched instruction is an input to the control unit, which combinationally decodes the instruction and configures the data path control lines
-
The computations involved in executing the instruction lead to an exception flag being raised (an output from the data path)
-
The control unit reads the exception flag and sets the control lines to take an exception trap. In doing so, it must not modify any of the control lines that caused the exception to be raised (otherwise the exception flag would not persist; this requirement follows from the computation being entirely combinational). However, all lines that enable writes to integer registers, data memory, or CSRs must be de-asserted so that the instruction raising the exception does not complete
-
On the next rising clock edge, the CPU state is modified so as to raise the exception (program counter set to exception vector, CSRs modified, etc.)
Interrupts are always handled "first", before executing an instruction. An interrupt is handled as follows:
-
The data path checks interrupt conditions in parallel with fetching the instruction, and sets an interrupt flag (output) if an interrupt is pending
-
The control unit reads the interrupt flag, and sets control lines to raise an interrupt trap, instead of decoding the instruction.
-
On the next rising clock edge, the CPU state is modified so as to raise the interrupt (program counter set to interrupt vector, CSRs modified, etc.)
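The trap handling described above can be sketched as the final stage of the control unit. The module below is a minimal, hypothetical example: the name trap_override and all of its ports (decoded write enables in, gated write enables out, and a take_trap select for the trap vector) are assumptions, not part of the design above. It also assumes the exception flags do not themselves depend on the write enables, so that de-asserting the writes cannot clear the flag that caused the trap.

```systemverilog
// Hypothetical final stage of the control unit: gates the decoded
// write enables when an interrupt or exception trap is taken.
module trap_override(
    input  interrupt_pending, // from the data path, checked before execution
    input  exception_raised,  // from the data path, raised by the current instruction
    input  dec_reg_write_en,  // decoded write enable for rd
    input  dec_mem_write_en,  // decoded write enable for data memory (stores)
    input  dec_csr_write_en,  // decoded write enable for the CSR bus (Zicsr)
    output reg_write_en,
    output mem_write_en,
    output csr_write_en,
    output take_trap,         // selects the trap vector and trap CSR updates
    output trap_is_interrupt  // distinguishes interrupts from exceptions
);
    // Interrupts are handled "first": a pending interrupt always traps,
    // whatever the current instruction would have done.
    assign take_trap         = interrupt_pending | exception_raised;
    assign trap_is_interrupt = interrupt_pending;

    // Suppress the architectural writes of the trapped instruction.
    // All other decoded lines pass through unchanged (not shown).
    assign reg_write_en = dec_reg_write_en & ~take_trap;
    assign mem_write_en = dec_mem_write_en & ~take_trap;
    assign csr_write_en = dec_csr_write_en & ~take_trap;
endmodule
```

Only the write enables are gated; every other decoded line passes through unchanged, which keeps the combinational exception flag stable.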
Data Path
Main ALU
The design will use a single ALU, which must support computational instructions, address calculations, and comparisons for branch operations. The structure of the RISC-V instructions means that it is possible to consistently route operands to the same input ports of the ALU. The computations required by the RV32I instructions are given below:
-
rs1_data OP rs2_data, for register-register and conditional branch instructions
-
rs1_data OP imm, for register-immediate, load/store, and jalr instructions
-
pc + imm, for jal and auipc
-
0 + imm, for lui (could also bypass the ALU)
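The jalr instruction also requires masking the ALU result (rs1_data + imm) with 0xffff_fffe, clearing the least significant bit, before it is written to the program counter; as for jal, rd receives pc + 4. This masking is dealt with outside the ALU module.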
The main ALU module will take the 32-bit immediate imm from an external immediate generation module, which is assumed to supply the correct immediate for the instruction type.
For Zicsr instructions, the following operands are required:
-
rs1_data OR csr_rdata, for csrrs
-
imm OR csr_rdata, for csrrsi
-
~rs1_data AND csr_rdata, for csrrc
-
{ {27{1'b1}}, ~imm[4:0] } AND csr_rdata, for csrrci
For csrrci, only the bottom 5 bits of imm (the uimm field of CSR instructions) are negated, and the upper 27 bits of the mask are set to one; otherwise, csrrci could inadvertently clear high bits of the CSR (above bit 4).
In the formulas above, the order of operands represents how they will be mapped to the input ports of the ALU. The following points motivate the choice of operand order:
-
rs1_data is routed to port 1 of the ALU, and immediates are typically routed to port 2. This implies the ALU shift operation must use the first port for the value to be shifted, and the second port for the shift amount.
-
For Zicsr instructions, the order of operands was chosen to fix the position of csr_rdata (port 2) and keep rs1_data on port 1, at the expense of having an immediate on port 1 for csrrsi and csrrci (inconsistent with RV32I).
The ALU module is described below.
ALU Module
The ALU should be able to perform the following operations on its operands a and b, to produce result r:
-
addition: r = a + b
-
subtraction: r = a - b
-
and: r = a & b
-
or: r = a | b
-
xor: r = a ^ b
-
shift left: r = a << b[4:0]
-
shift right (logical): r = a >> b[4:0]
-
shift right (arithmetic): r = a >>> b[4:0]
-
set if less than (unsigned): r = (a < b, unsigned) ? 1 : 0
-
set if less than (signed): r = (a < b, signed) ? 1 : 0
For the three shift operations, only the low 5 bits of b are used as the shift amount, as required by RV32I.
The only required flag is zero, for use by the beq and bne instructions. The other conditional branch instructions can use r[0] with the set-if-less-than operation (signed for blt and bge, unsigned for bltu and bgeu).
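As an illustration of how the branch decision could be derived from these two ALU outputs, the hypothetical branch_cond module below maps the RV32I branch funct3 field (beq 000, bne 001, blt 100, bge 101, bltu 110, bgeu 111) to a taken signal. It assumes the control unit configures the ALU to subtract for beq and bne, to set-if-less-than (signed) for blt and bge, and to set-if-less-than (unsigned) for bltu and bgeu.

```systemverilog
// Hypothetical branch decision logic (not part of the design above).
module branch_cond(
    input  [2:0] funct3,   // branch funct3 field from the instruction
    input        alu_zero, // ALU zero flag (ALU subtracting for beq/bne)
    input        alu_r0,   // bit 0 of the ALU result (slt/sltu for the others)
    output       taken     // 1 if the branch should be taken
);
    // funct3[2] selects the less-than result over the equality result;
    // funct3[0] inverts the condition (bne, bge, bgeu).
    // funct3 values 010 and 011 are not branch encodings and are assumed
    // to be rejected by the control unit as illegal instructions.
    assign taken = (funct3[2] ? alu_r0 : alu_zero) ^ funct3[0];
endmodule
```

Because funct3[0] distinguishes each condition from its complement, a single XOR covers bne, bge, and bgeu.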
The signature for the alu module is shown below:
/// Arithmetic Logic Unit
///
/// This is a purely combinational ALU implementation.
///
/// The operation depends on the 4-bit alu_op as
/// follows:
///
/// 0_000: r = a + b
/// 1_000: r = a - b
/// 0_001: r = a << b[4:0]
/// x_010: r = signed(a) < signed(b) ? 1 : 0
/// x_011: r = a < b ? 1 : 0 (unsigned)
/// x_100: r = a ^ b
/// 0_101: r = a >> b[4:0]
/// 1_101: r = signed(a) >>> b[4:0]
/// x_110: r = a | b
/// x_111: r = a & b
///
/// The underscore in alu_op separates the top bit, which
/// comes from bit 30 of the instruction, from the bottom
/// 3 bits, which come from funct3, in R-type
/// register-register instructions.
///
/// For I-type register-immediate instructions, ensure
/// that the top bit is 0 for addi, slti, sltiu, xori,
/// ori, and andi. For slli, srli, and srai, set the top
/// bit to bit 30 of the instruction, and set b to the
/// immediate, whose low 5 bits hold the shift amount
/// (shamt). Set the low three bits to funct3 in all cases.
///
module alu(
input [31:0] a, // First 32-bit operand
input [31:0] b, // Second 32-bit operand
input [3:0] alu_op, // ALU control signals (see comments above)
output [31:0] r, // 32-bit result
output zero // 1 if r is zero, 0 otherwise
);
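A possible body for this signature is sketched below. It is one straightforward way to implement the alu_op table, not a definitive implementation. It assumes the RV32I funct3 encoding (slt = 010, sltu = 011) and uses only b[4:0] as the shift amount, so the raw I-type immediate of srai (which has bit 10 set) can be passed to b unchanged.

```systemverilog
module alu(
    input  [31:0] a,      // First 32-bit operand
    input  [31:0] b,      // Second 32-bit operand
    input  [3:0]  alu_op, // {instr[30], funct3}, see comments above
    output [31:0] r,      // 32-bit result
    output        zero    // 1 if r is zero, 0 otherwise
);
    logic [31:0] result;
    wire  [4:0]  shamt = b[4:0]; // RV32I shifts use only the low 5 bits

    always @(*) begin
        casez (alu_op)
            4'b0000: result = a + b;
            4'b1000: result = a - b;
            4'b0001: result = a << shamt;
            4'b?010: result = {31'b0, $signed(a) < $signed(b)}; // slt
            4'b?011: result = {31'b0, a < b};                   // sltu
            4'b?100: result = a ^ b;
            4'b0101: result = a >> shamt;
            4'b1101: result = $signed(a) >>> shamt;             // sign-filling
            4'b?110: result = a | b;
            4'b?111: result = a & b;
            default: result = 32'b0;
        endcase
    end

    assign r    = result;
    assign zero = (result == 32'b0);
endmodule
```

Unlisted encodings (for example 1_001) produce zero here; that choice is arbitrary.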
Main ALU Wrapper
A wrapper module is used to encapsulate the main ALU, and ensure inputs are mapped to the correct ports of the ALU consistently with the operation being implemented. The signature of the module is:
/// Main ALU Wrapper Module
///
/// This module routes input operands to the
/// main ALU depending on the instruction
/// being executed.
///
/// The arguments for the ALU are selected
/// by arg_sel as follows:
///
/// 000: rs1_data OP rs2_data
/// for register-register and conditional branch instructions
///
/// 001: rs1_data OP imm
/// for register-immediate, load/store, and jalr instructions
///
/// 010: pc + imm
/// for jal and auipc
///
/// 011: rs1_data OR csr_rdata
/// for csrrs
///
/// 100: imm OR csr_rdata
/// for csrrsi
///
/// 101: ~rs1_data AND csr_rdata
/// for csrrc
///
/// 110: { {27{1'b1}}, ~imm[4:0] } AND csr_rdata
/// for csrrci
///
/// Whenever OP is used above, alu_op is used to
/// select the ALU operation following the comments
/// in the alu module.
///
/// Ensure that the imm input is consistent with the
/// operation being implemented (depending on the
/// instruction format).
///
/// In this design, the lui instruction bypasses the ALU.
module main_alu_wrapper(
input [2:0] arg_sel, // Select the ALU arguments
input [3:0] alu_op, // Select the ALU operation (when required)
input [31:0] rs1_data, // Value of rs1 register
input [31:0] rs2_data, // Value of rs2 register
input [31:0] imm, // 32-bit immediate
input [31:0] pc, // Current program counter
input [31:0] csr_rdata, // Read-data for CSR bus
output [31:0] main_alu_result, // ALU output
output main_alu_zero // ALU zero flag output
);
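The operand routing can be sketched as a multiplexer that sits in front of the alu instance. The hypothetical module below (main_alu_operand_mux, with outputs alu_a, alu_b, and alu_op_eff) only produces the ALU inputs; inside main_alu_wrapper these would be connected to an alu instance. The forced operations for arg_sel 010 to 110 (add for pc + imm, OR and AND for the CSR cases, using the alu_op encodings from the alu comments) are an assumption, since the comments above only specify that alu_op is used where OP appears.

```systemverilog
// Hypothetical operand/operation selection for the main ALU.
module main_alu_operand_mux(
    input  [2:0]  arg_sel,
    input  [3:0]  alu_op,
    input  [31:0] rs1_data,
    input  [31:0] rs2_data,
    input  [31:0] imm,
    input  [31:0] pc,
    input  [31:0] csr_rdata,
    output [31:0] alu_a,      // to ALU port 1
    output [31:0] alu_b,      // to ALU port 2
    output [3:0]  alu_op_eff  // operation actually sent to the ALU
);
    localparam [3:0] OP_ADD = 4'b0000;
    localparam [3:0] OP_OR  = 4'b0110;
    localparam [3:0] OP_AND = 4'b0111;

    logic [31:0] a, b;
    logic [3:0]  op;

    always @(*) begin
        case (arg_sel)
            3'b000: begin a = rs1_data;  b = rs2_data;  op = alu_op; end // reg-reg, branch
            3'b001: begin a = rs1_data;  b = imm;       op = alu_op; end // reg-imm, load/store, jalr
            3'b010: begin a = pc;        b = imm;       op = OP_ADD; end // jal, auipc
            3'b011: begin a = rs1_data;  b = csr_rdata; op = OP_OR;  end // csrrs
            3'b100: begin a = imm;       b = csr_rdata; op = OP_OR;  end // csrrsi
            3'b101: begin a = ~rs1_data; b = csr_rdata; op = OP_AND; end // csrrc
            3'b110: begin a = {{27{1'b1}}, ~imm[4:0]}; b = csr_rdata; op = OP_AND; end // csrrci
            default: begin a = 32'b0;    b = 32'b0;     op = OP_ADD; end
        endcase
    end

    assign alu_a      = a;
    assign alu_b      = b;
    assign alu_op_eff = op;
endmodule
```

For example, with imm = 5 (uimm = 00101), the csrrci case drives alu_a = 0xffff_fffa, so only bits 0 and 2 of the CSR are cleared.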
Immediate Generation
All immediates encoded in RISC-V instructions should be extended to 32 bits (mostly sign-extended, but zero-extended for Zicsr instructions). In addition, each RV32I or Zicsr instruction uses at most one immediate (either imm or uimm), meaning one module can decode this single immediate and expose it as one output, imm. The module signature is as follows:
/// Extract an immediate encoded in the instruction
///
/// Each RV32I or Zicsr instruction contains at most
/// one immediate, which is extracted and converted to
/// a 32-bit format by this module. For Zicsr instructions,
/// the uimm field is also zero-extended to 32 bits, and
/// output using the same imm output.
///
/// The reference for how immediates are decoded is
/// v1_f2.4. The sel input picks the output as follows:
///
/// 000: { {21{instr[31]}}, instr[30:20] }, I-type
/// 001: { {21{instr[31]}}, instr[30:25], instr[11:7] }, S-type
/// 010: { {20{instr[31]}}, instr[7], instr[30:25], instr[11:8], 1'b0 }, B-type
/// 011: { instr[31:12], {12{1'b0}} }, U-type
/// 100: { {12{instr[31]}}, instr[19:12], instr[20], instr[30:21], 1'b0 }, J-type
///
/// 101: { {27{1'b0}}, instr[19:15] }, Zicsr (uimm is in the rs1 field)
///
module imm_gen(
input [2:0] sel, // Set immediate to extract
input [31:0] instr, // Current instruction
output [31:0] imm // Output 32-bit immediate
);
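A direct transcription of the immediate table into a module body could look as follows. This sketch takes the Zicsr uimm from instr[19:15], which is where CSR instructions encode it (the rs1 field); instr[24:20] lies inside the 12-bit CSR address field.

```systemverilog
module imm_gen(
    input  [2:0]  sel,   // Set immediate to extract
    input  [31:0] instr, // Current instruction
    output [31:0] imm    // Output 32-bit immediate
);
    logic [31:0] result;

    always @(*) begin
        case (sel)
            3'b000: result = {{21{instr[31]}}, instr[30:20]};                                // I-type
            3'b001: result = {{21{instr[31]}}, instr[30:25], instr[11:7]};                   // S-type
            3'b010: result = {{20{instr[31]}}, instr[7], instr[30:25], instr[11:8], 1'b0};   // B-type
            3'b011: result = {instr[31:12], 12'b0};                                          // U-type
            3'b100: result = {{12{instr[31]}}, instr[19:12], instr[20], instr[30:21], 1'b0}; // J-type
            3'b101: result = {27'b0, instr[19:15]};                                          // Zicsr uimm
            default: result = 32'b0;
        endcase
    end

    assign imm = result;
endmodule
```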
Register File
The register file is combinational with respect to reads (rs1 determines rs1_data, and rs2 determines rs2_data), and sequential for writes (rd_data is written to rd on the rising clock edge if write_en is set). The signature for the register file is as follows:
/// 32-bit Register file
///
/// There are 32 32-bit registers x0-x31, with x0 hardwired
/// to zero. This module provides two combinational read
/// ports, controlled by the two addresses rs1 and rs2, and
/// a single registered write (on the rising edge of the clock
/// when the write enable signal is asserted).
///
/// There is no reset; on power-on, the register values are
/// set to zero.
///
module register_file(
input clk, // clock
input write_en, // write enable for rd
input [31:0] rd_data, // data for write
input [4:0] rs1, // source register index
input [4:0] rs2, // source register index
input [4:0] rd, // destination register index for write
output [31:0] rs1_data, // read port for rs1
output [31:0] rs2_data // read port for rs2
);
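A minimal sketch of a body for register_file is shown below. It assumes that zero initial values are loaded at FPGA configuration through an initial block, matching the no-reset note above, and it implements x0 by ignoring writes to index 0 and forcing reads of index 0 to zero.

```systemverilog
module register_file(
    input         clk,      // clock
    input         write_en, // write enable for rd
    input  [31:0] rd_data,  // data for write
    input  [4:0]  rs1,      // source register index
    input  [4:0]  rs2,      // source register index
    input  [4:0]  rd,       // destination register index for write
    output [31:0] rs1_data, // read port for rs1
    output [31:0] rs2_data  // read port for rs2
);
    logic [31:0] regs [0:31];
    integer i;

    // No reset: zero initial values are loaded at FPGA configuration
    initial begin
        for (i = 0; i < 32; i = i + 1)
            regs[i] = 32'b0;
    end

    // Registered write on the rising edge; writes to x0 are ignored
    always @(posedge clk) begin
        if (write_en && rd != 5'd0)
            regs[rd] <= rd_data;
    end

    // Combinational reads; x0 always reads as zero
    assign rs1_data = (rs1 == 5'd0) ? 32'b0 : regs[rs1];
    assign rs2_data = (rs2 == 5'd0) ? 32'b0 : regs[rs2];
endmodule
```

Because the reads are asynchronous, synthesis tools will typically map this array to distributed (LUT) RAM rather than block RAM, whose reads are synchronous.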
The fields rs1, rs2, and rd are routed from fixed locations in instr. The source for rd_data is selected from one of the following options:
-
main_alu_result, for register-register, register-immediate, and auipc instructions
-
data_mem_rdata, for load instructions
-
csr_rdata, for Zicsr instructions
-
pc_plus_4, for unconditional jump instructions
-
the U-type immediate (instr[31:12] followed by 12 zero bits), for lui
To simplify the data path, the register file is wrapped in a module that routes the register indices from the instruction, and selects the source for writing data:
/// Write data for rd in register file
///
/// The rd_data_sel argument selects between the inputs:
///
/// 000: main_alu_result,
/// for register-register, register-immediate, and auipc instructions
///
/// 001: data_mem_rdata
/// for load instructions
///
/// 010: csr_rdata
/// for Zicsr instructions
///
/// 011: pc_plus_4
/// for unconditional jump instructions
///
/// 100: { instr[31:12], {12{1'b0}} } (from instr input)
/// for lui instruction
///
module register_file_wrapper(
input clk, // for writing
input write_en, // 1 to write data to rd; 0 otherwise
input [2:0] rd_data_sel, // pick what to write to rd
input [31:0] main_alu_result, // the output from the main ALU
input [31:0] data_mem_rdata, // data output from data memory bus
input [31:0] csr_rdata, // data output from CSR bus
input [31:0] pc_plus_4, // current pc + 4, from pc module
input [31:0] instr, // current instruction
output [31:0] rs1_data, // read port for rs1
output [31:0] rs2_data // read port for rs2
);
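The selection of rd_data inside the wrapper can be sketched as the hypothetical rd_data_mux module below. Selecting among five sources needs a 3-bit select, so the sketch declares rd_data_sel as [2:0]. The rest of the wrapper (not shown) would route instr[19:15], instr[24:20], and instr[11:7] to the rs1, rs2, and rd ports of a register_file instance.

```systemverilog
// Hypothetical rd write-data multiplexer used inside register_file_wrapper.
module rd_data_mux(
    input  [2:0]  rd_data_sel,     // 3 bits are needed for 5 sources
    input  [31:0] main_alu_result, // reg-reg, reg-imm, auipc
    input  [31:0] data_mem_rdata,  // loads
    input  [31:0] csr_rdata,       // Zicsr
    input  [31:0] pc_plus_4,       // jal, jalr
    input  [31:0] instr,           // for the lui immediate
    output [31:0] rd_data
);
    logic [31:0] result;

    always @(*) begin
        case (rd_data_sel)
            3'b000: result = main_alu_result;
            3'b001: result = data_mem_rdata;
            3'b010: result = csr_rdata;
            3'b011: result = pc_plus_4;
            3'b100: result = {instr[31:12], 12'b0}; // lui bypasses the ALU
            default: result = 32'b0;
        endcase
    end

    assign rd_data = result;
endmodule
```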
Data Memory Bus
The design will use a simple bus for accesses to data memory (the target for loads and stores). Using a bus allows different devices (e.g. main memory and I/O devices) to be implemented in separate modules.
Instead of using an enable signal to pick which device is active on the bus, each device determines whether it should handle the read or write, and signals this by setting a "claim" signal. The read outputs from all the devices are ORed together, so devices that do not claim the access must set their read output to zero.
The claim signals from all the devices are ORed together. On a read or write, this signal can be used to check that at least one device will handle the request. If no device will handle the request, a load/store access fault can be raised.
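The claim and read-data combining rules above can be sketched with plain ports, leaving aside the interface and modport plumbing. The hypothetical data_bus_combine module takes one claim bit and one 32-bit read word per device, ORs them together, and flags an access fault when an access (an assumed input, asserted by the control unit for loads and stores) is not claimed by any device.

```systemverilog
// Hypothetical OR-combining of N bus devices (plain ports instead of modports).
module data_bus_combine #(parameter N = 3) (
    input  [N-1:0]    dev_claim,   // one claim bit per device
    input  [32*N-1:0] dev_rdata,   // device i drives bits [32*i +: 32]
    input             access,      // a load or store is in progress
    output            claim,       // at least one device claimed the access
    output [31:0]     rdata,       // ORed read data
    output            access_fault // access with no claiming device
);
    logic [31:0] acc;
    integer i;

    // Non-claiming devices drive zero, so ORing yields the claimer's data
    always @(*) begin
        acc = 32'b0;
        for (i = 0; i < N; i = i + 1)
            acc = acc | dev_rdata[32*i +: 32];
    end

    assign rdata        = acc;
    assign claim        = |dev_claim;
    assign access_fault = access & ~claim;
endmodule
```

The OR-combination is only correct if every non-claiming device drives zero on its read-data output, which is why that rule is part of the bus contract.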
The (logical) bus interface is as follows:
interface data_mem_bus();
bit clk; // writes are performed on the rising clock edge
bit [31:0] addr; // the read/write address
bit [1:0] width; // the width of the read/write (byte, halfword, word)
bit [31:0] rdata; // read-data returned from device
bit [31:0] wdata; // write-data passed to device
bit write_en; // whether to perform a write (or just a read)
bit claim; // devices will claim read/write based on address/width
endinterface
The OR logic for claim and rdata will be handled using modports, one per device (and one for the host).
The devices on the bus are:
-
The main memory module (RAM)
-
Any memory-mapped CSRs (in particular, the timer interrupt controller)
-
Any memory-mapped peripherals
The data memory bus is only used for loads and stores (instruction fetch does not use this bus in this design).
For both load and store instructions, the address is calculated by the ALU, so the addr line is hardwired to the main ALU output. The width field depends on the instruction, and is driven by the control unit. The write_en line is set only for store instructions. The wdata field is hardwired to rs2_data, which is the only source for writes to data memory.
The claim output is used by the control unit to raise a load/store access fault when no device claims the access. The rdata output is hardwired to the register file wrapper, which is the only user of data read over the bus (load instructions).
CSR Bus
The CSR bus is similar to the data memory bus:
interface csr_bus();
bit clk; // writes are performed on the rising clock edge
bit [11:0] addr; // the read/write address
bit [31:0] rdata; // read-data returned from a CSR device
bit [31:0] wdata; // data to be written to a CSR device
bit write_en; // whether to perform a write (or just a read)
bit claim; // devices will claim read/write based on address
endinterface
At most one CSR device attached to the bus will be responsible for asserting the claim signal, and either writing data or returning data (no device claims the access if the address does not correspond to an implemented CSR). The other devices return zero on the rdata line. The rdata lines of all the devices are ORed together to generate the bus rdata signal (and the same for the bus claim signal).
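As an example of a CSR device following this contract, the sketch below implements a single register at address 0x340 (mscratch in the RISC-V privileged specification). The module name mscratch_csr and its plain-port interface are assumptions standing in for a csr_bus modport.

```systemverilog
// Hypothetical single-register CSR device on the CSR bus.
module mscratch_csr(
    input         clk,      // writes on the rising edge
    input  [11:0] addr,     // CSR address from the bus
    input  [31:0] wdata,    // write data from the bus
    input         write_en, // bus write enable
    output [31:0] rdata,    // read data (zero when not claiming)
    output        claim     // 1 when addr selects this CSR
);
    localparam [11:0] MSCRATCH = 12'h340;
    logic [31:0] value = 32'b0; // power-on value (no reset)

    assign claim = (addr == MSCRATCH);
    assign rdata = claim ? value : 32'b0; // zero when idle, so ORing works

    always @(posedge clk)
        if (write_en && claim)
            value <= wdata;
endmodule
```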
Only Zicsr instructions interact with the CSR bus. The addr input always comes from a fixed position in the instruction (instr[31:20]), and is hardwired there. The write_en input is set by the control unit. The data written back to the CSR comes from either rs1_data, the main ALU output, or the uimm field of the instruction (via the immediate generation module). The module selecting the resulting value for wdata is:
/// CSR write data source selection
///
/// Depending on the value of sel, the CSR write data
/// source is chosen as follows:
///
/// 00: rs1_data, for csrrw
/// 01: main_alu_result, for csrrs, csrrc, csrrsi, csrrci
/// 10: { imm }, for csrrwi
///
module csr_wdata_sel(
input [1:0] sel, // selects the write data source (see above)
input [31:0] rs1_data, // from the register file
input [31:0] main_alu_result, // from the main ALU
input [31:0] imm, // uimm, from immediate generator
output [31:0] csr_wdata // to the CSR bus
);
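A possible body for csr_wdata_sel is sketched below. It assumes sel is two bits wide, since three sources must be distinguished, and outputs zero for the unused encoding 11.

```systemverilog
module csr_wdata_sel(
    input  [1:0]  sel,             // 00: rs1_data, 01: ALU result, 10: imm
    input  [31:0] rs1_data,        // from the register file
    input  [31:0] main_alu_result, // from the main ALU
    input  [31:0] imm,             // uimm, from immediate generator
    output [31:0] csr_wdata        // to the CSR bus
);
    assign csr_wdata = (sel == 2'b00) ? rs1_data        : // csrrw
                       (sel == 2'b01) ? main_alu_result : // csrrs, csrrc, csrrsi, csrrci
                       (sel == 2'b10) ? imm             : // csrrwi
                                        32'b0;            // 11: unused
endmodule
```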
The rdata output from the CSR bus is routed to the register file wrapper (csr_rdata) for writing to rd, and is also routed to the main ALU wrapper for use in computations that write back to the CSR. The bus claim signal (csr_claim) is returned to the control unit, which raises an illegal-instruction exception when no device claims the CSR address.