Lab 02 - Single Cycle RV Processor
Introduction
In this lab, we will build a single-cycle RISC-V processor (the lab manual is here). For now, our processor will support the following instructions:
add, addi, sub, and, andi, or, ori
lw, sw
beq, bne, jal (without linking, that is, without saving the return address)
lui, auipc
sll, srl, sra
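Each of these instructions takes a different path through the datapath depending on its format, and the format is determined by the 7-bit opcode in bits [6:0]. The Python sketch below classifies a machine word by opcode. The opcode values come from the RV32I base ISA; the format labels ("I-ALU", "I-load", etc.) and the helper name are informal choices for this sketch, not names from the lab files.

```python
# Opcode field (instr[6:0]) -> instruction format, per the RV32I base ISA.
# The format labels are informal names chosen for this sketch.
OPCODE_FORMATS = {
    0b0110011: "R",       # add, sub, and, or, sll, srl, sra
    0b0010011: "I-ALU",   # addi, andi, ori
    0b0000011: "I-load",  # lw
    0b0100011: "S",       # sw
    0b1100011: "B",       # beq, bne
    0b1101111: "J",       # jal
    0b0110111: "U",       # lui
    0b0010111: "U",       # auipc
}


def instr_format(instr: int) -> str:
    """Classify a 32-bit machine word by its opcode (bits [6:0])."""
    return OPCODE_FORMATS.get(instr & 0x7F, "unsupported")
```

For example, `instr_format(0x00500093)` (addi x1, x0, 5) returns "I-ALU", and `instr_format(0x002081B3)` (add x3, x1, x2) returns "R". This is handy when stepping through a simulation to confirm which datapath a given instruction should be using.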
Design Files
Top Module
Our top-level design file, Top_Nexys.vhd, is written in VHDL instead of Verilog. (FYI, some of our teaching team are big fans of VHDL 😂.) Before reading this section, I strongly recommend reading about our Wrapper file first.
Purpose of Top
In this module, we explicitly connect the ports (LEDs, seven-segment display, etc.) of our Wrapper (our "motherboard") to the physical ports (LEDs, seven-segment display, etc.) on the Nexys 4 FPGA board.
CLK_DIV_BITS
You don't need to know how everything in Top_Nexys.vhd works, except for one parameter, CLK_DIV_BITS, which is essentially used to slow down the processor's clock.
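For example, the base clock frequency on our FPGA board is 100 MHz.
If CLK_DIV_BITS = 5, then the processor's clock frequency will be $100\ \text{MHz} / 2^5 = 3.125\ \text{MHz}$. Or, if CLK_DIV_BITS = 2, then the processor's clock frequency will be $100\ \text{MHz} / 2^2 = 25\ \text{MHz}$.
In summary, the relationship between CLK_DIV_BITS and the processor's clock frequency is

$$f_{\text{CPU}} = \frac{100\ \text{MHz}}{2^{\texttt{CLK\_DIV\_BITS}}}$$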
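To sanity-check these numbers, here is a small Python sketch. It assumes (this is not taken from Top_Nexys.vhd) that the divider is a free-running counter clocked at 100 MHz and that the processor clock is taken from counter bit CLK_DIV_BITS - 1, so each extra bit halves the frequency. `simulated_cpu_edges` is a hypothetical helper that counts the divided clock's rising edges to confirm the closed-form expression.

```python
BASE_CLK_HZ = 100_000_000  # Nexys 4 on-board clock: 100 MHz


def cpu_clock_hz(clk_div_bits: int) -> float:
    """Processor clock, assuming a divide-by-2**CLK_DIV_BITS divider."""
    if clk_div_bits < 0:
        raise ValueError("CLK_DIV_BITS must be non-negative")
    return BASE_CLK_HZ / 2 ** clk_div_bits


def simulated_cpu_edges(clk_div_bits: int, base_cycles: int) -> int:
    """Count rising edges of counter bit (CLK_DIV_BITS - 1) over
    `base_cycles` ticks of the 100 MHz clock (hypothetical divider model)."""
    if clk_div_bits == 0:
        return base_cycles  # undivided: one processor edge per base edge
    bit = clk_div_bits - 1
    edges, prev = 0, 0
    for count in range(1, base_cycles + 1):  # counter value after each tick
        cur = (count >> bit) & 1
        if cur and not prev:  # 0 -> 1 transition of the selected bit
            edges += 1
        prev = cur
    return edges
```

With this model, 320 base cycles at CLK_DIV_BITS = 5 produce 10 processor edges, matching 3.125 MHz, and `cpu_clock_hz(26)` comes out at roughly 1.49 Hz, slow enough to watch the LEDs change by eye.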
Why does the Wrapper exist?
Wrapper.v is long, but it’s basically a simulation harness that sits between our RISC-V core (RV) and the outside world (testbench or FPGA board). It provides memories and memory-mapped peripherals so that the processor can interact with LEDs, DIP switches, buttons, UART, OLED, etc. We can think of it as the “motherboard” that our RV CPU plugs into.
In short, Wrapper.v is just the Verilog implementation of our Lec 03 microarchitecture with the MMIO part added. (See the spoiler at the end.)
Purpose of Wrapper
Provides IROM (instruction memory) and DMEM (data memory) to the processor.
Provides MMIO (memory-mapped I/O) registers for peripherals like LEDs, UART, DIP switches, OLED, accelerometer, etc.
Translates “clean, parallel” signals into abstract peripherals that are easy to monitor in simulation.
Used only for simulation (not synthesized directly). On the FPGA, the higher-level Top_Nexys.vhd connects the Wrapper to the actual hardware pins.
So, it makes our RISC-V processor behave as if it were running on a small SoC with peripherals.
I/O Ports
| Port | Direction | Width (bits) | Description |
|---|---|---|---|
| DIP | In | 16 | DIP switch inputs (not debounced). |
| PB | In | 3 | Push buttons (BTNL, BTNC, BTNR; not debounced). |
| LED_OUT | Out (reg) | 8 | LEDs [7:0] showing processor results. |
| LED_PC | Out | 7 | Shows PC[8:2] on LEDs [15:9]. |
| SEVENSEGHEX | Out (reg) | 32 | Data for the 8-digit 7-segment display. |
| UART_TX | Out (reg) | 8 | Byte sent to PC/testbench via UART. |
| UART_TX_ready | In | 1 | UART ready flag (OK to write a new TX byte). |
| UART_TX_valid | Out (reg) | 1 | Indicates the Wrapper wrote a new UART TX byte. |
| UART_RX | In | 8 | Byte received from PC/testbench via UART. |
| UART_RX_valid | In | 1 | Indicates new RX data is available. |
| UART_RX_ack | Out (reg) | 1 | Acknowledges that the RX byte was read. |
| OLED_Write | Out (reg) | 1 | Pixel update signal for the OLED. |
| OLED_Col | Out (reg) | 7 | OLED column index. |
| OLED_Row | Out (reg) | 6 | OLED row index. |
| OLED_Data | Out (reg) | 24 | OLED pixel data <R,G,B> (8/8/8 aligned). |
| ACCEL_Data | In | 32 | Packed <Temp, X, Y, Z> from the accelerometer. |
| ACCEL_DReady | In | 1 | Accelerometer data ready flag. |
| RESET | In | 1 | Active-high reset. |
| CLK | In | 1 | Clock input (divided clock shown on LED[8]). |
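Since instructions are 4-byte aligned, PC[1:0] is always 00, so the LED_PC field PC[8:2] is effectively the instruction index (modulo 128). The Python sketch below shows that bit slicing. The helper names and the example addresses (RARS's default .text base, 0x00400000) are assumptions for illustration, not values taken from Wrapper.v.

```python
def led_pc(pc: int) -> int:
    """7-bit LED_PC value: PC[8:2] (PC[1:0] is always 00 for aligned code)."""
    return (pc >> 2) & 0x7F


def led_bank(pc: int) -> int:
    """Place LED_PC on LEDs [15:9] of the 16-LED bank."""
    return led_pc(pc) << 9
```

For PC = 0x00400010 (the fifth instruction if the program starts at 0x00400000), `led_pc` gives 4 and `led_bank` gives 0x800, i.e. only LED 11 is lit. The display wraps every 512 bytes, so PC = 0x00400200 shows all seven LEDs off again.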
Memories
In Wrapper.v (our motherboard), besides our CPU (RV.v), we also have three memory spaces:
IROM: instruction memory (the program comes from the .mem file dumped by RARS).
DMEM: data memory (for global variables, the stack, etc.).
MMIO: memory-mapped I/O (LEDs, switches, UART, OLED, etc.).
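To make the IROM concrete: when RARS's Dump Memory uses the "Hexadecimal Text" format, the .text segment is written as one 8-digit hex word per line, and the IROM is then indexed by word, i.e. (PC - base) >> 2. The Python sketch below assumes that dump format and a hypothetical IROM_BASE of 0x00400000 (RARS's default .text base); the actual base address and file handling in your Wrapper.v may differ.

```python
IROM_BASE = 0x00400000  # hypothetical: RARS default .text base; check Wrapper.v


def load_hex_dump(text: str) -> list[int]:
    """Parse a hex text dump: one 32-bit word (8 hex digits) per line."""
    return [int(word, 16) for word in text.split()]


def fetch(irom: list[int], pc: int) -> int:
    """Word-addressed instruction fetch: index = (PC - base) / 4."""
    return irom[(pc - IROM_BASE) >> 2]


dump = "00500093\n00100113\n"  # addi x1, x0, 5 ; addi x2, x0, 1
irom = load_hex_dump(dump)
```

Here `fetch(irom, 0x00400004)` returns 0x00100113, the second instruction, because the byte address is divided by 4 to get the word index.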
Address Decoding
In Wrapper.v, we need to decide whether an access (a read or a write) goes to DMEM or to an MMIO peripheral. To do so, we introduce a set of dec_* (decode) signals, each indicating whether the current address falls in DMEM or in a particular MMIO peripheral.
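The decoding idea can be sketched in Python as a function that turns an address into a set of one-hot dec_* flags. Every address, size, and offset below is hypothetical, and so are the peripheral flag names (dec_MMIO_LED, etc.), which only follow the dec_MMIO_* pattern. Take the real values from the localparams in your Wrapper.v.

```python
# Hypothetical memory map for illustration only; use your Wrapper.v values.
DMEM_BASE = 0x10010000
DMEM_SIZE = 0x200          # 128 words
MMIO_BASE = 0xFFFF0000
MMIO_OFFSETS = {           # hypothetical peripheral register offsets
    "LED": 0x00,
    "DIP": 0x04,
    "SEVENSEG": 0x80,
}


def decode(addr: int) -> dict[str, int]:
    """Return one-hot dec_* flags for a data address (all 0 if unmapped)."""
    dec = {"dec_DMEM": 0}
    dec.update({f"dec_MMIO_{name}": 0 for name in MMIO_OFFSETS})
    if DMEM_BASE <= addr < DMEM_BASE + DMEM_SIZE:
        dec["dec_DMEM"] = 1
    else:
        for name, offset in MMIO_OFFSETS.items():
            if addr == MMIO_BASE + offset:
                dec[f"dec_MMIO_{name}"] = 1
    return dec
```

In hardware, each flag is just a comparator on ALUResult. Because at most one flag is set, the flags can serve directly as mux selects on the load path and as write enables on the store path.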
Connection to RV
Unlike the Lec 03 microarchitecture, here we have added the MMIO part inside.

Data access path
The data access path has two parts: loading data from either DMEM or MMIO, and storing data to either DMEM or MMIO.
Load: To the CPU, loading from DMEM and loading from an MMIO peripheral look identical (in our implementation, we read both together first); only the Wrapper decides the source.
1. The CPU computes the address in the ALU -> ALUResult.
2. The CPU asserts MemRead = 1. (MemRead is just MemtoReg in our microarchitecture.)
3. The Wrapper checks ALUResult and enables the corresponding dec_* signal to indicate which memory we want to read from. If the address ∈ DMEM range (dec_DMEM == 1) -> read from DMEM and store the read data into ReadData_DMEM. If the address ∈ MMIO range (dec_MMIO_* == 1) -> read from the MMIO peripheral and store the read data into ReadData_MMIO.
4. After that, we delay the decoded signals.
5. The Wrapper puts the selected value into ReadData_in. (This is where the I/O multiplexing comes into play.)
6. The CPU reads ReadData_in and writes it into the register file.
The data to be read (from either MMIO or DMEM) always appears on ReadData_in first, since ReadData_in is driven by an always @(*) block. Whether it is then passed to the destination register rd (via the WD port) depends on the processor's MemRead (i.e., MemtoReg) signal.
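The load steps above can be sketched as a small Python model. It assumes, and the text does not confirm this, that the DMEM/MMIO reads are registered at a clock edge, which is one reason the dec_* flags would need to be delayed to stay aligned with the data. The class, the snake_case attribute names (standing in for ReadData_DMEM, ReadData_MMIO, and the delayed dec_* signals), and the address ranges are all hypothetical. All MMIO peripherals are collapsed into a single flag for brevity.

```python
DMEM_BASE, DMEM_SIZE = 0x10010000, 0x200   # hypothetical ranges
MMIO_BASE, MMIO_SIZE = 0xFFFF0000, 0x100


class LoadPath:
    """Wrapper load side, assuming reads are registered at a clock edge."""

    def __init__(self, dmem: dict, mmio: dict):
        self.dmem, self.mmio = dmem, mmio
        self.readdata_dmem = 0  # ReadData_DMEM
        self.readdata_mmio = 0  # ReadData_MMIO
        self.dec_dmem_d = 0     # dec_DMEM, delayed by one register
        self.dec_mmio_d = 0     # dec_MMIO_* (collapsed to one flag), delayed

    def clock_edge(self, alu_result: int) -> None:
        dec_dmem = int(DMEM_BASE <= alu_result < DMEM_BASE + DMEM_SIZE)
        dec_mmio = int(MMIO_BASE <= alu_result < MMIO_BASE + MMIO_SIZE)
        # Both sources are read together; the mux picks one afterwards.
        self.readdata_dmem = self.dmem.get(alu_result, 0)
        self.readdata_mmio = self.mmio.get(alu_result, 0)
        # Delay the decode flags so they line up with the registered data.
        self.dec_dmem_d, self.dec_mmio_d = dec_dmem, dec_mmio

    def read_data_in(self) -> int:
        """Combinational output mux (like an always @(*) block)."""
        if self.dec_dmem_d:
            return self.readdata_dmem
        if self.dec_mmio_d:
            return self.readdata_mmio
        return 0


def write_back(mem_to_reg: int, alu_result: int, read_data_in: int) -> int:
    """CPU side: the value sent to the register file's WD port."""
    return read_data_in if mem_to_reg else alu_result
```

`write_back` mirrors the last paragraph: ReadData_in is always computed, but it only reaches rd when MemtoReg selects it.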
Store: A store either updates memory or triggers a side effect on a peripheral.
1. The CPU computes the address in the ALU -> ALUResult.
2. The CPU outputs the value we want to store on WriteData_out.
3. The CPU asserts MemWrite_out (4'b1111). This is {4{MemWrite}}, where MemWrite is the control signal in our Lec 03 microarchitecture.
4. The Wrapper checks ALUResult: if the address ∈ DMEM range -> write WriteData_out into DMEM; if the address ∈ MMIO range -> forward WriteData_out to the peripheral (e.g., set the LEDs, update the 7-seg display).
5. The CPU does not expect anything on ReadData_in.
We use a synchronous memory write here, which means WriteData_out is stored at the end of the single clock cycle used to execute the store instruction. The write happens "at the end" of that cycle, not in a separate cycle.
Why synchronous? Because the memory now updates exactly at the clock edge. By the time the clock rises, ALUResult, WriteData_out, and MemWrite_out are all stable, so the memory can reliably latch the correct value on the next posedge CLK.
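A synchronous write can be modelled as a memory whose contents change only inside a `posedge()` call. Between edges, reads still return the old value. The Python sketch below is a hypothetical model: the class name, the base address, and the per-lane handling are assumptions. Patterns other than 4'b1111 are included only to show what each bit of the 4-bit MemWrite_out controls; in this lab, sw always drives all four byte lanes.

```python
def replicate4(mem_write: int) -> int:
    """{4{MemWrite}}: 4'b1111 when MemWrite is 1, else 4'b0000."""
    return 0b1111 if mem_write else 0b0000


class SyncDMEM:
    """Word-addressed DMEM that only changes at the clock edge."""

    def __init__(self, base: int, depth_words: int):
        self.base = base
        self.mem = [0] * depth_words

    def read(self, addr: int) -> int:
        return self.mem[(addr - self.base) >> 2]

    def posedge(self, alu_result: int, write_data_out: int,
                mem_write_out: int) -> None:
        """Latch the store; byte lane i is written if bit i of MemWrite_out is 1."""
        if mem_write_out == 0:
            return
        idx = (alu_result - self.base) >> 2
        word = self.mem[idx]
        for lane in range(4):
            if (mem_write_out >> lane) & 1:
                mask = 0xFF << (8 * lane)
                word = (word & ~mask) | (write_data_out & mask)
        self.mem[idx] = word & 0xFFFFFFFF
```

During the store's cycle, `read()` still returns the old contents. Only the call to `posedge()` commits WriteData_out, which matches the "update at the end of the cycle" behaviour described above.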
RISC-V processor
Our RISC-V processor, or simply the RV processor, is designed in RV.v, which has the following hierarchy:
Tips
Demo
If you are unsure what you are going to demo, here are some tips from the teaching team:
General Caveats
Make your output depend on user inputs (e.g., the DIP switches or push buttons). In other words, don't hardcode the output using a counter, etc.
Design your assembly program so that if any one instruction doesn't work, the final output will be completely different. In other words, to get the correct output for a given input, every instruction in your assembly program must work correctly.
Don't just look at the final output to conclude that every instruction in your program works. For example, if you shift right by 5 bits and then shift left by 5 bits, an input whose low 5 bits are already zero comes back unchanged, so the result will still look right even if your shift instructions do nothing.
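The shift caveat can be checked with a tiny Python model. `broken_shift` is a hypothetical bug in which the shifter simply passes its operand through. The right-then-left round trip only exposes the bug when the input's low 5 bits are non-zero, so your test inputs need to be chosen with that in mind.

```python
MASK32 = 0xFFFFFFFF


def srl(x: int, n: int) -> int:
    """Logical right shift on a 32-bit value."""
    return (x & MASK32) >> n


def sll(x: int, n: int) -> int:
    """Left shift, truncated to 32 bits."""
    return (x << n) & MASK32


def broken_shift(x: int, n: int) -> int:
    """Hypothetical bug: the shifter ignores the shift amount."""
    return x


def round_trip(x: int, shift_right, shift_left) -> int:
    """Shift right by 5, then left by 5, with the given shift functions."""
    return shift_left(shift_right(x, 5), 5)
```

With x = 0x3E0 (low 5 bits zero), the correct and broken shifters both return 0x3E0, so the bug is invisible. With x = 0x3E7, the correct result is 0x3E0 while the broken one returns 0x3E7.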
Review the RISC-V Microarchitecture
In particular, know the datapath for every type of instruction we have learned. It is better to understand it first and then memorise it! Below is a file summarising the datapaths for each instruction:
General Steps
This is the most important tip for the demo!
Assembly code correctness: Step through the assembly program (e.g., in RARS) to make sure the algorithm works correctly at the "high level".
HDL (Vivado) simulation: Step through each instruction to make sure it executes correctly in Vivado simulation (behavioral + post-synthesis).
Hardware demo: At this stage you cannot step through the program, so you just provide the input and check the output.