Lab 01 - Get prepared
Introduction
It is imperative to put in effort and try your best for this assignment. It may take an amount of effort that is quite disproportionately large, compared to the impact on your grade. This is normal. This assignment is designed to prepare you for the later ones, so that you can spend time debugging your design, instead of debugging your knowledge.
— From CG3207 Teaching Team
Task 1: Assembly Simulation
Task Instruction
The goal of this task is to understand the RISC-V assembly language. To that end, I will put some effort into explaining the sample program, which is public on GitHub.
Overall Structure
This sample RISC-V assembly program contains 3 parts,
.eqv (Constants)
The code here is not data at all — it never goes into instruction memory (IROM) or data memory (DMEM). Instead, it's purely an assembler directive: a symbolic substitution (like #define in C).
.eqv NAME VALUE
When the assembler sees NAME, it replaces it with VALUE.
Instruction Memory (IROM / Code Memory)
This is the program memory — where instructions live. It stores the instructions that the CPU fetches and executes. It has the following coding convention,
.text ## IROM segment
main:
li s0, MMIO_BASE
...
halt:
j halt
Code Explanation
All code goes inside .text.
Labels like main:, loop:, wait: are symbolic addresses.
The final halt: j halt ensures execution has a “dead end”, since without an OS, programs don’t really return.
With IROM depth 9 → 2^9 = 512 bytes → you can fit 128 instructions (each 4 bytes).
Pseudoinstructions (like li) may expand to multiple real instructions, so you must stay within this limit.
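The capacity arithmetic above can be checked with a small Python sketch. The function name `irom_capacity` is made up for illustration; it assumes a byte-addressed IROM with fixed 4-byte instructions, as stated in the notes.

```python
def irom_capacity(depth_bits: int, bytes_per_instr: int = 4):
    """Return (bytes, instructions, word-address bits) for a byte-addressed IROM."""
    total_bytes = 1 << depth_bits                  # 2^depth bytes
    instructions = total_bytes // bytes_per_instr  # one instruction per 4-byte word
    word_addr_bits = (instructions - 1).bit_length()
    return total_bytes, instructions, word_addr_bits

print(irom_capacity(9))  # (512, 128, 7)
```

The third value (7) is the number of bits needed to address 128 words, which is the figure that comes up again in the question about IROM capacity.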
Data Memory (DMEM)
Data memory stores constants, variables, strings, arrays, stack, heap, etc. Its coding convention is as follows,
.data ## DMEM segment
DMEM:
delay_val: .word 4 # a constant at DMEM+0x00
string1: .asciz "\r\nWelcome to CG3207..\r\n"
var1: .word 1 # static variable, initial value = 1
.align 9
STACK_INIT:
Line-by-line Explanation
Line 15-31
These lines define some constants.
Line 40-45 (main)
This section initializes base addresses (MMIO_BASE, LED address, DIP switch address).
Line 41: li is implemented as lui + addi because MMIO_BASE doesn't fit in a 12-bit immediate. Now s0 = 0xFFFF0000. This will be the starting point for accessing all peripherals.
Line 43: Computes the LED peripheral’s memory-mapped address, s1 = s0 + 0x60 -> s1 = 0xFFFF0060. Later, writing to (s1) will control the LEDs.
Line 44: Since DIP_OFF (0x64) fits in 12 bits, this is just one addi instruction, not lui + addi. It loads the immediate into a dp reg (data-processing register) s2, which holds the DIP switch offset from the MMIO base.
Line 45: Adds the DIP switch offset to the MMIO base, so s2 = 0xFFFF0064.
The comment here means this is just a way to demonstrate the instruction add (register + register) and the usage of a dp reg (data-processing register).
li in RISC-V means "load immediate" and its implementation depends on the size of the immediate value.
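To make the size dependence concrete, here is a minimal Python sketch of how a 32-bit immediate can be split for `li`. The helper `expand_li` is hypothetical and is not the assembler's actual code; it assumes the large case always produces lui + addi, as the walkthrough describes (some assemblers may drop the addi when the low 12 bits are zero).

```python
def expand_li(rd: str, imm: int):
    """Split a 32-bit immediate into the instruction(s) that li stands for."""
    imm &= 0xFFFFFFFF
    signed = imm - (1 << 32) if imm & 0x80000000 else imm
    if -2048 <= signed <= 2047:
        # Fits in the 12-bit signed immediate of addi: one instruction.
        return [f"addi {rd}, zero, {signed}"]
    lower = imm & 0xFFF
    if lower >= 0x800:
        lower -= 0x1000  # addi sign-extends, so the upper part must compensate
    upper = ((imm - lower) >> 12) & 0xFFFFF
    return [f"lui {rd}, {hex(upper)}", f"addi {rd}, {rd}, {lower}"]

print(expand_li("s0", 0xFFFF0000))  # ['lui s0, 0xffff0', 'addi s0, s0, 0']
print(expand_li("s2", 0x64))        # ['addi s2, zero, 100']
```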
Line 46-54 (loop + wait)
This code snippet mainly does the following:
Each time through loop, DIP switches are read → LEDs updated.
Then the program spends time in wait, decrementing s3 until it reaches zero.
That’s a software delay loop.
Once the delay is over, execution returns to loop, reloads delay_val, and the process repeats.
So, the overall effect is: LEDs continuously reflect DIP switches, but with a controlled refresh rate (slowed down by the delay loop). And the detailed explanation is as follows:
Line 47: Loads the constant delay_val (here = 4) from data memory into s3.
Line 48: Reads the DIP switch values from the MMIO register at s2 = 0xFFFF0064.
Line 49: Writes the same value into the LED MMIO register (s1 = 0xFFFF0060). So the LEDs mirror whatever is on the DIP switches.
Line 51: Decrement the delay counter.
Line 52: If counter hits zero, go back to loop: to reload delay_val and refresh LEDs.
Line 53: If counter ≠ 0, jump back to wait: (continue counting down).
Here jal zero, wait is used as a plain jump. Since jal normally stores the return address into a register, writing into zero discards it. It is equivalent to j wait.
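The loop and wait behaviour described above can be modelled in a few lines of Python. This is only a behavioural sketch: the dictionary `mmio` and the function `run_one_refresh` are illustrative names, and it assumes delay_val is at least 1.

```python
LED_ADDR = 0xFFFF0060  # held in s1 in the walkthrough
DIP_ADDR = 0xFFFF0064  # held in s2 in the walkthrough

def run_one_refresh(mmio: dict, delay_val: int = 4) -> int:
    """Model one pass of loop + wait; return how many times wait executed."""
    s3 = delay_val                    # Line 47: load delay_val into s3
    mmio[LED_ADDR] = mmio[DIP_ADDR]   # Lines 48-49: mirror DIP switches onto LEDs
    wait_iterations = 0
    while True:
        s3 -= 1                       # Line 51: decrement the delay counter
        wait_iterations += 1
        if s3 == 0:                   # Line 52: counter hit zero, back to loop
            return wait_iterations
        # Line 53: jal zero, wait (plain jump back to wait)

mmio = {DIP_ADDR: 0xA5, LED_ADDR: 0x00}
print(run_one_refresh(mmio), hex(mmio[LED_ADDR]))  # 4 0xa5
```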
Line 63-79 (dmem)
This is the data memory,
Line 69: Defines a constant delay_val stored at the beginning of data memory.
Line 70-71: Stores a null-terminated string in memory. Each character = 1 byte. The assembler appends a null (0x00) at the end.
Line 72: A statically allocated variable, initialized with 1. As the string above is 24 bytes, it occupies the bytes DMEM+0x04 to DMEM+0x1B, i.e. the words at DMEM+0x04 to DMEM+0x18. Thus, var1 happens to fit at DMEM+0x1C.
Line 73: If the string were 1 byte longer, var1 would be stored at DMEM+0x20 for word alignment.
Line 75: .align 9 means “advance the current memory location to the next multiple of 2⁹ = 512 bytes.”
Line 76: Because it follows Line 75, the stack starts at address DMEM+0x200.
Line 77: Mainly describes the stack in RISC-V:
Stack grows downwards (toward lower addresses).
sp (stack pointer) should be initialized to this address.
Each push → decrement sp, each pop → increment sp.
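The offsets quoted above can be recomputed with a short Python sketch. The helper names `align_up` and `dmem_layout` are invented for illustration; the sketch assumes .word data is aligned to 4 bytes and that .asciz appends one null byte.

```python
def align_up(offset: int, boundary: int) -> int:
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary

def dmem_layout(string1: bytes) -> dict:
    """Return the DMEM offset of each label in the sample .data segment."""
    offsets = {}
    cursor = 0
    offsets["delay_val"] = cursor
    cursor += 4                        # .word 4
    offsets["string1"] = cursor
    cursor += len(string1) + 1         # .asciz appends a 0x00 terminator
    cursor = align_up(cursor, 4)       # next .word must be word-aligned
    offsets["var1"] = cursor
    cursor += 4                        # .word 1
    cursor = align_up(cursor, 1 << 9)  # .align 9 -> multiple of 512 bytes
    offsets["STACK_INIT"] = cursor
    return offsets

layout = dmem_layout(b"\r\nWelcome to CG3207..\r\n")
print({name: hex(off) for name, off in layout.items()})
```

Passing a string that is one byte longer moves var1 from 0x1C to 0x20, which matches the word-alignment remark for Line 73.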
In RISC-V, words are stored in little-endian order. So, below is how string1 is stored,

Each word is stored in RISC-V memory using little-endianness (we've encountered little-endianness in CG2111A): the lowest-addressed byte is the least significant byte of the word. Within each byte, however, the bits are stored in the normal order.
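As a quick check of the byte order, the following Python sketch packs the string bytes into 32-bit little-endian words. It only illustrates the ordering and is not taken from the lab code.

```python
# The 23 characters of string1 plus the null terminator added by .asciz.
data = b"\r\nWelcome to CG3207..\r\n" + b"\x00"

# Group the bytes into 4-byte words, lowest address = least significant byte.
words = [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]

for index, word in enumerate(words):
    print(f"string1+0x{4 * index:02x}: 0x{word:08x}")
```

The first word prints as 0x65570a0d: reading the hex value from right to left gives 0x0D ('\r'), 0x0A ('\n'), 0x57 ('W'), 0x65 ('e'), which is the order of the characters in memory.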
Demonstration
In this task, we mainly need to demonstrate what the following image shows,

Run the code step by step till Line 48
Change the input at the DIP switches (0xffff0064), then run Line 48 and 49; the output at the LEDs (0xffff0060) should mirror it.
This loop is infinite, so showing the mirroring once suffices.
Wait for the problems proposed by the TA.
Questions Preparation
What is the 32-bit representation of a certain instruction, like the opcode, funct3, etc.?
Bring the RISC-V card along with you.
What is the memory capacity of IROM?
As IROM can store 128 words, the expected answer is 7 bits. (Although I think the wording is a bit off here: the memory capacity should be 128 words, and it is the address of IROM that is 7 bits.)
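To practise the first question, here is a small Python sketch that encodes and decodes an I-type instruction using the standard RV32I field layout (imm[11:0], rs1, funct3, rd, opcode). The function names are illustrative, and the example instruction `addi s2, zero, 0x64` is chosen here as an example rather than quoted from the lab program.

```python
def encode_i_type(opcode: int, rd: int, funct3: int, rs1: int, imm: int) -> int:
    """Pack the I-type fields into a 32-bit instruction word."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def decode_fields(word: int) -> dict:
    """Extract the I-type fields from a 32-bit instruction word."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": (word >> 20) & 0xFFF,
    }

# addi s2, zero, 0x64: opcode 0x13 (OP-IMM), funct3 0, rd = x18 (s2), rs1 = x0
word = encode_i_type(opcode=0x13, rd=18, funct3=0, rs1=0, imm=0x64)
print(hex(word))  # 0x6400913
print(decode_fields(word))
```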
Optional Task
Helloworld without subroutines
The RISC-V assembly code for HelloWorld is public on GitHub. The overall behavior is:
It waits for the user to press the A key followed by Enter (\r or \n) on the console.
It echoes every input character to the console, LEDs, and seven-segment display while waiting.
Once the correct input is received, it prints “Welcome to CG3207..” to the console using UART character by character.
The LED and seven-segment display here are just used as “hardware echo” that mirrors what you typed.
We meet UART again here, so feel free to go back to the NUS CG2111A Notes to review how UART works! In this task, the UART serial communication is set up between our RISC-V processor and our PC's console (on RARS).
And I will do the explanation section by section,
Line 17-50 (Setup)
This is the setup work. Nothing special.
Line 52-78 (Read A and Enter)
This section is also pretty straightforward. But in Line 75-78, the trick used to implement "if A or B" deserves our attention,
Line 79-96 (Print "Helloworld")
a0 stores the address of the word (4 bytes) to be printed. Within each word, one byte is printed at a time. After a word has been printed, a0 is incremented by 4 to print the next word. (As we've seen in the previous task, string1 is 24 bytes, i.e. 6 words long.)
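The word-by-word, byte-by-byte printing described above can be sketched in Python as follows. This is a behavioural model only: `print_wordwise` is a made-up name, and stopping at the null byte is an assumption about how the loop terminates, not something taken from the assembly source.

```python
def print_wordwise(words) -> str:
    """Emit the characters of a string stored as little-endian 32-bit words."""
    out = []
    for word in words:                  # a0 advances by 4 for each word
        for byte_index in range(4):     # one byte at a time, lowest byte first
            ch = (word >> (8 * byte_index)) & 0xFF
            if ch == 0:                 # assumed terminator check
                return "".join(out)
            out.append(chr(ch))
    return "".join(out)

raw = b"\r\nWelcome to CG3207..\r\n\x00"
string1_words = [int.from_bytes(raw[i:i + 4], "little") for i in range(0, len(raw), 4)]
print(repr(print_wordwise(string1_words)))
```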
Task 2: Basic HDL Simulation
In an initial statement, whether in RTL code or in a testbench, the L.H.S. signal must be a reg.
RTL Design
Clock Enable
The Clock_Enable block has three states,
1Hz mode: Pull enable HIGH for 1 clock cycle (10ns for Nexys 4) every 1 second.
4Hz mode: Pull enable HIGH for 1 clock cycle every 0.25 second.
Pause mode: Pull enable LOW until pause mode is exited.
In our simulation, the enable behaves like below

How long enable stays low is controlled by the corresponding threshold for each of the modes above. We can think of enable as a slower clock signal.
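The threshold idea can be modelled cycle by cycle in Python. This is a sketch under stated assumptions, not the lab's Verilog: the names `threshold_for` and `simulate_enable` are invented, the 100 MHz clock follows from the 10ns period mentioned above, and the counter is assumed to reset to zero whenever it reaches threshold - 1.

```python
CLOCK_HZ = 100_000_000  # 10 ns clock period

def threshold_for(rate_hz: int) -> int:
    """Clock cycles between two enable pulses for the requested rate."""
    return CLOCK_HZ // rate_hz

def simulate_enable(threshold: int, cycles: int, paused: bool = False) -> list:
    """Return the value of enable (0 or 1) for each simulated clock cycle."""
    counter = 0
    trace = []
    for _ in range(cycles):
        if paused:
            trace.append(0)             # pause mode: enable stays LOW
        elif counter >= threshold - 1:
            trace.append(1)             # one-cycle pulse, then restart counting
            counter = 0
        else:
            trace.append(0)
            counter += 1
    return trace

print(threshold_for(1), threshold_for(4))  # 100000000 25000000
print(simulate_enable(threshold=5, cycles=20))
```

With a small threshold of 5, enable is high on every fifth cycle, which is why a reduced threshold makes the waveform readable in simulation.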
Get Mem
This block is straightforward, as we only need to implement two things:
the combinational logic part for the data fed to the seven-segment display and leds, and the upper_lower signal fed to the leds
the sequential logic part for the counter, which is basically the addr.
Top
This module has nothing special; we just instantiate the two modules we have designed,
the Clock_Enable module
the Get_Mem module
And the one module that is provided,
the Seven_seg module
In the Top module, we also need to implement a multiplexer to choose the 16-bit data to be shown on the leds.
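A minimal Python sketch of that 16-bit selection is given below. The polarity is an assumption made only for illustration (upper_lower equal to 1 selects bits 31 to 16); check it against your own Get_Mem before relying on it.

```python
def led_half(data: int, upper_lower: int) -> int:
    """Pick the 16 bits of a 32-bit word that go to the LEDs."""
    if upper_lower:                  # assumed: 1 selects the upper half
        return (data >> 16) & 0xFFFF
    return data & 0xFFFF             # otherwise the lower half

print(hex(led_half(0x12345678, 1)), hex(led_half(0x12345678, 0)))  # 0x1234 0x5678
```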
Behavioral Simulation
Here, we basically need to simulate all the combinations of the inputs, which are
Both btnC and btnU are not pressed.
btnU is pressed but btnC is not pressed.
btnC is pressed but btnU is not pressed.
No auto-check version
From the explanation above, we know that we need 512 rounds to display all the data from IROM and DMEM. Under each of the modes above, the number of clock cycles per round is different, and it is determined by the corresponding threshold value in each mode, thus
Under 1Hz mode, we need 512 × threshold_1Hz clock cycles in total; this will be the terminating i value in our for loop.
Similarly, under 4Hz mode, we need 512 × threshold_4Hz clock cycles.
And the total time is also not difficult to calculate. Let's say we delay 10ns at each loop iteration. So, under 1 Hz mode, the total time taken will be 512 × threshold_1Hz × 10ns.
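The cycle and time budget can be worked out with a short Python sketch. The threshold value used here is a hypothetical shrunken simulation value, not the one from the lab, and one 10ns delay per clock cycle is assumed.

```python
ROUNDS = 512          # rounds needed to show all of IROM and DMEM
CLOCK_PERIOD_NS = 10  # one for-loop iteration = one 10 ns clock cycle (assumed)

def total_cycles(threshold: int) -> int:
    """Clock cycles needed when each round takes `threshold` cycles."""
    return ROUNDS * threshold

def total_time_ns(threshold: int) -> int:
    """Simulated time needed to walk through every round."""
    return total_cycles(threshold) * CLOCK_PERIOD_NS

SIM_THRESHOLD = 4     # hypothetical small threshold used only in simulation
print(total_cycles(SIM_THRESHOLD), total_time_ns(SIM_THRESHOLD))  # 2048 20480
```

With the real 1 Hz threshold the same formula gives 512 seconds of simulated time, which is why the thresholds are reduced for simulation.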
Notes
The initial statement takes effect in behavioral simulation, but it is generally ignored in the real run (synthesis), so do not rely on it for hardware behavior.
Remember to change the threshold values back to normal after simulation. (To observe the results quickly, we changed them to some really small numbers during the simulation.)
Auto-check version
Here, auto-check means that we read the IROM and DMEM data again in our testbench, and then check at each clock cycle that the led output of our UUT is the same as the expected led value. (Only led here, because led is the only port visible to us.) See more from this issue.
To implement this, we are recommended to write a Verilog task to do the checking for us. (This reduces duplicate code.)
Code Explanation
The argument cycles doesn't need to take the threshold into account because the task waits on the posedge of uut.enable.
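The comparison that the auto-check performs can be sketched in Python as below. This models the idea only and is not the testbench task itself: the function names are invented, and it assumes one LED value is sampled per enable pulse, with the upper half of each word shown before the lower half as the design specification requires.

```python
def expected_led_sequence(words) -> list:
    """Expected 16-bit LED values: upper half of each word, then lower half."""
    sequence = []
    for word in words:
        sequence.append((word >> 16) & 0xFFFF)
        sequence.append(word & 0xFFFF)
    return sequence

def check_leds(observed, words) -> list:
    """Return (index, observed, expected) for every mismatching sample."""
    expected = expected_led_sequence(words)
    return [(i, got, exp)
            for i, (got, exp) in enumerate(zip(observed, expected))
            if got != exp]

memory = [0x12345678, 0xDEADBEEF]              # made-up memory contents
good = [0x1234, 0x5678, 0xDEAD, 0xBEEF]
print(check_leds(good, memory))                          # []
print(check_leds([0x1234, 0x0000, 0xDEAD, 0xBEEF], memory))
```

An empty list means every sampled LED value matched the memory file; any entry pinpoints the first sample to inspect in the waveform.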
Then in our main simulation, we can just call these two methods under each phase.
Code Explanation
The #10 at Line 16 and Line 23 is important, as it makes sure that during the next phase the simulation actually sees the button change. The value may change depending on the clock period. Usually, we should pause for 1 clock period.
Demonstration
Load the bitstream file into your FPGA.
Press btnU and btnC to see the output of your FPGA.
Questions Asked
What is the difference between Clock_Enable and the normal 100MHz clock?
Clock_Enable behaves like a slower clock, implemented by comparing a counter against the threshold.
How to verify the content shown on the seven-seg display is correct?
The only way is to compare it visually with the memory files (IROM and DMEM).
The fruits of our labour
The video is outdated and is for reference only, as the display order of the lower half and upper half is flipped. According to our design specification, the upper half should be displayed first, then the lower half.
Some Interesting Questions
Why is the actual behavior of the seven-seg display and leds different from the behavioral simulation?
An interesting phenomenon I found is that after the instruction memory has been shown, the last line blinks several times and then the display still prints the instruction memory.
This was solved by Dr. Rajesh and TA Neil in this discussion. The main reason is that the synthesis tool will likely optimise the storage into a combinational circuit (4-input, 32-output) instead of a ROM, as the utilisation is very low. That means the output repeats with a pattern modulo 16, composed of 13 valid and 3 garbage words.
Why does my FPGA have a delay after I press btnU before it changes to the fast speed?
This happens because of the way the Clock_Enable logic uses the counter vs. threshold comparison.
If your Line 7 is counter == threshold - 1, then you will encounter the problem stated. But why?
In 1 Hz mode, the threshold is large, so the counter can hold a relatively big value.
When you switch to 4 Hz mode, the new threshold is much smaller.
If the current counter value is still larger than threshold_4Hz (the new threshold) but smaller than threshold_1Hz, the condition counter == threshold - 1 will be false for many cycles. Thus, the counter continues incrementing until it eventually wraps around to zero, only then generating the first enable pulse at the new faster rate.
This “wrap-around” is the delay you see when switching speeds.
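The wrap-around delay can be reproduced with a small Python model. The numbers are deliberately tiny (an 8-bit counter, a new threshold of 10, a counter stuck at 50) and are not the lab's values. Using a >= comparison is shown as one common remedy; that is an assumption here, since the notes do not state which fix was adopted.

```python
def cycles_until_first_pulse(counter: int, threshold: int,
                             counter_bits: int, use_ge: bool) -> int:
    """Count clock cycles until enable first goes high after a mode switch."""
    modulus = 1 << counter_bits
    cycles = 0
    while True:
        cycles += 1
        if use_ge:
            hit = counter >= threshold - 1
        else:
            hit = counter == threshold - 1
        if hit:
            return cycles
        counter = (counter + 1) % modulus   # keeps counting and eventually wraps

# Counter was at 50 when the threshold dropped to 10.
print(cycles_until_first_pulse(50, 10, counter_bits=8, use_ge=False))  # 216
print(cycles_until_first_pulse(50, 10, counter_bits=8, use_ge=True))   # 1
```

With the equality test the counter has to run from 50 up to 255, wrap to 0, and climb to 9 before the first pulse; with the >= test the pulse appears on the very next cycle.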
