Lec 03 - RISC-V ISA and Microarchitecture
Since an ISA is the architecture itself (see Harris & Harris!), this lecture covers both the architecture and the microarchitecture of RISC-V!
RISC-V ISA
As we have already seen in Harris & Harris's introduction to architecture, we first recap some important points regarding architecture and microarchitecture.
Architecture
This is the programmer's view of the computer. It has the following features:
Defined by instructions & operand locations
Assembly language: human-readable format of instructions
Machine language: computer-readable format (1’s and 0’s)
Assembly language -> Machine language conversion is done by the assembler
one to one correspondence (except for pseudo-instructions)
Microarchitecture
This is about how to implement an architecture in hardware (we will see this in the second half of this lecture series).
RISC-V Features
In the lec, we have introduced some interesting features about RISC-V, and they are as follows,
As a RISC architecture, the RISC-V ISA is a load-store architecture, which means only load/store variants can access memory
No mixing of memory access with data processing or branching
Memory and registers are not the same! Here, memory usually refers to main memory.
Interesting design choices to simplify hardware implementation
Especially the encoding of immediates (We will see later).
Instruction length and word length are not necessarily the same!
Below are some very interesting points starting from the instruction length vs. word length.
Instruction Length vs. Word Length
The word length of a processor is the width of its registers/ALU or the size of most elements in its datapath. For example, the following are three types of RISC-V processors:
RV32 → word length = 32 bits (registers and ALU are 32-bit wide).
RV64 → word length = 64 bits.
RV128 (rare, theoretical) → word length = 128 bits.
The instruction length is the number of bits to encode an instruction and it is determined by the ISA encoding design and extensions included. For example,
Base ISA (RV32I, RV64I, RV128I) → instructions are always 32 bits long.
If you add the Compressed (C) extension, some instructions are 16 bits long.
There are also 48-bit and 64-bit instruction encodings in the RISC-V spec (for special extensions).
In a 32-bit (word length) system, the instruction length doesn't have to be 32-bit.
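As a rough sketch of how a fetch unit could tell these lengths apart, the Python function below inspects the lowest bits of the first 16-bit parcel, following the length-encoding convention described in the RISC-V unprivileged specification. The function name `instruction_length_bits` is an assumption made for illustration, and encodings longer than 64 bits are simply rejected here.

```python
def instruction_length_bits(first_halfword: int) -> int:
    """Return the instruction length in bits, given the lowest 16 bits of the instruction."""
    if (first_halfword & 0b11) != 0b11:
        return 16  # bits [1:0] != 11: compressed (C extension) instruction
    if (first_halfword & 0b11100) != 0b11100:
        return 32  # bits [1:0] == 11 and bits [4:2] != 111: standard 32-bit instruction
    if (first_halfword & 0b111111) == 0b011111:
        return 48  # bits [5:0] == 011111
    if (first_halfword & 0b1111111) == 0b0111111:
        return 64  # bits [6:0] == 0111111
    raise ValueError("longer or reserved encoding, not handled in this sketch")


# Low half of `addi x1, x0, 5` (0x00500093) and of `c.nop` (0x0001).
print(instruction_length_bits(0x0093))
print(instruction_length_bits(0x0001))
```

Note that the decision only needs the low bits of the instruction, regardless of whether the core is RV32 or RV64.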
What if word length and instruction length don't match?
Instruction length ≠ word length
A 32-bit processor does not mean all instructions must be 32 bits long. This is what we have seen from above.
Example: ARM Thumb and the RISC-V compressed (C) extension use 16-bit instructions on a 32-bit processor.
The processor’s frontend (fetch/decode) handles variable instruction sizes, while the backend (execution units, registers, ALU) still works on 32-bit data.
The idea of the frontend and backend of a processor is very important, and we will come back to it later in the microarchitecture part!
Microarchitecture impact
Supporting multiple instruction lengths adds complexity to the frontend (fetch, decode).
The backend (datapath, ALU, register file) usually remains tied to the word length, so the impact on overall hardware design is noticeable but not drastic.
Practical advantage: saving instruction memory
Shorter instructions (16-bit vs. 32-bit) reduce code size.
This allows more instructions to fit into the same memory/cache, improving memory efficiency and potentially performance.
Word length and memory addressing
In a 32-bit processor, addresses are 32 bits wide.
This limits the maximum directly addressable memory to 2^32 bytes = 4 GB.
This is a backend (data memory) property, tied to word length, not instruction length.
Thus, even if instructions are compressed to 16 bits, the system is still limited to 4 GB RAM because addressing depends on word length.
More about modern processors
ISA Stability
The Instruction Set Architecture (ISA) (e.g., x86, ARM, RISC-V) usually remains stable across generations.
This ensures that code compiled decades ago can still run on modern processors of the same ISA family.
Example: Modern Intel CPUs can still run DOS-era x86 code.
Extensions to the ISA
Instead of replacing the ISA, new generations often extend it with additional instructions (e.g., Intel’s SSE, AVX, AVX-512; ARM’s NEON; RISC-V vector extensions).
These new instructions allow compilers and developers to take advantage of improved performance features, but are optional from the program’s perspective.
Backward and Forward Compatibility
Backward compatibility: Old code (compiled for older processors) runs fine on newer processors, since the original instructions are still supported.
Forward compatibility: New code (compiled with instructions from the newest extensions) will not run on older processors if those instructions are missing.
Workaround: Compilers often provide a compatibility mode (e.g., “target x86-64 baseline”) so the same program can run on older hardware.
Real-world example: Windows 11
Windows 11 requires support for certain instruction set extensions (e.g., SSE4.2, CMPXCHG16b, LAHF/SAHF, and in practice often AVX2).
Older CPUs without these instructions cannot run Windows 11, even though they are technically “x86-64 processors.”
Microarchitecture vs. ISA
New processors primarily innovate by updating the microarchitecture (pipeline depth, branch prediction, cache design, out-of-order execution, etc.) while keeping the ISA stable.
This allows performance to improve without breaking software compatibility.
Registers
The register set has been introduced in Harris & Harris. But the following image adds the information about whether the register should be saved by the caller or callee.

And below are some useful notes
PC in RISC-V
In RISC-V, PC is not a register that is explicitly readable/writable by any instruction, i.e., it is not a visible register. And,
In RISC-V, PC just stores the address of the current instruction (unlike ARM, where PC actually reads as the current instruction address +4 or +8).
Writing PC is done only by branch/jump instructions.
No instruction updates more than one visible register
This is a very important golden rule (it is the title of this point). In RISC-V, the register to be updated is explicitly specified in the rd field. This ensures that the register file only needs one write port.
No flag registers
In RISC-V, flags are never stored for “future use.” Instead, comparisons and branches are self-contained. For example, beq x1, x2, target; will directly compare x1 and x2, and branches to target.
If you do need the comparison result as data, use an instruction like slt to store it in a general-purpose register.
slt x3, x1, x2; means x3 = 1 if x1 < x2, else x3 = 0.
Because branch instructions already use the ALU for comparison, RISC-V usually has a separate unit to compute branch target addresses (PC + offset).
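The flag-free style can be sketched in Python as below. This is only a behavioural model under assumed names (`slt`, `beq_next_pc`, `to_signed32`), not hardware: the branch does its comparison and picks the next PC in one step, while slt writes the comparison result as ordinary data.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a 2's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def slt(rs1_value: int, rs2_value: int) -> int:
    """Model of `slt rd, rs1, rs2`: returns the value written to rd (signed compare)."""
    return 1 if to_signed32(rs1_value) < to_signed32(rs2_value) else 0


def beq_next_pc(pc: int, rs1_value: int, rs2_value: int, offset: int) -> int:
    """Model of `beq rs1, rs2, offset`: compare and choose the next PC, no flags kept."""
    taken = (rs1_value & MASK32) == (rs2_value & MASK32)
    return (pc + offset) & MASK32 if taken else (pc + 4) & MASK32


print(slt(0xFFFFFFFE, 1))                   # -2 < 1 under signed comparison
print(hex(beq_next_pc(0x100, 5, 5, 16)))    # equal operands: branch taken
```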
Instruction Formats
RISC-V instructions have 6 formats, which are well summarized in the following table:

Opcode field occupies the least significant part of the instruction
This gives us the benefit that,
The CPU can recognize the instruction type by reading just the first byte (RISC-V is little-endian, and little-endian puts the byte containing the opcode at the lowest address).
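A quick way to convince yourself of this is the Python sketch below. It packs the 32-bit encoding of addi x1, x0, 5 (0x00500093) in little-endian order and extracts the 7-bit opcode from the byte at the lowest address. The variable names are chosen for illustration only.

```python
import struct

ADDI_X1_X0_5 = 0x00500093  # machine code for `addi x1, x0, 5`

# "<I" = little-endian unsigned 32-bit: the least significant byte comes first.
memory_bytes = struct.pack("<I", ADDI_X1_X0_5)

first_byte = memory_bytes[0]   # byte stored at the lowest address
opcode = first_byte & 0x7F     # opcode is instruction bits [6:0]

print(memory_bytes.hex())      # bytes in memory order
print(bin(opcode))             # 0b10011, the I-type ALU (OP-IMM) opcode
```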
"3L mnemonic" for little-endian: In little-endian formatting, least significant byte goes to lowest memory address.
All immediates are MSB-extended
So all immediates become sign-extended words. RISC-V immediates are encoded in 2’s complement form, so sign-extending simply preserves the correct integer value. For example, with a 12-bit immediate in RV32I:
Immediate field = 1111_1111_1000 (0xFF8).
As 12-bit 2’s complement → −8.
Sign-extend to 32 bits → 0xFFFFFFF8.
The ALU sees this and just adds normally, giving a result 8 less.
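The same arithmetic can be checked with a minimal Python sketch. The helper name `sign_extend` is an assumption for illustration; it copies the MSB of a `bits`-wide field upward and returns the signed value. A 12-bit field holding −8 is the pattern 0xFF8.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


imm12 = 0xFF8                               # 12-bit field 1111_1111_1000
as_int = sign_extend(imm12, 12)             # signed value of the field
as_word = as_int & 0xFFFFFFFF               # the 32-bit pattern the ALU sees
alu_result = (100 + as_word) & 0xFFFFFFFF   # plain 32-bit addition

print(as_int, hex(as_word), alu_result)
```

The addition wraps around modulo 2^32, which is why adding 0xFFFFFFF8 behaves exactly like subtracting 8.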
Only immediates are always MSB-extended. As we will see later, values loaded into registers are sometimes zero-extended.
However, this may look a bit weird with the instruction sltiu. Let's look at an example.
Suppose our "set if less than immediate unsigned" (sltiu) instruction compares x1 against the immediate -8; the RISC-V processor will treat -8 as 0xFFFFFFF8.
x1 = -2: x1 will be interpreted as 0xFFFFFFFE. As the ALU just compares unsigned, 0xFFFFFFFE is greater than 0xFFFFFFF8, thus false, meaning that under unsigned comparison x1 is not smaller than -8!
x1 = 5: x1 will be interpreted as 0x00000005. Similarly, the ALU just compares unsigned, and 0x00000005 is smaller than 0xFFFFFFF8, thus true, meaning that under unsigned comparison x1 is smaller than -8 (unsigned).
This gives us the range of the immediate in slti and sltiu: 0x800 to 0x7FF (-2048 to +2047). This range applies to all I-type and S-type instructions. The B-type instruction is a bit special: its range is -4096 to +4094 (0x1000 to 0xFFE). But as it is impossible to give the immediate manually in B-type instructions, I think this won't appear in quizzes or finals 😂.
So, an instruction like sltiu x3, x1, 0x00000FFF is illegal, as 0xFFF = 4095 exceeds the range of the immediate in I-type and S-type instructions!
Treating a negative number in 2's complement as unsigned means it becomes a very large positive number.
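The two cases above can be replayed with a small behavioural model in Python. The function name `sltiu` and its argument names are assumptions for illustration: the immediate is first sign-extended from 12 bits, and then both operands are compared as unsigned 32-bit values.

```python
MASK32 = 0xFFFFFFFF


def sltiu(rs1_value: int, imm12: int) -> int:
    """Model of `sltiu rd, rs1, imm`: returns the value written to rd."""
    imm = imm12 & 0xFFF
    if imm & 0x800:          # MSB of the 12-bit field set: sign-extend
        imm -= 0x1000
    # Both sides are reinterpreted as unsigned 32-bit patterns before comparing.
    return 1 if (rs1_value & MASK32) < (imm & MASK32) else 0


print(sltiu(-2, -8))   # 0xFFFFFFFE vs 0xFFFFFFF8
print(sltiu(5, -8))    # 0x00000005 vs 0xFFFFFFF8
```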
DP Instruction
"DP" stands for data-processing. The following table summarises all the base DP instructions.

From this table, we notice that
subi is unnecessary, as the assembler can encode A - B as A + (-B). This is fine as long as B is an immediate; if B is a register, then B cannot be known at assembly time, and that's why sub is still needed.
And the following table about the three types of shift operations is copied from Harris & Harris, just for CG3207 midterm quiz purposes.

DP Pseudo-Instruction
All the pseudo-instructions in RISC-V are introduced here! Among them, knowing the working principle of auipc from Lab 1 is also necessary!
Multiply and Divide
The multiply and divide are not part of the base instruction set, but are available as an optional standard extension.

Memory Instruction
The following are the base memory instructions:

So, from this table, notice that
For the memory instruction syntax, if imm = 0, we can omit the imm, e.g., op rd, (rs1).
For lb and lh, which load a byte or a half-word, the rest of the bits are formed by sign-extension/MSB-extension (just copy the MSB) of the byte/half-word.
For lbu and lhu, zero extension (copy zero only) is done.
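The difference between the sign-extending and zero-extending loads can be sketched in Python as follows. The `load` helper and the sample memory contents are assumptions made for illustration. It reads `size` bytes in little-endian order and extends the result to 32 bits.

```python
def load(memory: bytes, address: int, size: int, signed: bool) -> int:
    """Return the 32-bit register value produced by loading `size` bytes."""
    raw = int.from_bytes(memory[address:address + size], "little")
    if signed and raw & (1 << (8 * size - 1)):   # MSB of the loaded data is 1
        raw -= 1 << (8 * size)                   # sign-extend (copy the MSB upward)
    return raw & 0xFFFFFFFF                      # unsigned loads are zero-extended


memory = bytes([0x80, 0xFF, 0x34, 0x12])

print(hex(load(memory, 0, 1, signed=True)))    # lb
print(hex(load(memory, 0, 1, signed=False)))   # lbu
print(hex(load(memory, 0, 2, signed=True)))    # lh
print(hex(load(memory, 0, 2, signed=False)))   # lhu
print(hex(load(memory, 0, 4, signed=True)))    # lw
```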
Control Instructions
The following are the base control instructions:

The immediate in branch instructions
In the branch instructions, the immediate field is 12 bits wide, but it stores bits [12:1] of the offset instead of [11:0]. This is because in RISC-V, every instruction must start at an address that is an even number (a multiple of 2). You can never jump to an odd address (like 0x1001) because instructions never exist there.
So, instead of storing bits [11:0] (which would include the useless last zero), the instruction stores bits [12:1]. The hardware assumes the last bit (bit 0) is always 0.
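To make the bit shuffling concrete, here is a Python sketch that scatters a branch offset into the B-type immediate positions of a 32-bit instruction word and gathers it back. The bit positions follow the B-type format (imm[12] in bit 31, imm[10:5] in bits 30:25, imm[4:1] in bits 11:8, imm[11] in bit 7); the function names are assumptions for illustration, and only the immediate bits are produced, not the opcode, funct3, or register fields.

```python
def encode_b_imm(offset: int) -> int:
    """Return the immediate bits of a B-type instruction for a byte offset."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("branch offset must be even and within -4096..4094")
    imm = offset & 0x1FFF                   # 13-bit 2's complement, bit 0 is always 0
    return ((((imm >> 12) & 0x1) << 31)     # imm[12]
            | (((imm >> 5) & 0x3F) << 25)   # imm[10:5]
            | (((imm >> 1) & 0xF) << 8)     # imm[4:1]
            | (((imm >> 11) & 0x1) << 7))   # imm[11]


def decode_b_imm(word: int) -> int:
    """Recover the signed byte offset from a B-type instruction word."""
    imm = ((((word >> 31) & 0x1) << 12)
           | (((word >> 7) & 0x1) << 11)
           | (((word >> 25) & 0x3F) << 5)
           | (((word >> 8) & 0xF) << 1))    # bit 0 is implicitly 0
    return imm - 0x2000 if imm & 0x1000 else imm


print(hex(encode_b_imm(8)))
print(decode_b_imm(0x00208463))   # `beq x1, x2, 8`
```

Because bit 0 is never stored, the 12 stored bits cover a 13-bit signed range, which is where the -4096 to +4094 limit comes from.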
Two types of jumps
Look at the "Description (C)" of jal and jalr from the table above to understand this part better!
jal: jump and link, is a J-type instruction. It stores the return address in rd.
Used when you know the target address at assembly time.
Used in function calls (jump to a function’s code so it can execute).
imm is 20-bit.
jalr: jump and link register, is an I-type instruction. It stores the return address in rd (usually ra).
Used when the target address is dynamic, stored in a register.
Used in function returns (go back to where the function was called from). This is because the function can be called from many different places. The return address isn't known until the program is actually running. Therefore, we jump to the address stored in the register ra.
imm is 12-bit, but it can jump anywhere in a 32-bit absolute address range. A lui instruction can first load rs1 with the upper 20 bits of a target address, then jalr can add in the lower bits.
Similarly, auipc then jalr can jump anywhere in a 32-bit pc-relative address range.
For example, in the following code, we can see how jalr can jump to anywhere with the help of lui.
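The address arithmetic behind a lui + jalr pair can be sketched in Python as below. The helper names `split_hi_lo` and `lui_then_jalr` are assumptions for illustration. One detail worth noticing: because jalr sign-extends its 12-bit immediate, the upper 20 bits must be rounded up by one whenever bit 11 of the target is set, so that the (negative) low part brings the sum back to the target.

```python
MASK32 = 0xFFFFFFFF


def split_hi_lo(target: int):
    """Split a 32-bit address into (20-bit value for lui, signed 12-bit imm for jalr)."""
    target &= MASK32
    lo = target & 0xFFF
    if lo & 0x800:
        lo -= 0x1000                          # jalr will sign-extend this field
    hi = ((target - lo) >> 12) & 0xFFFFF      # compensate for a negative low part
    return hi, lo


def lui_then_jalr(target: int) -> int:
    """Return the PC reached by `lui rs1, hi` followed by `jalr rd, lo(rs1)`."""
    hi, lo = split_hi_lo(target)
    rs1_value = (hi << 12) & MASK32           # lui: upper 20 bits, lower 12 bits zero
    return (rs1_value + lo) & MASK32 & ~1     # jalr: rs1 + imm, with bit 0 cleared


print([hex(v) for v in split_hi_lo(0x12345678)])
print(hex(lui_then_jalr(0x12345FFC)))
```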
Put it all together
Nothing is better than an example! So, let's look at an example to put everything together.
RISC-V Microarchitecture
From a computer hardware engineer's view, a computer can be divided into 2 parts
Datapath: Backend
Control Unit: Frontend
Datapath
Datapath is the path through which data "flows". It includes the following elements
Storage elements
Like memories and registers. The storage elements can be further divided into the following parts
Architectural state elements: manipulated by the programmer, like instruction memory (IROM), data memory (DMEM), register file (x0-x31), PC, and other control registers.
Microarchitectural state elements: not accessible to the programmers, like pipeline registers, cache tags, and branch predictor state.
Control Unit
Control Unit controls the flow/processing/storage of data in the datapath via
Mux selects: choose which input goes into a multiplexer (e.g., should ALU input B come from register or immediate?).
Register write enables: should this register latch a new value at the clock edge or stay the same?
Functional unit activation: is the ALU active? Is the memory doing a read or a write?
Operation selection: what exact operation should ALU perform (add, sub, AND, OR…)?
The Control Unit in a CISC processor uses a state machine, while the one in a strictly single-cycle processor uses combinational logic.
Implement a single-cycle microarchitecture
In this lecture, we will implement a single-cycle microarchitecture first. Basically, a single-cycle microarchitecture fetches, decodes, and executes an instruction all in one clock cycle. In this lecture, we have covered four single-cycle microarchitectures, each built upon the previous one.
Single-Cycle Processor with Control

In a single-cycle processor, the CU is just a decoder; the following table summarises the decoder behavior:

As you can see from the schematic above, the ALU also outputs a 3-bit ALUFlags to the PC Logic inside the Control Unit, and the PC logic is summarised as follows:

Inside the PC logic, we should be clear that it has two inputs (PCS, ALUFlags[2:0]) and one output (PCSrc).
PCS: determines the instruction category for PC updates.
ALUFlags[2:0]: bits that describe the outcome of the ALU operation. The three bits are {eq, lt, ltu}.
PCSrc: 1-bit control signal to choose the next PC value.
0: use the default PC+4 (fallthrough to next instruction).
1: use the branch/jump target (computed by ALU + immediate).
For now, our PC logic only supports beq; it will support more in the following iterations.
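A beq-only PC logic can be modelled as pure combinational logic, as in the Python sketch below. The PCS encodings (`PCS_NONE`, `PCS_BEQ`) are hypothetical, since the actual values come from the lecture's table, and the sketch assumes {eq, lt, ltu} maps to ALUFlags bits [2], [1], [0] in that order.

```python
PCS_NONE = 0   # hypothetical encoding: instruction does not change the PC flow
PCS_BEQ = 1    # hypothetical encoding: beq


def pc_logic(pcs: int, alu_flags: int) -> int:
    """Return PCSrc (1 = take branch target, 0 = PC + 4) for a beq-only design."""
    eq = (alu_flags >> 2) & 1          # assumes eq is ALUFlags[2]
    return 1 if (pcs == PCS_BEQ and eq) else 0


def next_pc(pc: int, branch_target: int, pc_src: int) -> int:
    """The mux in front of the PC register, selected by PCSrc."""
    return branch_target if pc_src else pc + 4


pc_src = pc_logic(PCS_BEQ, 0b100)      # beq with equal operands
print(hex(next_pc(0x100, 0x120, pc_src)))
```

There is no state here: the output depends only on the current inputs, which matches the idea that the single-cycle control unit is just a decoder.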
Support for lui and auipc
lui and auipc
Here, we just add a 2-bit output named ALUSrcA from our decoder; these 2 bits control whether ALU SrcA takes the current PC as input. With this, the CU decoder table is updated as follows:

We can clearly see that from DP Reg down to the branch instructions, ALUSrcA is just x0, which means the first bit is Don't Care and the second bit is 0.
Support for link and jalr
jalr
Here, we have two modifications:
change ALUSrcB from 1-bit to 2-bit, thus adding another possibility for the immediate 4 to be the input of ALU SrcB.
change PCSrc from 1-bit to 2-bit, thus adding another possibility for the PC value to be read from Register File RD1.
Thus, we have two changes for our tables, the first one is the CU decoder table

Here, we can clearly see that when our instruction is jal, the ALUSrcA=11 makes sure the ALU SrcA is the current PC value. And ALUSrcB=01 makes sure that the ALU SrcB is immediate 4. Then the ALU will add these two and store the result into rd.
And our PC Logic table will also change,

Again, we can see it clearly that when the instruction is jalr, PCSrc[1]=1, and this will load the value in rs1 into the PC adder, and PCSrc[0]=1 will load the immediate value into the PC adder. And now the new PC value will be the sum of these two values! (As jalr rd, imm(rs1) -> rd=PC+4, PC=rs1+imm.)
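The architectural effect of the two jumps can be summarised with a small Python model. The function names and the tuple return convention are assumptions for illustration; each function returns the value written to rd and the next PC. Per the RISC-V specification, jalr also clears the least significant bit of the computed target.

```python
MASK32 = 0xFFFFFFFF


def jal(pc: int, imm: int):
    """Model of `jal rd, imm`: returns (value written to rd, next PC)."""
    return (pc + 4) & MASK32, (pc + imm) & MASK32


def jalr(pc: int, rs1_value: int, imm: int):
    """Model of `jalr rd, imm(rs1)`: returns (value written to rd, next PC)."""
    return (pc + 4) & MASK32, (rs1_value + imm) & MASK32 & ~1


# Call a function at pc + 0x20, then return through the saved link value.
ra, pc = jal(0x1000, 0x20)
print(hex(ra), hex(pc))
_, pc = jalr(pc, ra, 0)
print(hex(pc))
```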
This is the whole content to build a simple single-cycle RISC-V processor. And we may notice the following,
No datapath resource can be used more than once per instruction, so some must be duplicated
Separate memories: because fetch and load/store both need memory at the same time.
Two adders: because PC+4 and normal ALU addition both need addition in the same cycle.
So, how can we make it faster? The answer is wait for Lec 05 😉!