Assembly Language

Assembly language is the human-readable representation of the computer's native language. Each assembly language instruction specifies both the operation to perform and the operands on which to operate. We introduce simple arithmetic instructions and show how these operations are written in assembly language. We then define the RISC-V instruction operands:

  1. registers

  2. memory

  3. constants

Instructions

One of the most common operations computers perform is addition. Code Example 6.1 shows code for adding variables b and c and writing the result to a.

Example 6.1 Addition
a = b + c;


Code Example 6.2 shows that subtraction is similar to addition. The sub instruction has the same format as the add instruction: a destination operand followed by two source operands.

Example 6.2 Subtraction
a = b - c;

This consistent instruction format is an example of the first design principle:

Regularity supports simplicity.

Instructions with a consistent number of operands (in this case, two sources and one destination) are easier to encode and handle in hardware. More complex high-level code translates into multiple RISC-V instructions, as shown in Code Example 6.3.

Example 6.3 More Complex Code
a = b + c - d;
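
As a rough sketch of how this statement breaks into simple steps, the C code below models each operation as taking two sources and producing one destination. The names Op, execute, and add_then_sub and the temporary t are illustrative assumptions, not part of RISC-V; the mnemonics in the comments use the same variable-as-operand notation as the examples above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative operation codes; these are not RISC-V encodings. */
typedef enum { OP_ADD, OP_SUB } Op;

/* Every operation has the same shape: two sources in, one destination out. */
static int32_t execute(Op op, int32_t src1, int32_t src2) {
    assert(op == OP_ADD || op == OP_SUB);
    return (op == OP_ADD) ? src1 + src2 : src1 - src2;
}

/* a = b + c - d, performed as two simple operations. */
int32_t add_then_sub(int32_t b, int32_t c, int32_t d) {
    int32_t t = execute(OP_ADD, b, c);  /* add t, b, c   # t = b + c */
    int32_t a = execute(OP_SUB, t, d);  /* sub a, t, d   # a = t - d */
    return a;
}
```

The single C expression becomes two steps, with t holding the intermediate result between them.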


Using multiple assembly language instructions to perform more complex operations is an example of the second design principle of computer architecture:

Make the common case fast.

The RISC-V instruction set makes the common case fast by including only simple, commonly used instructions. The number of instructions is kept small so that the hardware required to decode the instruction and its operands can be simple, small, and fast. More elaborate operations that are less common are performed using sequences of multiple simple instructions. Thus, RISC-V is a reduced instruction set computer (RISC) architecture. Architectures with many complex instructions, such as Intel's x86 architecture, are complex instruction set computers (CISC). For example, x86 defines a "string move" instruction that copies a string (a series of characters) from one part of memory to another. Such an operation requires many, possibly even hundreds, of simple instructions in a RISC machine. However, the cost of implementing complex instructions in a CISC architecture is added hardware and overhead that slows down the simple instructions.

A RISC architecture, such as RISC-V, minimizes the hardware complexity and the necessary instruction encoding by keeping the set of distinct instructions small. For example, an instruction set with 64 simple instructions would need $\log_2 64 = 6$ bits to encode the operation, whereas an instruction set with 256 instructions would need $\log_2 256 = 8$ bits of encoding per instruction. In a CISC machine, even though the complex instructions may be used only rarely, they add overhead to all instructions, even the simple ones.
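
The bit counts above follow from rounding the base-2 logarithm of the instruction count up to a whole number. The helper below is a small sketch of that arithmetic only; the name opcode_bits is hypothetical, and the code does not describe how any real instruction set lays out its encoding.

```c
#include <assert.h>

/* Smallest number of bits that can distinguish `count` different operations,
   that is, ceil(log2(count)). */
unsigned opcode_bits(unsigned count) {
    /* Keep the shift below within range for a 32-bit unsigned. */
    assert(count >= 1 && count <= (1u << 31));
    unsigned bits = 0;
    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}
```

With this sketch, opcode_bits(64) gives 6 and opcode_bits(256) gives 8, matching the figures in the text. A count that is not a power of two rounds up, so 65 operations would already need 7 bits.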

Operands: Registers, Memory and Constants

An instruction operates on operands. In Code Example 6.2 above, the variables a, b and c are all operands. But computers operate on 1's and 0's, not variable names. The instructions need a physical location from which to retrieve the binary data.

Operands stored as constants or in registers are accessed quickly, but they hold only a small amount of data. Additional data must be accessed from memory, which is large but slow.

Registers

Instructions need to access operands quickly so that they can run fast, but operands stored in memory take a long time to retrieve. Therefore, most architectures specify a small number of registers that hold commonly used operands. The RISC-V architecture has 32 registers, called the register set, stored in a small multiported memory called a register file. The fewer the registers, the faster they can be accessed. This leads to the third design principle:

Smaller is faster.

Code Example 6.4 shows the add instruction with register operands. The variables a, b, and c are arbitrarily placed in s0, s1, and s2. The name s1 is pronounced "register s1" or simply "s1". The instruction adds the 32-bit values contained in s1 (b) and s2 (c) and writes the 32-bit result to s0 (a).

Code Example 6.5 shows RISC-V assembly code using a register, t0, to store the intermediate calculation of b+c.
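
To make the role of the registers concrete, the following C sketch models the register set as an array of 32 words and performs a = b + c - d with t0 holding the intermediate sum. The register numbers (t0 = x5, s0 = x8, s1 = x9, s2 = x18, s3 = x19) follow the standard RISC-V naming convention. Placing d in s3 is an assumption made for this illustration, and RegFile, reg_add, reg_sub, and run_example are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32

/* Register numbers under the standard RISC-V naming convention. */
enum { ZERO = 0, T0 = 5, S0 = 8, S1 = 9, S2 = 18, S3 = 19 };

typedef struct {
    uint32_t x[NUM_REGS];
} RegFile;

static int is_reg(int r) { return r >= 0 && r < NUM_REGS; }

/* add rd, rs1, rs2 */
static void reg_add(RegFile *rf, int rd, int rs1, int rs2) {
    assert(is_reg(rd) && is_reg(rs1) && is_reg(rs2));
    if (rd != ZERO) {                       /* register zero always holds 0 */
        rf->x[rd] = rf->x[rs1] + rf->x[rs2];
    }
}

/* sub rd, rs1, rs2 */
static void reg_sub(RegFile *rf, int rd, int rs1, int rs2) {
    assert(is_reg(rd) && is_reg(rs1) && is_reg(rs2));
    if (rd != ZERO) {
        rf->x[rd] = rf->x[rs1] - rf->x[rs2];
    }
}

/* a = b + c - d with a in s0, b in s1, c in s2, d in s3 (assumed placement). */
uint32_t run_example(uint32_t b, uint32_t c, uint32_t d) {
    RegFile rf = {{0}};
    rf.x[S1] = b;
    rf.x[S2] = c;
    rf.x[S3] = d;
    reg_add(&rf, T0, S1, S2);   /* add t0, s1, s2   # t0 = b + c */
    reg_sub(&rf, S0, T0, S3);   /* sub s0, t0, s3   # a = t0 - d */
    return rf.x[S0];
}
```

The arithmetic is done on unsigned 32-bit values, so results wrap around modulo 2^32 in the same way as 32-bit registers.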

The Register Set

Table 6.1 lists the name and use for each of the 32 RISC-V registers. Registers are numbered 0 to 31, and each is given a special name that indicates its conventional purpose.
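
As a compact reference for the number-to-name mapping, the C sketch below lists the 32 register names in numerical order as defined by the standard RISC-V calling convention. The array and function names (ABI_NAMES, abi_name, register_number) are hypothetical, and the sketch covers names only, not the use of each register.

```c
#include <assert.h>
#include <string.h>

/* Names of registers x0 through x31 in the standard calling convention. */
static const char *const ABI_NAMES[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

/* Name of register number 0..31. */
const char *abi_name(int number) {
    assert(number >= 0 && number < 32);
    return ABI_NAMES[number];
}

/* Number of the register with the given name, or -1 if the name is unknown. */
int register_number(const char *name) {
    for (int i = 0; i < 32; i++) {
        if (strcmp(ABI_NAMES[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Note that the saved registers are not contiguous: s0 and s1 are registers 8 and 9, while s2 through s11 occupy registers 18 through 27.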


Constants/Immediates

In addition to register operations, RISC-V instructions can use constant or immediate operands. These constants are called immediates because their values are immediately available from the instruction and do not require a register or memory access. Code Example 6.6 shows the add immediate instruction, addi, that adds an immediate to a register.

In assembly code, the immediate can be written in decimal, hexadecimal, or binary. Hexadecimal constants in RISC-V assembly language start with 0x and binary constants start with 0b, as they do in C.

Immediates are 12-bit two's complement numbers, so they are sign-extended to 32 bits.
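
Sign extension can be sketched in C as follows. The function name sign_extend12 is hypothetical; it takes the raw 12-bit field and returns the 32-bit value the field represents, which always lies between -2048 and 2047.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a 12-bit field as a two's complement number. */
int32_t sign_extend12(uint32_t field) {
    assert(field <= 0xFFFu);
    /* If bit 11 (the sign bit) is set, the value is field - 2^12. */
    return (field & 0x800u) ? (int32_t)field - 4096 : (int32_t)field;
}
```

For example, the field 0x7FF is the largest positive immediate, 2047, while 0x800 is the most negative, -2048, and 0xFFF represents -1.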

The addi instruction is a useful way to initialize register values with small constants. Code Example 6.7 initializes the variables i, x, and y to 0, 2032, and -78, respectively.
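
Initializing a register this way works only when the constant fits in the 12-bit immediate. The sketch below checks that range and models adding the immediate to the constant-zero register; fits_in_imm12 and init_with_addi are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if value can be written as a 12-bit two's complement immediate. */
bool fits_in_imm12(int32_t value) {
    return value >= -2048 && value <= 2047;
}

/* Models addi rd, zero, imm: the result is 0 + imm. */
int32_t init_with_addi(int32_t imm) {
    assert(fits_in_imm12(imm));
    int32_t zero = 0;       /* the zero register always holds 0 */
    return zero + imm;
}
```

All three values in the text (0, 2032, and -78) pass the range check, whereas 2048 would not and needs a different approach.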

To create larger constants, use a load upper immediate instruction (lui) followed by an add immediate instruction (addi), as shown in Code Example 6.8. The lui instruction loads a 20-bit immediate into the most significant 20 bits of the destination register and places zeros in the least significant 12 bits.
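
The following C sketch shows one way to split a 32-bit constant into the two immediates and to recombine them. Because addi sign-extends its 12-bit immediate, the upper part must be one larger whenever bit 11 of the constant is set. The type ConstParts and the functions split_constant and lui_then_addi are hypothetical names, and the constants used with them are illustrative values.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t upper20;   /* immediate for lui  (20 bits) */
    int32_t  lower12;   /* immediate for addi (-2048..2047) */
} ConstParts;

/* Split value so that (upper20 << 12) + lower12 == value (mod 2^32). */
ConstParts split_constant(uint32_t value) {
    ConstParts p;
    uint32_t low = value & 0xFFFu;
    /* addi will sign-extend the low 12 bits, so treat them as signed. */
    p.lower12 = (low & 0x800u) ? (int32_t)low - 4096 : (int32_t)low;
    /* Removing the signed low part leaves a multiple of 4096; when
       lower12 is negative this makes upper20 one larger than value >> 12. */
    p.upper20 = ((value - (uint32_t)p.lower12) >> 12) & 0xFFFFFu;
    return p;
}

/* Models lui rd, upper20 followed by addi rd, rd, lower12. */
uint32_t lui_then_addi(ConstParts p) {
    assert(p.upper20 <= 0xFFFFFu);
    assert(p.lower12 >= -2048 && p.lower12 <= 2047);
    uint32_t reg = p.upper20 << 12;     /* lui: upper 20 bits, low 12 zero */
    reg += (uint32_t)p.lower12;         /* addi: add sign-extended immediate */
    return reg;
}
```

For 0xABCDE123 the split is simply 0xABCDE and 0x123. For 0xFEEDA987, bit 11 is set, so the sketch produces 0xFEEDB and -1657 instead.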

Memory

In the RISC-V architecture, instructions operate exclusively on registers, so data stored in memory must be moved to a register before it can be processed. By using a combination of memory and registers, a program can access a large amount of data fairly quickly.

The RV32I RISC-V architecture uses 32-bit memory addresses and 32-bit data words.

RISC-V uses a byte-addressable memory. That is, each byte in memory has a unique address, as shown in Figure 6.1 (a).

A 32-bit word consists of four 8-bit bytes, so each word address is a multiple of 4. The most significant byte (MSB) is on the left and the least significant byte (LSB) is on the right. The order of bytes within a word is discussed further later. Both the 32-bit word address and the data value in Figure 6.1 (b) are given in hexadecimal. For example, data word 0xF2F1AC07 is stored at memory address 4. By convention, memory is drawn with low memory addresses toward the bottom and high memory addresses toward the top.
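
The relationship between a word's position and its byte address can be sketched in C as below, numbering words from 0. The function names word_address and is_word_aligned are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address of the word with the given number (word 0 is at address 0). */
uint32_t word_address(uint32_t word_number) {
    assert(word_number <= 0x3FFFFFFFu);   /* result must fit in 32 bits */
    return word_number * 4u;
}

/* A word address is a multiple of 4, so its two lowest bits are 0. */
int is_word_aligned(uint32_t address) {
    return (address & 3u) == 0;
}
```

Under this numbering, word 1 sits at address 4, which is where the data word 0xF2F1AC07 is stored in the example above.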

The load word instruction, lw, reads a data word from memory into a register. Code Example 6.10 loads memory word 2, located at address 8, into a (s7). The lw instruction specifies the memory address using an offset added to a base register.


The store word instruction, sw, writes a data word from a register into memory. Code Example 6.11 writes the value 42 from register t3 into memory word 5, located at address 20.
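
The load and store word instructions described above can be modeled in C with a small array of words. This is a simplified sketch: the Memory type, its size of 8 words, and the functions load_word and store_word are assumptions made for illustration, and the model accepts only word-aligned addresses so that byte order within a word does not come into play.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 8     /* tiny illustrative memory: byte addresses 0..31 */

typedef struct {
    uint32_t word[MEM_WORDS];
} Memory;

/* lw rd, offset(base): returns the word at byte address base + offset. */
uint32_t load_word(const Memory *m, uint32_t base, int32_t offset) {
    uint32_t address = base + (uint32_t)offset;
    assert(address % 4 == 0 && address / 4 < MEM_WORDS);
    return m->word[address / 4];
}

/* sw rs2, offset(base): writes value to byte address base + offset. */
void store_word(Memory *m, uint32_t base, int32_t offset, uint32_t value) {
    uint32_t address = base + (uint32_t)offset;
    assert(address % 4 == 0 && address / 4 < MEM_WORDS);
    m->word[address / 4] = value;
}
```

With a base of 0, load_word(&m, 0, 8) reads memory word 2 at address 8, and store_word(&m, 0, 20, 42) writes 42 into memory word 5 at address 20. The same address can be reached with a nonzero base, for example a base of 16 and an offset of 4.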
