Arithmetic Circuits
This part is almost the same as NUS CG3207 Lec 04 - Arithmetic for Computers, so I will combine the information together here.
Arithmetic circuits are the central building blocks of computers. Computers and digital logic perform many arithmetic functions: addition, subtraction, comparisons, shifts, multiplication, and division. This section describes hardware implementations for all of these operations.
Addition
Addition is one of the most common operations in digital systems. If you still remember, we've introduced here that all binary arithmetic can be done using adders! We first consider how to add two 1-bit binary numbers. We then extend to N-bit binary numbers. Adders also illustrate trade-offs between speed and complexity.
Half Adder
We begin by building a 1-bit half adder. As shown in Figure 5.1, the half adder has two inputs, $A$ and $B$, and two outputs, $S$ and $C_{out}$. $S$ is the sum of $A$ and $B$. If $A$ and $B$ are both 1, $S$ is 2, which cannot be represented with a single binary digit. Instead, it is indicated with a carry out ($C_{out} = 1$) in the next column.

In a multi-bit adder, $C_{out}$ is added, or carried in, to the next most significant bit (into the next column). However, the half adder lacks a $C_{in}$ input to accept the $C_{out}$ of the previous column. The full adder, described in the next section, solves this problem.
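From the description above, the half adder reduces to two gates ($S = A \oplus B$, $C_{out} = AB$), which map directly to an HDL. A minimal sketch, with module and port names of my own choosing:

```systemverilog
// 1-bit half adder: sum is XOR, carry out is AND
module halfadder(input  logic a, b,
                 output logic s, cout);
  assign s    = a ^ b;   // sum of the two input bits
  assign cout = a & b;   // carry out when both inputs are 1
endmodule
```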
Full Adder
A full adder accepts the carry in $C_{in}$, as shown in Figure 5.3. The figure also shows the output equations: $S = A \oplus B \oplus C_{in}$ and $C_{out} = AB + AC_{in} + BC_{in}$.

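In the same style, a sketch of the full adder (again, the names are mine):

```systemverilog
// 1-bit full adder: S = A xor B xor Cin, Cout = majority(A, B, Cin)
module fulladder(input  logic a, b, cin,
                 output logic s, cout);
  assign s    = a ^ b ^ cin;                      // sum bit
  assign cout = (a & b) | (a & cin) | (b & cin);  // carry out (majority function)
endmodule
```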
Carry Propagate Adder
An $N$-bit adder sums two $N$-bit inputs, $A$ and $B$, and a carry in $C_{in}$ to produce an $N$-bit result $S$ and a carry out $C_{out}$. It is commonly called a carry propagate adder (CPA) because the carry out of one bit propagates into the next bit. The symbol for a CPA is shown in Figure 5.4; it is drawn just like a full adder except that A, B, and S are buses rather than single bits. Three common CPA implementations are called
Ripple-carry adders
Carry-lookahead adders
Prefix adders (FYI)

Ripple-Carry Adder
The simplest way to build an N-bit carry propagate adder is to chain together N full adders. The $C_{out}$ of one stage acts as the $C_{in}$ of the next stage, as shown in Figure 5.5 for 32-bit addition. This is called a ripple-carry adder.

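Structurally, Figure 5.5 is just N full adders wired carry-out to carry-in. A sketch assuming the fulladder module from the previous snippet:

```systemverilog
// N-bit ripple-carry adder built by chaining full adders
module rippleadder #(parameter N = 32)
                    (input  logic [N-1:0] a, b,
                     input  logic         cin,
                     output logic [N-1:0] s,
                     output logic         cout);
  logic [N:0] c;        // internal carry chain, c[0] = cin
  assign c[0] = cin;
  for (genvar i = 0; i < N; i++) begin : stage
    fulladder fa(.a(a[i]), .b(b[i]), .cin(c[i]), .s(s[i]), .cout(c[i+1]));
  end
  assign cout = c[N];
endmodule
```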
The ripple-carry adder has the disadvantage of being slow when N is large, as $S_{31}$ depends on $C_{30}$, which depends on $C_{29}$, and so forth all the way back to $C_{in}$. We say that the carry ripples through the carry chain. The delay of the adder, $t_{ripple}$, grows directly with the number of bits, as given in Equation 5.1, where $t_{FA}$ is the delay of a full adder:

$$t_{ripple} = N t_{FA} \tag{5.1}$$
The main disadvantage of the ripple-carry adder is that the carry propagates very slowly. Can we accelerate the process of determining the carry? Here comes the carry-lookahead adder.
Carry-Lookahead Adder
A carry-lookahead adder (CLA) is another type of carry propagate adder that solves the ripple-carry adder's problem by dividing the adder into blocks and providing circuitry to quickly determine the carry out of a block as soon as the carry in is known. Thus it is said to look ahead across the blocks rather than waiting to ripple through all the full adders inside a block.
CLAs use bit-level generate ($G_i$) and propagate ($P_i$) signals to determine carry behavior for individual bit positions, which are then combined into block-level signals ($G_{i:j}$ and $P_{i:j}$) to determine the carry out for larger multi-bit groups.
The $i$-th column of an adder is said to generate a carry if it produces a carry out independent of the carry in. Hence, we have $G_i = A_i B_i$.
The $i$-th column of an adder is said to propagate a carry if it produces a carry out whenever there is a carry in. Thus, we have $P_i = A_i + B_i$.
One great advantage here is that both $G_i$ and $P_i$ depend only on the two input bits $A_i$ and $B_i$, with no need to know the carry in ($C_{i-1}$) of that column.
Using $G_i$ and $P_i$, we can rewrite the carry out of each column of full adders: $C_i = A_i B_i + (A_i + B_i) C_{i-1} = G_i + P_i C_{i-1}$.
Let's say we want to build a 32-bit carry-lookahead adder. Doing the above manipulation for 32 stages is cumbersome. However, we can divide the columns into 4-bit blocks first (i.e., implement a 4-bit carry-lookahead adder as the fundamental block).

Using this fundamental block (4-bit carry-lookahead adder), we can build bigger adders. Figure 5.6 (a) shows a 32-bit carry-lookahead adder composed of eight 4-bit blocks. Each block contains a 4-bit ripple-carry adder and some lookahead logic to compute the carry out of the block given the carry in, as shown in Figure 5.6 (b).
$C_1$, $C_2$, $C_3$, and $C_4$ are the four carries that we've looked ahead!

Image Explanation
The AND and OR gates needed to compute the single-bit generate and propagate signals, $G_i$ and $P_i$, from $A_i$ and $B_i$ are left out for brevity.
The block propagate ($P_{3:0}$) and generate ($G_{3:0}$) signals for a 4-bit block (the subscript $3{:}0$ refers to the bit range of the block here) are
$G_{3:0} = G_3 + P_3(G_2 + P_2(G_1 + P_1 G_0))$, the same as shown in the figure above
$P_{3:0} = P_3 P_2 P_1 P_0$, the same as shown in the figure above
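Unrolling the carry recurrence for one 4-bit block (writing the block's carry in as $C_0$ and the carry out of column $i$ as $C_{i+1}$, to match the figure's labels) makes the lookahead explicit; every carry depends only on the $G$'s, the $P$'s, and $C_0$:

$$\begin{aligned}
C_1 &= G_0 + P_0 C_0\\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0\\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0\\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 = G_{3:0} + P_{3:0}\, C_0
\end{aligned}$$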
The ripple effect is avoided within a block, but the ripple-carry effect is still present between blocks. Nevertheless, the delays are significantly lower than in the original ripple-carry adder.
Putting it all together
HDLs provide the + operation to specify a CPA. HDL Example 5.1 describes a CPA with carries in and out.
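The body of HDL Example 5.1 is not reproduced here, but it boils down to a one-liner; a sketch in the same spirit (parameter and port names are my guesses):

```systemverilog
// N-bit carry propagate adder: the synthesis tool picks the architecture
// (ripple-carry, carry-lookahead, prefix, ...) that meets timing
module adder #(parameter N = 8)
              (input  logic [N-1:0] a, b,
               input  logic         cin,
               output logic [N-1:0] s,
               output logic         cout);
  assign {cout, s} = a + b + cin;  // (N+1)-bit sum: the top bit is the carry out
endmodule
```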
Subtraction
Recall from the previous part that adders can add positive and negative numbers using two's complement number representation. Subtraction is almost as easy:
flip the sign of the second number
then add it
Flipping the sign of a two's complement number is done by inverting the bits then adding 1.
Thus, subtraction $Y = A - B$ is performed with a single CPA by computing $A + \overline{B} + 1$. Figure 5.9 shows the symbol for a subtractor and the underlying hardware for performing $Y = A - B$.

An N-bit subtractor needs N NOT gates. This is tricky! Don't just glance at the figure and deduce that only one NOT gate is needed!
HDL Example 5.2 describes a subtractor.
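Likewise, a sketch of what such a subtractor looks like (names are my own):

```systemverilog
// N-bit subtractor: y = a - b, which hardware realizes as a + ~b + 1
module subtractor #(parameter N = 8)
                   (input  logic [N-1:0] a, b,
                    output logic [N-1:0] y);
  assign y = a - b;  // synthesizes to a CPA with B inverted and carry in = 1
endmodule
```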
Signed Addition/Subtraction
In signed addition/subtraction, we have four flags, N(egative), Z(ero), C(arry), and (o)V(erflow), to indicate the status of our result. The following part will explain when each flag is set.
Before that, let's make some clarifications or conventions first.
We denote that the LSB is at bit 0 (the 0-th bit), and so on for all the remaining bits.
When we add two N-bit numbers, we will get an (N+1)-bit number. However, we treat the MSB (the N-th bit) as the carry bit, and the remaining N bits are the result. So, when we say the MSB of the result, we are actually referring to the (N-1)-th bit.
Signed Addition

By observing the addition examples above, we may find out that
For unsigned addition, carry = 1 indicates that the result is wrong.
For signed addition, overflow occurs when (+ve) + (+ve) = (-ve) or (-ve) + (-ve) = (+ve), i.e., two positive numbers adding up to a negative number or two negative numbers adding up to a positive number.
To determine the overflow flag, we always treat the two operands as signed! So, the check is: if the sign bits (MSBs) of the operands are the same, and the sign of the result (the (N-1)-th bit) is different from them, there is an overflow.
Overflow can never occur when you add two numbers of different signs
Carry and overflow are unrelated. (However, carry is sometimes called "unsigned overflow".)
Carry flag is set when the value in the blue rectangle is one.
Negative flag is determined by looking only at the MSB of the result (the (N-1)-th bit, actually).
Zero flag is determined by looking only at the N-bit result.
Notes
In addition, there are some small tips
N flag and Z flag cannot be 1 simultaneously.
Z flag and V flag can be set simultaneously. e.g., 1000+1000=0000 with carry out 1, and the interpretation of the result to be 0.
The hardware as well as the instruction used for performing both signed and unsigned addition are the same
The hardware doesn't know and doesn't care if we are dealing with signed or unsigned.
Only the interpretation of the result and/or the way the condition codes are used are different.
So, first use signed/unsigned to write out the two operands, then just do the binary addition on these two operands and derive the flags.
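To make the flag rules above concrete, here is a sketch (my own, not from the lecture) of how hardware might derive NZCV for an N-bit addition:

```systemverilog
// NZCV flags for an N-bit addition A + B (a sketch)
module addflags #(parameter N = 32)
                 (input  logic [N-1:0] a, b,
                  output logic [N-1:0] s,
                  output logic         n_flag, z_flag, c_flag, v_flag);
  assign {c_flag, s} = a + b;            // the (N+1)-bit sum's MSB is the carry
  assign n_flag = s[N-1];                // sign bit of the N-bit result
  assign z_flag = (s == '0);             // all N result bits are zero
  assign v_flag = (a[N-1] == b[N-1]) &&  // operands share a sign...
                  (s[N-1] != a[N-1]);    // ...but the result's sign differs
endmodule
```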
Signed Subtraction

By observing the subtraction examples above, we can find out that
For unsigned subtraction, borrow indicates that the result is wrong.
In ARM, carry is just NOT borrow. In Intel, however, carry is borrow: for subtraction, the hardware flips the adder's carry out to derive the carry flag.
For signed subtraction,
Overflow occurs when (+ve) - (-ve) = (-ve) or (-ve) - (+ve) = (+ve).
And similar to the signed addition, we always treat the two operands as signed. So, if the sign bits (MSBs) of the operands are different, and the result has the same sign as the second operand (the subtrahend), there is an overflow.
Overflow can never occur in subtraction when operands are of the same sign.
In hardware, the Carry flag simply reflects the ALU's carry-out signal. Conceptually, during subtraction, this signal acts as the inverse of the borrow used in pen-and-paper arithmetic.
Notes
The hardware as well as the instruction used for performing both signed and unsigned subtraction are the same
The hardware doesn't know and doesn't care if we are dealing with signed or unsigned.
Only the interpretation of the result and/or the way the condition codes are used are different.
So, first use signed/unsigned to write out the two operands, then just do the binary subtraction on these two operands and derive the flags.
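And the subtraction counterpart (again my own sketch); note how computing A - B as A + ~B + 1 makes the adder's carry out equal to NOT borrow, i.e., the ARM convention:

```systemverilog
// NZCV flags for an N-bit subtraction A - B (a sketch)
module subflags #(parameter N = 32)
                 (input  logic [N-1:0] a, b,
                  output logic [N-1:0] d,
                  output logic         n_flag, z_flag, c_flag, v_flag);
  // the concatenations keep ~b at N bits, so the extra MSB really is the carry out
  assign {c_flag, d} = {1'b0, a} + {1'b0, ~b} + 1'b1;  // C = NOT borrow
  assign n_flag = d[N-1];                              // sign of the result
  assign z_flag = (d == '0);
  assign v_flag = (a[N-1] != b[N-1]) &&                // operand signs differ...
                  (d[N-1] == b[N-1]);                  // ...and the result matches the subtrahend's sign
endmodule
```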
In summary, we know the following points from the human interpretation side,
When C = 1, it indicates the unsigned addition result is wrong.
When C = 0, it indicates that the unsigned subtraction result is wrong.
When V = 1, it indicates that both signed addition and signed subtraction result are wrong.
For RISC-V, NZCV is irrelevant when the ALU does addition. But it is still recommended to go through how each flag of NZCV is generated during addition.
In contrast, NZCV is relevant when the ALU does subtraction. Again, it is important to go through how NZCV is generated during subtraction. In RISC-V, subtraction can be done in
`sub`
branch/`slt` variants
In the pure `sub` instruction, NZCV may not be that useful. But in the branch/`slt` variants, these NZCV values, which are not stored as flag bits for use by future instructions, become much more useful. The different comparisons, like `eq`, etc., are implemented using these NZCV values.
| Condition | Operands | Meaning | Flag expression | Based on |
| --- | --- | --- | --- | --- |
| `eq` | Both | A == B | Z | Z flag |
| `ne` | Both | A ≠ B | ¬Z | Z flag |
| `lt` | Signed | A < B | N ⊕ V | Signed overflow logic |
| `ge` | Signed | A ≥ B | ¬(N ⊕ V) | Signed overflow logic |
| `ltu` | Unsigned | A < B | ¬C | Borrow logic |
| `geu` | Unsigned | A ≥ B | C | Borrow logic |
Table Explanation
For `lt` and `ge`, which are for signed numbers, we must first make sure there is no overflow, then check the N flag.
For `ltu` and `geu`, use the ARM borrow logic to understand: if A < B, then A - B will need a borrow, thus the carry flag is 0, and vice versa. `sltu` is useful to check whether the subtraction of two numbers will need a borrow or not.
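Assuming the NZCV values of A - B are available (as in the subflags sketch earlier), each table row above costs at most one gate; a sketch:

```systemverilog
// branch-condition decode from the NZCV of A - B (a sketch)
module branchcond(input  logic n_flag, z_flag, c_flag, v_flag,
                  output logic eq, ne, lt, ge, ltu, geu);
  assign eq  = z_flag;              // A == B
  assign ne  = ~z_flag;             // A != B
  assign lt  = n_flag ^ v_flag;     // signed A < B (N and V disagree)
  assign ge  = ~(n_flag ^ v_flag);  // signed A >= B
  assign ltu = ~c_flag;             // unsigned A < B: the subtraction borrowed
  assign geu = c_flag;              // unsigned A >= B: no borrow
endmodule
```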
Comparators
A comparator determines whether two binary numbers are equal or if one is greater or less than the other. A comparator receives two N-bit binary numbers A and B. There are two common types of comparators.
An equality comparator produces a single output indicating whether A is equal to B (A == B).
A magnitude comparator produces one or more outputs indicating the relative values of A and B.
Equality comparator
The equality comparator is the simpler piece of hardware. Figure 5.11 shows the symbol and implementation of a 4-bit equality comparator. It first checks whether the corresponding bits in each column of A and B are equal, using XNOR gates. The numbers are equal if all of the columns are equal.

Magnitude comparator
Magnitude comparison is usually done by computing A - B and looking at the sign (most significant bit) of the result, as shown in Figure 5.12. If the result is negative (i.e., the sign bit is 1), then A is less than B. Otherwise, A is greater than or equal to B.

Here, [N-1] means the output is bit N-1 of the result, i.e., its sign bit.
HDL Example 5.3 shows how to use various comparison operations.
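The body of HDL Example 5.3 is omitted here; a sketch in the same spirit, leaning on the HDL's built-in unsigned comparison operators:

```systemverilog
// equality and unsigned magnitude comparators (a sketch)
module comparators #(parameter N = 8)
                    (input  logic [N-1:0] a, b,
                     output logic eq, neq, lt, lte, gt, gte);
  assign eq  = (a == b);   // XNOR each column, then AND (Figure 5.11)
  assign neq = (a != b);
  assign lt  = (a <  b);   // effectively the sign of a - b (Figure 5.12)
  assign lte = (a <= b);
  assign gt  = (a >  b);
  assign gte = (a >= b);
endmodule
```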
ALU
An Arithmetic/Logical Unit (ALU) combines a variety of mathematical and logical operations into a single unit. For example, a typical ALU might perform addition, subtraction, magnitude comparison, AND, and OR operations. The ALU forms the heart of most computer systems.
Figure 5.14 shows the symbol for an N-bit ALU with N-bit inputs and outputs.

The ALU receives a control signal F that specifies which function to perform. Table 5.1 lists typical functions that the ALU can perform.

Figure 5.15 shows an implementation of the ALU.

Image Explanation
$F_2$ is also the carry in to the adder.
In two's complement arithmetic, $A - B = A + \overline{B} + 1$.
When $F_{1:0} = 11$, the ALU performs the set if less than (SLT) operation. When $A < B$, $Y = 1$. Otherwise, $Y = 0$. In other words, $Y$ is set to 1 if $A$ is less than $B$.
SLT is performed by computing $S = A - B$. If $S$ is negative (i.e., the sign bit is set), $A$ is less than $B$. The zero extend unit produces an N-bit output by concatenating its 1-bit input with 0's in the most significant bits. The sign bit ($S_{N-1}$) of $S$ is the input to the zero extend unit.
The AND and OR gates shown in the figure above each need 32 2-input AND/OR gates to implement! Not one!
The multiplexer that selects whether the second input source is the N-bit $B$ or $\overline{B}$ can be implemented using N 2-input XOR gates, where one input of each XOR gate is one bit of the N-bit $B$, and the other is the 1-bit signal $F_2$.
Some ALUs produce extra outputs, called flags, that indicate information about the ALU output. For example, an overflow flag indicates that the result of the adder overflowed. A zero flag indicates that the ALU output is 0.
HDL Implementation
The HDL for an N-bit ALU is shown as follows.
TODO: Wait for adding LOL.
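Until that is added, here is an interim sketch of the Figure 5.15 datapath, assuming the usual 3-bit encoding of Table 5.1 ($F_2$ inverts B, and $F_{1:0}$ selects AND, OR, ADD, SLT):

```systemverilog
// N-bit ALU sketch: F2 chooses B or ~B (and is the adder's carry in),
// F1:0 chooses among AND, OR, ADD/SUB, and SLT
module alu #(parameter N = 32)
            (input  logic [N-1:0] a, b,
             input  logic [2:0]   f,
             output logic [N-1:0] y);
  logic [N-1:0] bb, s;
  assign bb = f[2] ? ~b : b;   // invert B for subtraction / SLT
  assign s  = a + bb + f[2];   // F2 as carry in: A + ~B + 1 = A - B
  always_comb
    case (f[1:0])
      2'b00: y = a & bb;                       // AND
      2'b01: y = a | bb;                       // OR
      2'b10: y = s;                            // ADD (SUB when F2 = 1)
      2'b11: y = {{(N-1){1'b0}}, s[N-1]};      // SLT (meant for F = 111): zero-extend sign of A - B
    endcase
endmodule
```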
Shifters and Rotators
Shifters and rotators move bits and multiply or divide by powers of 2. There are several kinds of commonly used shifters:
Logical shifter
It shifts the number to the left (LSL) or right (LSR) and fills the empty spots with 0's. For example, 11001 >> 2 = 00110.
Arithmetic shifter
It is the same as a logical shifter, but on right shifts it fills the most significant bits with a copy of the old most significant bit (msb). This is useful for multiplying and dividing signed numbers. For example, 11001 >>> 2 = 11110. Arithmetic shift left (ASL) is the same as logical shift left (LSL): because we fill the empty spots at the right with 0's either way, the sign-fill distinction doesn't matter on left shifts.
Rotator
It rotates a number in a circle such that the empty spots are filled with the bits shifted off the other end. For example, 11001 ROR 2 = 01110.
An N-bit shifter can be built from N N:1 multiplexers. The input is shifted by 0 to N-1 bits, depending on the value of the $\log_2 N$-bit select lines (shamt). Figure 5.16 shows the symbol and hardware of 4-bit shifters. The operators <<, >>, and >>> typically indicate shift left, logical shift right, and arithmetic shift right, respectively.

Image Explanation
For Figure 5.16(b), which is about `srl` (shift right logical), we can divide it into four cases:
shamt = 00 (no shift), then the output is $A_3, A_2, A_1, A_0$
shamt = 01 (srl 1 bit), then the output is $0, A_3, A_2, A_1$
shamt = 10 (srl 2 bits), then the output is $0, 0, A_3, A_2$
shamt = 11 (srl 3 bits), then the output is $0, 0, 0, A_3$
Such a shifter shown in the Figure 5.16 which can combinationally shift an input by an arbitrary amount is called a barrel shifter.
Shifting by a fixed amount (e.g., 1 bit, 2 bits, etc.) only needs rewiring! It doesn't need any multiplexer.
If we want to use this idea to build a 32-bit shifter, we need 32 32:1 multiplexers, which has a very high hardware cost. To solve this problem, let's introduce the hardware-efficient shifter.
Hardware-efficient Shifter
This hardware-efficient shifter only needs 5 × 32 2:1 multiplexers. Below is the implementation of the 32-bit srl shifter.

This shifter uses a logarithmic, staged design that decomposes any shift amount into a combination of small, power-of-two shifts. Each stage doubles the shift capacity, making the circuit both elegant and efficient.
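A sketch of this staged design (signal names are mine): each of the five stages is one row of 32 2:1 multiplexers, controlled by one bit of shamt:

```systemverilog
// logarithmic 32-bit srl shifter: decompose the shift amount
// into conditional power-of-two shifts (16, 8, 4, 2, 1)
module srl32(input  logic [31:0] din,
             input  logic [4:0]  shamt,
             output logic [31:0] dout);
  logic [31:0] s0, s1, s2, s3;
  assign s0   = shamt[4] ? {16'b0, din[31:16]} : din;  // shift by 16?
  assign s1   = shamt[3] ? { 8'b0,  s0[31:8]}  : s0;   // shift by 8?
  assign s2   = shamt[2] ? { 4'b0,  s1[31:4]}  : s1;   // shift by 4?
  assign s3   = shamt[1] ? { 2'b0,  s2[31:2]}  : s2;   // shift by 2?
  assign dout = shamt[0] ? { 1'b0,  s3[31:1]}  : s3;   // shift by 1?
endmodule
```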
To add support for sll and sra, the easiest way is to add a 3-to-1 multiplexer, controlled by the shift-type info from the instruction.
However, more efficient ways to combine the shifts exist. For example, when shamt[4] = 1, the upper input can come from another small "multiplexer" that gives
{16{0}, ShIn[31:16]} for `srl`
{16{ShIn[31]}, ShIn[31:16]} for `sra`
Multiplication
In the previous part, we have seen that multiplication is nothing but shift then add. By shifting, we form the partial products.
In general, an N×N multiplier multiplies two N-bit numbers and produces a 2N-bit result. Multiplication of 1-bit binary numbers is equivalent to the AND operation, so AND gates are used to form the partial products.
Array Multiplier
Figure 5.18 shows the symbol, function, and implementation of an unsigned 4x4 multiplier.

The unsigned multiplier receives the multiplicand and multiplier, $A$ and $B$, and produces the product $P$. Figure 5.18(b) shows how partial products are formed. Each partial product is a single multiplier bit ($B_i$) AND the multiplicand bits ($A_{N-1:0}$). With N-bit operands, there are N partial products and N-1 stages of 1-bit adders. For example, for a 4×4 multiplier, the partial product of the first row is $B_0$ AND ($A_3, A_2, A_1, A_0$). This partial product is added to the shifted second partial product, $B_1$ AND ($A_3, A_2, A_1, A_0$). Subsequent rows of AND gates and adders form and add the remaining partial products.
This array multiplier is combinational because the whole multiplication (all ANDs and adds) happens in one clock cycle. Maybe this video is good at explaining the critical path and hardware analysis of the array multiplier.
Sequential Multiplier
Suppose that we do not want to do the multiplication in one single cycle; we can use the sequential multiplier instead. Its workflow is given as follows,

To understand it better, let's step through an example. Let S = 000000, A = 011, B = 101 (initial values).
Cycle 1 (original $b_0$ = 1), add:
S = S + A = 000000 + 000011 = 000011
A = A << 1; thus A = 000110
B = B >> 1; thus B = 010
Cycle 2 (original $b_1$ = 0):
no addition required here
A = A << 1; thus A = 001100
B = B >> 1; thus B = 001
Cycle 3 (original $b_2$ = 1), add:
S = S + A = 000011 + 001100 = 001111, which is the final result (011 × 101 = 3 × 5 = 15)
The hardware complexity increases linearly as we increase the bits to be multiplied.
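A behavioral sketch of this basic sequential multiplier, sized for the 3-bit example above (my own rendering; the start/done sequencing is reduced to a bare minimum):

```systemverilog
// basic sequential multiplier: one add/shift step per clock
module seqmult(input  logic       clk, start,
               input  logic [2:0] a_in, b_in,  // multiplicand, multiplier
               output logic [5:0] s);          // product accumulator
  logic [5:0] a;
  logic [2:0] b;
  always_ff @(posedge clk)
    if (start) begin            // load: S = 0, A zero-extended, B as is
      s <= '0;
      a <= {3'b000, a_in};
      b <= b_in;
    end else begin              // run this for 3 cycles after start
      if (b[0]) s <= s + a;     // add multiplicand if the current multiplier bit is 1
      a <= a << 1;              // multiplicand marches left...
      b <= b >> 1;              // ...multiplier bits march right
    end
endmodule
```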
Improved Sequential Multiplier
The idea is to always align A to the "high side" (the 3 most significant bits of S in 3-bit multiplication) and then perform the addition. Then shift S to the right and do not shift A. The least significant bits of S are populated with B, which keeps getting shifted to the right along with S (this avoids the need for an extra register for B). Its workflow is shown as follows,

Again, stepping through an example will make it easier to understand. Let S = 000101 (i.e., {000, B} with B = 101) and A = 011 (initial values).
Cycle 1 (original $b_0$ = 1, the current LSB of S), add:
S = S + A000 = 000101 + 011000 = 011101
S = S >> 1; thus S = 001110
Cycle 2 (original $b_1$ = 0):
no addition required
S = S >> 1; thus S = 000111
Cycle 3 (original $b_2$ = 1), add:
S = S + A000 = 000111 + 011000 = 011111
S = S >> 1; thus S = 001111, which is the final result
But there is a problem with the improved sequential multiplier: what if the 32-bit ALU generates a carry out? Where should the carry out go?
Ans: The carry-out effectively becomes the new MSB (Bit 63) of the 64-bit Product after the shift occurs.
HDL Implementation
The HDL for a multiplier is in HDL Example 5.4. As with adders, the synthesis tools may pick the most appropriate design given the timing constraints.
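As before, the example body is omitted; a sketch in the same spirit (names mine):

```systemverilog
// N x N multiplier with a 2N-bit product; the tool picks the architecture
module multiplier #(parameter N = 8)
                   (input  logic [N-1:0]   a, b,
                    output logic [2*N-1:0] p);
  assign p = a * b;
endmodule
```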
Division
Before we continue this section, let's make some conventions first. Given $A / B = Q + R / B$, we say that
$A$ is the dividend
$B$ is the divisor
$Q$ is the quotient
$R$ is the remainder
Binary division can be performed using the following algorithm for N-bit unsigned numbers in the range $[0, 2^N - 1]$.
The partial remainder $R$ is initialized to 0, and the most significant bit of the dividend becomes the least significant bit of $R$ ($R = \{R \ll 1, A_{N-1}\}$).
The divisor $B$ is subtracted from this partial remainder to determine whether it fits ($D = R - B$).
If the difference $D$ is negative (i.e., the sign bit of $D$ is 1), then the quotient bit $Q_i$ is 0 and the difference is discarded.
Otherwise, the difference is positive: the quotient bit $Q_i$ is 1 and the partial remainder becomes $D$ ($R = D$).
In any event, the partial remainder is then doubled (left-shifted by one column), the next most significant bit of $A$ becomes the least significant bit of $R$, and the process repeats.
The process (from step 2) repeats for all N bits, and the result satisfies $A / B = Q + R / B$, i.e., $A = Q \times B + R$.
N-bit operands yield an N-bit quotient and remainder. Here, we want to show that the divisor, quotient, and remainder share the same bit width, but actually the dividend can have 2N bits.
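As a cross-check, here is a direct, unoptimized rendering of the pseudocode above (my own sketch): one loop iteration per quotient bit, most significant bit first:

```systemverilog
// combinational restoring divider: q = a / b, r = a % b
module divider #(parameter N = 4)
                (input  logic [N-1:0] a, b,   // dividend, divisor
                 output logic [N-1:0] q, r);  // quotient, remainder
  logic [N:0] rr, d;  // partial remainder and trial difference (one extra sign bit)
  always_comb begin
    rr = '0;
    for (int i = N-1; i >= 0; i--) begin
      rr = {rr[N-1:0], a[i]};   // shift in the next dividend bit
      d  = rr - b;              // does the divisor fit?
      if (d[N]) q[i] = 1'b0;    // negative: quotient bit 0, discard the difference
      else begin
        q[i] = 1'b1;            // positive: quotient bit 1...
        rr   = d;               // ...and the remainder becomes the difference
      end
    end
    r = rr[N-1:0];
  end
endmodule
```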
Below is a more straight-forward example,

This approach is called the long division approach, which is basically the same as what we have seen above, but in a more straightforward way:
If the divisor ≤ the current dividend bits
1 bit in quotient, subtract
Otherwise
0 bit in quotient, bring down the next dividend bit
This illustration is more straightforward, but less complete. Thus it is good for doing by-hand calculation. But for the systematic approach to implement in hardware, please review the pseudocode above.
Array Divider
Figure 5.22 shows a schematic of a 4-bit array divider. The divider computes $A / B$ and produces a quotient $Q$ and a remainder $R$. The legend shows the symbol and schematic for each block in the array divider.

Each row performs one iteration of the division algorithm. Specifically, each row calculates the difference $D = R - B$. (Recall that $R - B = R + \overline{B} + 1$.)
The multiplexer select signal, $N$ ($N$ for Negative), receives 1 when a row's difference is negative. So $N$ is driven by the most significant bit of $D$.
Each quotient bit ($Q_i$) is 0 when $D$ is negative and 1 otherwise. The multiplexer passes $R$ on to the next row if the difference is negative and $D$ otherwise. (See the legend.)
The following row shifts the new partial remainder left by one bit (a good example showing that "shifting is just rewiring"), appends the next most significant bit of $A$, and then repeats the process.
The delay of an N-bit array divider increases proportionally to $N^2$ because the carry must ripple through all N stages in a row before the sign is determined and the multiplexer can select $R$ or $D$. This repeats for all N rows.
Sequential Divider
The sequential divider here just follows the pseudocode mentioned above, which is for unsigned division. To convert it to signed division, only minor changes are needed. (For more information on this, go to Lab 03.)
Similar to the Sequential Multiplier, we can also do the divider sequentially,

To understand it better, let's again step through an example: 00000111 (dividend) divided by 0010 (divisor).

In Hennessy and Patterson's textbook, they mention an improved version of the division algorithm above; it is shown as follows,

Signed Multiplication/Division
Signed vs. Unsigned Multiplication
When you multiply two n-bit numbers,
the lowest n bits (the least significant half) of the product are identical no matter whether the numbers are interpreted as signed or unsigned.
Only the upper n bits differ (because sign extension affects those).
That’s why:
`MUL` (in RISC-V) gives just the lower n bits.
`MULH`/`MULHU` (RISC-V) give the upper n bits, for signed/unsigned respectively.
Macro-op fusion
“Macro-op fusion” means the processor can fuse mul and mulh internally and perform a single 64-bit multiply, returning both halves if needed.
We know that RISC-V has separate instructions for multiplication:
`mul` → lower 32 bits of product
`mulh` → upper 32 bits (signed × signed)
`mulhu` → upper 32 bits (unsigned × unsigned)
`mulhsu` → upper 32 bits (signed × unsigned)
If we implement them literally, it looks like three separate multipliers are needed. But in hardware, we don’t have to do three full multiplications. Instead, we can compute one 64-bit result internally and just output different parts or interpret sign bits differently.
So, even though RISC-V has multiple mul variants, we don’t necessarily perform multiple physical multiplications.
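In hardware terms, that sharing can look like the following sketch (signed version only; mulhu and mulhsu would differ just in how the operands are extended):

```systemverilog
// one widening multiply serves both mul and mulh
module mulunit(input  logic signed [31:0] a, b,
               output logic        [31:0] mul_o,    // mul:  low half (same for signed/unsigned)
               output logic        [31:0] mulh_o);  // mulh: high half (signed x signed)
  logic signed [63:0] full;
  assign full   = a * b;   // single 64-bit product, operands sign-extended
  assign mul_o  = full[31:0];
  assign mulh_o = full[63:32];
endmodule
```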
Signed/Unsigned always matters in Division
Unlike multiplication, signed and unsigned division don't share any bits of the result. We can't reuse the same division result for both. For example,
| Dividend | Divisor | Interpreted as | Result | Note |
| --- | --- | --- | --- | --- |
| 6 | 3 | signed or unsigned | 2 | Same either way |
| 6 | -3 | signed | -2 | (because of the sign) |
| 6 | -3 | unsigned | invalid | (divisor interpreted as 4294967293 if 32-bit!) |
| -6 | 3 | signed | -2 | okay |
| -6 | 3 | unsigned | big number ≈ 0x55555554 | (because -6 -> 0xFFFFFFFA is interpreted as 4294967290) |
So division results sometimes completely change depending on whether you treat numbers as signed or unsigned — even if you use the same bit pattern.
Simple trick for signed multiplication/division
Signed division (and signed multiplication) can be done using the same unsigned operation unit (see the sketch after this list) if you:
Record the sign of both operands.
Take their absolute values (convert negatives to positives).
Perform unsigned division (or multiplication).
If exactly one operand was negative, negate the result.
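A sketch of this trick (the / operator stands in for the unsigned divider unit; as discussed right below, the $-2^{31}$ corner case still overflows):

```systemverilog
// signed division on an unsigned divider: strip signs, divide, fix up
module sdiv #(parameter N = 32)
             (input  logic signed [N-1:0] a, b,
              output logic signed [N-1:0] q);
  logic [N-1:0] ua, ub, uq;
  assign ua = a[N-1] ? -a : a;               // steps 1-2: record signs, take absolute values
  assign ub = b[N-1] ? -b : b;
  assign uq = ua / ub;                       // step 3: unsigned divide
  assign q  = (a[N-1] ^ b[N-1]) ? -uq : uq;  // step 4: negate if exactly one operand was negative
endmodule
```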
But there are some weird scenarios. For example, if one operand is $-2^{31}$ (the most negative 32-bit number), taking its absolute value gives $+2^{31}$, and the result cannot fit into a 32-bit register. This is an overflow (same trick to detect overflow as we have discussed in signed addition/subtraction)!
Number Systems
Computers operate on both integers and fractions. So far, we have only considered representing signed or unsigned integers. This section introduces fixed- and floating-point number systems that can represent rational numbers.
Fixed-point numbers are analogous to decimals; some of the bits represent the integer part, and the rest represent the fraction
Floating-point numbers are analogous to scientific notation, with a mantissa and an exponent.
If we have N bits, we can only represent $2^N$ patterns. Using different number systems won't change the number of patterns, but can change the distribution of the numbers they represent.
Fixed-Point Number Systems
Fixed-point notation has an implied binary point between the integer and fraction bits, analogous to the decimal point between the integer and fraction digits of an ordinary decimal number. For example, Figure 5.23(a) shows a fixed-point number with four integer bits and four fraction bits. Figure 5.23(b) shows the implied binary point in blue, and Figure 5.23(c) shows the equivalent decimal value.

Notes
This notation is not so standard.
The position of the binary point affects the range (difference between the largest and smallest representable numbers) and precision (smallest possible difference between any two numbers)
Signed Fixed-Point
Signed fixed-point numbers can use either two's complement or sign/magnitude notation. Figure 5.24 shows the fixed-point representation of $-2.375$ using both notations with four integer and four fraction bits.

The implicit binary point is shown in blue for clarity.
In sign/magnitude form, the most significant bit is used to indicate the sign.
The two's complement representation is formed by inverting the bits of the absolute value and adding 1 to the least significant (rightmost) bit. In this case, the least significant bit position is in the $2^{-4}$ column.
In this example, we have: 00100110 (2.375) -> 11011001 (inverted) -> 11011010 (+1).
In general, we use Ua.b to denote an unsigned fixed-point number with a integer bits and b fraction bits. Qa.b denotes a signed (two's complement) fixed point number with a integer bits (including the sign bit) and b fractional bits.
Fixed-Point Calculation
Fixed-point addition and subtraction is simple because the position of the binary point won't change. But multiplication and division are a bit tedious. So, here we will only talk about the fixed-point number multiplication and division.
Actually, from the small trick we mentioned above, we've seen that fixed-point notation has an implicit scale factor, which is $2^{-b}$ for $b$ fraction bits.
For example, if we have 2 fractional bits, the scale factor is $2^{-2} = 0.25$. That means every stored integer $x$ actually represents $x \times 2^{-2}$. So, 00000110₂ = 6 actually means $6 \times 2^{-2} = 1.5$.
Knowing this, we can clearly see that when we multiply two numbers in fixed point, both are already scaled. For example, let's compute $1.5 \times 2.5$:

| Name | Value | Stored bits | Stored integer |
| --- | --- | --- | --- |
| a | 1.5 | 00000110 | 6 |
| b | 2.5 | 00001010 | 10 |
When we do a*b, we will have 00111100₂ (= 60). Since we scaled each input by $2^{-2}$, the product is scaled by $2^{-4}$. So to get back to the same scale factor, which is 2 fractional bits, we must divide the product by 4 (shift right by 2 bits). Thus, we have 00111100₂ >> 2 = 00001111₂. This gives our stored value to be 15 and our actual value to be $15 \times 2^{-2} = 3.75$.
But in the above scenario, we assumed a short multiply, i.e., n-bit × n-bit = n-bit.
When we want to do a long multiplication,
an overflow may happen, which basically means our result bits cannot hold the multiplication result.
an underflow may also happen, which means the multiplication result is representable in magnitude, but not accurately, because our fractional bits are too few.
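The rescaling rule above, in HDL form for the same 2-fraction-bit format (a sketch; overflow into the dropped high bits is silently ignored here):

```systemverilog
// fixed-point multiply with 2 fraction bits: the raw product is scaled
// by 2^-4, so shift right by 2 to return to the operands' 2^-2 scale
module qmul(input  logic [7:0] a, b,  // operands, 2 fraction bits each
            output logic [7:0] p);    // short multiply: same format as inputs
  logic [15:0] full;
  assign full = a * b;       // e.g., 6 x 10 = 60 = 00111100
  assign p    = full >> 2;   // 60 >> 2 = 15, i.e., 3.75 with 2 fraction bits
endmodule
```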
Floating-Point Number Systems
Floating-point numbers are analogous to scientific notation. They circumvent the limitation of having a constant number of integer and fraction bits, allowing the representation of very large and very small numbers. Like scientific notation, floating-point numbers have a sign, mantissa (M), base (B), and exponent (E), as shown below:

$$\pm M \times B^E$$
For example, the number $4.1 \times 10^3$ is the decimal scientific notation for 4100. It has a mantissa of 4.1, a base of 10, and an exponent of 3. The decimal point floats to the position right after the most significant digit. Floating-point numbers are base 2 with a binary mantissa. 32 bits are used to represent 1 sign bit, 8 exponent bits, and 23 mantissa bits.
The exponent needs to represent both positive and negative exponents. To do so, floating-point uses a biased exponent, which is the original exponent plus a constant bias. In 32-bit floating-point representation, the bias is 127. For example, for the exponent 7, the biased exponent is 7 + 127 = 134 = 10000110₂. For the exponent -4, the biased exponent is -4 + 127 = 123 = 01111011₂.
This notation conforms to the IEEE 754 floating-point standard. In this notation, the first bit of the mantissa (the one to the left of the binary point) is always 1, so it need not be stored; only the fraction bits after the binary point are kept.
To put it all together, let's examine two examples:
Represent signed number in IEEE 754
Suppose we want to write $-58.25_{10}$ in floating point (IEEE 754).
Convert decimal to binary: $58.25_{10} = 111010.01_2$
Write in binary scientific notation: $1.1101001_2 \times 2^5$
Fill in the fields: sign bit = 1, biased exponent = $5 + 127 = 132 = 10000100_2$, 23 fraction bits = 110 1001 0000 0000 0000 0000

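Packing the three fields together gives the final 32-bit word (a quick check of the encoding above):

$$\underbrace{1}_{\text{sign}}\;\underbrace{10000100}_{\text{biased exponent}}\;\underbrace{11010010000000000000000}_{\text{fraction}} = \texttt{0xC2690000}$$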
Special Cases
The IEEE floating-point standard has special cases to represent numbers such as zero, infinity, and illegal results.

More formats
So far, we have examined 32-bit floating-point numbers. This format is also called single-precision, single, or float. The IEEE 754 standard also defines 64-bit double-precision numbers (also called doubles) and 128-bit quadruple-precision numbers (also called quads) that provide greater precision and greater range. Table 5.5 shows the number of bits used for the fields in each format.

Rounding
Arithmetic results that fall outside of the available precision must round to a neighboring number. The rounding modes are
round down,
round up,
round toward zero, and
round to nearest.
The default rounding mode is round to nearest. In the round-to-nearest mode, if two numbers are equally near, the one with a 0 in the least significant position of the fraction is chosen.
For example, we want to round 1.100101 (1.578125) to only 3 fraction bits
round down: 1.100
round up: 1.101
round toward zero: 1.100
round to nearest: 1.101 (1.625 is closer to 1.578125 than 1.5 is)
Notes
A number overflows when its magnitude is too large to be represented.
A number underflows when it is too tiny to be represented.
In round-to-nearest mode, overflows are rounded up to $\pm\infty$ and underflows are rounded down to 0.
Floating-Point Calculation
Addition
The steps for adding floating-point numbers with the same sign are as follows:
Extract exponent and fraction bits.
Prepend leading 1 to form the mantissa.
Compare exponents.
Shift smaller mantissa if necessary.
Add mantissas.
Normalize mantissa and adjust exponent if necessary.
Check whether overflow or underflow occurred, then round the result.
Assemble exponent and fraction back into floating-point number.
For example,

Notes
For subtraction, just change step 5 to: subtract the mantissas.
If the two mantissas are subtracted as A - B and A is smaller, just do B - A, add a negative sign to the final result, and then normalize the mantissa.
If in step 3 the shift amount is negative, shift the upper mantissa to the right instead.
Step 7, checking overflow and underflow, is important! The idea is to check whether the final result's exponent is within [-126, 127] for a single-precision floating-point number. If it is < -126, underflow; if it is > 127, overflow.
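A tiny worked example of the steps, with numbers of my own choosing ($1.5 + 3.25$):

$$\begin{aligned}
1.100_2 \times 2^0 + 1.101_2 \times 2^1 &= 0.1100_2 \times 2^1 + 1.1010_2 \times 2^1 && \text{shift the smaller mantissa right by 1}\\
&= 10.0110_2 \times 2^1 && \text{add the mantissas}\\
&= 1.00110_2 \times 2^2 = 100.11_2 = 4.75 && \text{normalize and adjust the exponent}
\end{aligned}$$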
Multiplication
Floating-point multiplication follows the steps shown below:

For example, let's use the above steps to multiply the numbers $0.5_{10}$ and $-0.4375_{10}$.
In binary, the task is multiplying $1.000_2 \times 2^{-1}$ by $-1.110_2 \times 2^{-2}$.
Adding the exponents without bias:
$-1 + (-2) = -3$. The biased exponent will be $-3 + 127 = 124$.
Multiplying the significands:
$1.000_2 \times 1.110_2 = 1.110_2$, so the unsigned product is $1.110_2 \times 2^{-3}$
Checking overflow and underflow:
as $-3$ is within $[-126, 127]$, there is no overflow or underflow
Rounding the product:
no rounding is needed in this case
Adding the sign to the final product:
since the signs of the original operands differ, make the sign of the product negative. Hence, the product is $-1.110_2 \times 2^{-3}$.