Lec 02 - Digital System Design and Verilog

Digital System Design

Levels of Abstraction

Unlike the abstraction levels we have seen in Harris & Harris, this section covers the levels of abstraction used in digital system design. A simple illustration is shown below.

Let's look at each level in detail.

1

Algorithm/System Level (Untimed)

At the highest level, the design is expressed as algorithms or functional behavior without worrying about timing.

For example, describing output = (A+B) + (C+D) in a flowchart or C-like pseudocode.

Focus: Only the functionality matters here, not how many cycles it takes or how it is implemented in hardware.

2

Register Transfer Level (RTL, Timed)

This is the macroscopic hardware view. The design is described in terms of data transfers between registers and operations performed by functional units (ALUs, multiplexers, etc.) under clock control, and it is implemented as RTL code. It is therefore timed and cycle-accurate, but still abstract (macroscopic).

For example, we have written the RTL Code in CG3207 Lab01, and the following simple Verilog code is also an example of RTL Code
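As an illustration, here is a minimal sketch of what RTL code looks like. It is not the Lab01 code: the module name `accumulator` and all of its ports are assumptions made up for this example.

```verilog
// Hypothetical example: an 8-bit accumulator with a synchronous reset.
module accumulator (
    input  wire       clk,
    input  wire       rst,   // synchronous, active-high reset (assumed)
    input  wire [7:0] din,
    output reg  [7:0] acc
);
    // Functional unit (combinational): an adder
    wire [7:0] sum;
    assign sum = acc + din;

    // Register transfer: acc <- sum on every rising clock edge
    always @(posedge clk) begin
        if (rst)
            acc <= 8'd0;
        else
            acc <= sum;
    end
endmodule
```

The code names a register (`acc`), a functional unit (the adder), and the clock edge on which the transfer happens, which is what makes it timed yet still macroscopic.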

Focus: This is where your macroscopic blocks appear (ALUs, adders, multiplexers, etc.), and RTL code is the implementation of these blocks.

3

Gate Level

It is the microscopic hardware view. The RTL code is synthesized into logic gates (AND, OR, NOT, flip-flops).

For example, (A+B) becomes a ripple-carry adder built out of AND/OR/XOR gates. This level consists of Boolean equations and gates, with no transistor-level details.

Focus: This is your microscopic implementation of RTL macros.

4

Circuit Level

It is the actual electronic implementation of logic gates using CMOS Transistors.

For example, an inverter (NOT gate) is realized using one PMOS and one NMOS transistor.

Focus: Device-level representation, electrical properties like delay, power, capacitance are considered.

5

Layout Level

It is the physical representation of the circuit on silicon: the final physical placement and routing of transistors and wires. The masks for fabrication are designed here.

Simplified FPGA/ASIC Design Flow

The label to the right of each arrow is the output of the step above it. For example, after "Logic Synthesis", the output is a netlist.

Behavioral Modelling

Behavioral modeling defines the "what" of your FPGA/ASIC design — its high-level logic and algorithms — without hardware details. It’s for verifying functionality early.

  • Purpose: Ensures algorithmic correctness via simulations.

  • Tools:

    • Initial tests with Python, C, Java, Matlab (fast, sequential). e.g., A Python script to simulate a filter: output = input * 0.5 if input > 0 else 0 to test logic.

    • Deeper simulations with VHDL, Verilog, SystemC (handles concurrency and timing, but no cycle accuracy).

  • Not Directly Synthesized: Meant for validation, not hardware generation.

  • HLS Trend: High-Level Synthesis tools (e.g., Vivado HLS) can convert behavioral code to RTL, but effectiveness varies by tool, domain, and needs manual tweaks.

The behavioral code or algorithm is then fed into Architectural Synthesis, the next step of the design flow. Iterating on it keeps complex problems manageable.

Architectural Synthesis

Architectural synthesis turns a high-level functional/behavioral (architectural) model into a macroscopic structural (microarchitectural) model for FPGAs/ASICs. It’s mostly manual but becoming more automated.

In short, architectural synthesis means writing RTL code or drawing the schematic (macroscopic model). But to write good RTL code, we first need a clear macroscopic model/diagram. (See the RISC-V microarchitecture from Lec 03 as an example.)

  • Purpose: Converts abstract logic into cycle-accurate, synthesizable RTL code, typically with structural and behavioral elements.

  • Output: A block-level model where operations are timed and assigned to hardware blocks. Example: From Z = (A+B) * (C+D) * E, it creates a plan with adders and multipliers.

  • Key Steps:

    • Scheduling (time, or when do we do the operation): Assigns operations to clock cycles (an integer). Finding all the clock cycles (integers) is called solving the scheduling problem. e.g., (A+B) in cycle #1, (C+D) in cycle #2.

    • Binding (space, or where do we do the operation): Maps operations to specific hardware resources, like functional units, memories or interconnects. e.g., (A+B) done by ALU #1, (C+D) by ALU #2.

    • Flexibility: An adder can be reused for different operand pairs by placing multiplexers at its inputs. e.g., ALU #1 adds A+B in cycle #1, then E+F in cycle #2 once the multiplexers switch its inputs.

  • Tools: RTL synthesis infers register transfers and generates a netlist if guidelines are followed.

The following images show the difference between behavioural modelling and architectural synthesis.

By now, we should be able to use hardware thinking to write RTL code. Let's recap the necessary steps by walking through a very simple example!

1

Behavioral Modelling

Describes what the computation is, but not when or where it happens. This is what we have seen in Behavioral Modelling. For example, we want to calculate the following formula,

$$Z = (A+B) \times (C+D) \times E$$

2

Build a macroscopic model

Now, we start the Architectural Synthesis, where we first decide what kinds of resources will be in the datapath (e.g., 2 adders, 1 multiplier, 1 register file). Still no cycle-by-cycle schedule, just “these are the building blocks.” And this is the high-level datapath architecture.

3

Scheduling

This is done manually by deciding when each operation executes — e.g., in which clock cycle. For example, we want

  • Cycle 1: compute A+B

  • Cycle 2: compute C+D

  • Cycle 3: multiply results → (A+B)∗(C+D)

  • Cycle 4: multiply with E

4

Binding

This is still done manually. Once we know when things happen, we decide where they happen — e.g., which hardware resource executes each operation. For example,

  • Cycle 1: (A+B) → Adder #1

  • Cycle 2: (C+D) → Adder #1 again (reused)

  • Cycle 3: sum1 * sum2 → Multiplier #1

  • Cycle 4: prod * E → Multiplier #1 again

5

RTL Coding

The final step is to write the RTL Code, which is also the last step of Architectural Synthesis. Once scheduling + binding are decided, the RTL code can be written, cycle-accurate. For example,
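A minimal sketch of RTL that follows the schedule and binding above: one shared adder, one shared multiplier, and four cycles. The module name, the `start`/`done` handshake, the synchronous reset, and the width parameter `W` are all assumptions made for this example, and the products are truncated to `W` bits.

```verilog
// Hypothetical RTL for Z = (A+B) * (C+D) * E using 1 adder and 1 multiplier.
module expr_datapath #(
    parameter W = 16
) (
    input  wire         clk,
    input  wire         rst,    // synchronous reset (assumed)
    input  wire         start,  // one-cycle pulse; inputs must stay stable until done
    input  wire [W-1:0] A, B, C, D, E,
    output reg  [W-1:0] Z,
    output reg          done
);
    reg [2:0]   step;           // 0 = idle, 1..4 = cycles of the schedule
    reg [W-1:0] sum1, sum2, prod;

    // Binding: Adder #1 is shared; multiplexers pick its operands per cycle
    wire [W-1:0] add_a, add_b, add_y;
    assign add_a = (step == 3'd1) ? A : C;
    assign add_b = (step == 3'd1) ? B : D;
    assign add_y = add_a + add_b;

    // Binding: Multiplier #1 is shared in the same way
    wire [W-1:0] mul_a, mul_b, mul_y;
    assign mul_a = (step == 3'd3) ? sum1 : prod;
    assign mul_b = (step == 3'd3) ? sum2 : E;
    assign mul_y = mul_a * mul_b;   // result truncated to W bits

    // Scheduling: one operation per clock cycle
    always @(posedge clk) begin
        if (rst) begin
            step <= 3'd0;
            done <= 1'b0;
        end else begin
            done <= 1'b0;
            case (step)
                3'd0: if (start) step <= 3'd1;
                3'd1: begin sum1 <= add_y; step <= 3'd2; end  // cycle 1: A+B
                3'd2: begin sum2 <= add_y; step <= 3'd3; end  // cycle 2: C+D
                3'd3: begin prod <= mul_y; step <= 3'd4; end  // cycle 3: sum1*sum2
                3'd4: begin                                   // cycle 4: prod*E
                    Z    <= mul_y;
                    done <= 1'b1;
                    step <= 3'd0;
                end
                default: step <= 3'd0;
            endcase
        end
    end
endmodule
```

The `step` counter carries the schedule and the multiplexer selects carry the binding. A different schedule (for example, two adders working in the same cycle) would change only those two parts.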

Logic Synthesis

1

Workflow

As shown in the diagram below, logic synthesis takes three inputs (HDL code, constraints, and a technology library) and produces one output (a mapped schematic).

  • Technology Library: The cells / microscopic building blocks we are allowed to use. Its purpose is to enable logic synthesis tools to map a design into the physical hardware efficiently while respecting the process technology constraints (timing, power, area).

    • For ASICs, cells are usually gates or gate combinations (e.g., flip-flops, latches, and buffers). They are custom designed and carefully characterized by the foundry, within the physical limitations of the specific process technology.

    • For FPGAs, the technology library is composed of higher-level CLB functions (like adders, multipliers, LUTs, etc.), which are still considered basic elements for synthesis.

  • Mapped Schematic

    • Optimized schematic realizing the HDL code, using building blocks from the technology library.

    • Usually a netlist that textually describes the interconnection between cells/building blocks

  • Constraints

    • Location: Logical port to physical pin mapping etc. (see CS2100DE Lab 01)

    • Timing specifications (optimization goals): Depending on these goals, different schematics can be obtained from the same HDL code.

2

Substeps

  1. Logic optimization: minimize undesirable redundancies (think Karnaugh maps), and hence the cost and complexity of the design.

  2. Technology mapping (library binding): map the logic/hardware resources in the macroscopic diagram to cells/building blocks in the library. This answers the question "Which actual silicon component from the library will implement this block?".

3

RTL and Registers

RTL (Register Transfer Level): Describes how data moves between registers on each clock cycle.

  • Registers are like “brick walls” — they hold state and define the boundaries of combinational logic.

  • Everything between registers is combinational logic (no memory, just logic operations).

RTL synthesis / logic synthesis tools: Convert your RTL into technology-mapped gates/cells (ASIC) or LUTs/CLBs (FPGA).

  • What gets optimized:

    • Only the combinational logic between registers.

    • Optimization can focus on:

      • Speed: minimize propagation delay

      • Area/cost: minimize number of gates or LUTs

  • Registers themselves are not “optimized” in terms of logic — they just store values.

4

Here, we add one point from Harris & Harris:

Critical path = combinational path with maximum delay.

From Step 3 above, we see that there is only combinational logic between registers. Hence, the critical path determines the maximum clock frequency of our circuit. The following image contains two examples.

If the clock is too fast and the longest "road" (critical path) hasn’t been traversed yet, the data won’t arrive in time, causing incorrect data at the next register. So, the maximum clock frequency is limited by the critical path delay.
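This limit can be written with the setup-time constraint from Harris & Harris. Here $T_c$ is the clock period, $t_{pcq}$ the clock-to-Q propagation delay of the launching register, $t_{pd}$ the propagation delay of the critical path, and $t_{setup}$ the setup time of the capturing register. Clock skew is ignored, and the numbers in the second line are made up purely for illustration.

```latex
T_c \ge t_{pcq} + t_{pd} + t_{setup}
\quad\Longrightarrow\quad
f_{max} = \frac{1}{t_{pcq} + t_{pd} + t_{setup}}

% Hypothetical numbers: t_{pcq} = 1 ns, t_{pd} = 8 ns, t_{setup} = 1 ns
T_c \ge 1 + 8 + 1 = 10\ \text{ns}
\quad\Longrightarrow\quad
f_{max} = \frac{1}{10\ \text{ns}} = 100\ \text{MHz}
```

Shortening the critical path (a smaller $t_{pd}$) is the only term the RTL designer controls directly, for example by inserting a register in the middle of a long combinational path.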

Technology Library vs. Intellectual Property (IP)

In short, a technology library is much lower-level. An IP block, by contrast, is

A pre-designed, reusable functional block or module that performs a specific higher-level function. Can be soft (synthesizable RTL) or hard (physical implementation).

IP is usually much higher-level than a technology library: often an entire subsystem, like a CPU core, USB controller, or memory controller. For example,

  • CPU cores (RISC-V, ARM Cortex-M)

  • PCIe or Ethernet controllers

  • DSP blocks for signal processing

  • Floating-point arithmetic units


Analogy:

  • Think of technology libraries as bricks and mortar.

  • IP blocks are like pre-built rooms or furniture made from those bricks. You can assemble your building (chip) faster if you have ready-made rooms rather than making everything brick by brick.

Physical Design

1

Placement

It is deciding where to put each cell (gate, flip-flop, or higher-level module) on the physical chip layout.

  • Goal: Place cells that communicate a lot closer together, as shorter interconnects mean lower delay and lower power.

  • Tradeoff: If you bring two blocks closer, some other blocks must move farther. This is a global optimization problem.

Placement minimizes some function of a coarse approximation of wirelengths.

Why not exact wire lengths?

Because exact wirelength calculation requires routing every connection, which is very computationally expensive.

2

Routing

It is the mapping of the logical connections between cells to physical interconnects, that is, physically connecting all the cells with wires according to the netlist.

  • Goal: Make wires as short as possible, as this reduces delay, power, and congestion.

  • Constraint: You need placement first to know where to draw the wires.

Why are placement and routing a chicken-and-egg problem?

This is because

  • Optimal placement depends on routing (because wire lengths affect timing).

  • Optimal routing depends on placement (because you need cell locations to route).

The solution in practice is to use an iterative approach:

  1. Place cells using approximate wirelengths.

  2. Route the design.

  3. Evaluate timing and congestion.

  4. Adjust placement and repeat.

Some Notes

  1. Always Think Hardware — have the topology of the circuit in mind (the macroscopic model), and write RTL code which will imply/infer the topology from the code.

  2. Simulation and synthesis tools work very differently (Add more notes in Harris & Harris)

    1. Simulation "executes" HDL code following the semantics of the HDL

    2. Synthesis "infers the hardware structure" described by HDL (RTL) code

  3. Read this awesome note from sunburst!

FPGA

We have learned quite a lot about FPGAs in Harris & Harris; please go back and review their working principle there. The purpose of this section is to compare ASICs and FPGAs and the use cases for each. In short,

  • FPGA: flexible, fast to deploy, good for prototyping and low-volume/highly-custom tasks.

  • ASIC: high-performance, power/area-efficient, cost-effective only for large-scale production.

Verilog for Synthesis

General Rules for Synthesizability

In this section, reg means the variable type is reg only. It doesn't mean that signal is a register. If a reg signal is inferred as a physical register, we will mention it explicitly.

Do NOT use delays (#delay)

Combinational (propagation) delays are hardware dependent; they are not something the synthesis tool can insert based on HDL code. So, the following code and the use of delay in it are not recommended,

In RTL Verilog code, we insert clock cycle delays explicitly by introducing a physical register (i.e., it is a part of your design). So, the following is the correct code for the above,

So, the key takeaway is

  • #delay is for testbenches only.

  • In real FPGA/ASIC design, delays are achieved via registers (clock cycles) and timing constraints (.xdc).
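The takeaway can be sketched as follows. The module and signal names are assumptions for illustration, and the commented-out line is only a typical example of the discouraged style, not the original snippet.

```verilog
// Discouraged in synthesizable code (fine in a testbench):
//   assign #10 y = a;   // "y follows a after 10 time units"

// Synthesizable alternative: delay a by exactly one clock cycle
module delay_one_cycle (
    input  wire clk,
    input  wire a,
    output reg  y
);
    always @(posedge clk)
        y <= a;   // y shows the value a had at the previous clock edge
endmodule
```

For a delay of N cycles, chain N such registers (a shift register); the delay is then a property of the design itself rather than of the simulator.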

Use one clock for the entire design

  1. Connect the clock input of every sequential element to this single clock.

    1. To run different parts at different speeds, use clock enables instead of different clocks. (One example is the Clock_Enable module from CG3207 Lab01)

  2. The clock should not come from a combinational circuit, as a combinational circuit can have glitches in its output.
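A minimal sketch of the clock-enable idea, assuming a hypothetical module `clk_en_gen` (this is not the Lab01 `Clock_Enable` code). Everything stays on the single clock `clk`; slower logic simply updates only when `enable` is high.

```verilog
// Hypothetical enable generator: enable is high for 1 cycle out of every DIVIDE.
module clk_en_gen #(
    parameter DIVIDE = 4
) (
    input  wire clk,
    input  wire rst,     // synchronous reset (assumed)
    output wire enable
);
    reg [31:0] count;

    always @(posedge clk) begin
        if (rst || count == DIVIDE - 1)
            count <= 32'd0;
        else
            count <= count + 32'd1;
    end

    assign enable = (count == DIVIDE - 1);
endmodule

// Usage inside another module (still clocked by clk, not by enable):
//   always @(posedge clk)
//       if (enable) slow_counter <= slow_counter + 1;
```

Because `enable` is used as data rather than as a clock, any glitch on it between clock edges is harmless.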

Similarly, do not use something like @(posedge button) to detect a transition.

  • Use a synchronous edge detection scheme instead — e.g., by comparing the current value with the previous value stored in a register.
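A sketch of such a synchronous edge detector. The module and signal names are assumptions, and `button` is assumed to be already synchronized to `clk` and debounced.

```verilog
module rising_edge_detect (
    input  wire clk,
    input  wire button,   // assumed synchronized and debounced
    output wire rise      // high for one cycle when button goes 0 -> 1
);
    reg button_prev;

    // Remember the value button had at the previous clock edge
    always @(posedge clk)
        button_prev <= button;

    // Rising edge: high now, low one cycle ago
    assign rise = button & ~button_prev;
endmodule
```

The only clock in the design is still `clk`; the button is treated as ordinary data.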

Do not have combinational feedback paths

Every circular assignment should be broken by a register (an assignment in a synchronous always block). This is what we have seen in Harris and Harris.
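For instance, a counter reads its own value to compute the next one. The sketch below (names are assumptions) shows the loop going through a register; the commented-out line shows the purely combinational loop to avoid.

```verilog
// Wrong: count feeds itself through combinational logic only
//   assign count = count + 4'd1;

// Right: the feedback path is broken by a register
module counter4 (
    input  wire       clk,
    input  wire       rst,    // synchronous reset (assumed)
    output reg  [3:0] count
);
    always @(posedge clk) begin
        if (rst)
            count <= 4'd0;
        else
            count <= count + 4'd1;
    end
endmodule
```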

regs and wires should not have multiple drivers

  • A wire should appear on the LHS of only one assign statement.

  • A reg should appear on the LHS in only one always block, i.e., it cannot be assigned in more than one always block.

For example, the following is wrong

But it's OK to have a reg on the LHS of multiple statements within the same always block, as long as the same type of assignment is used, e.g., inside if statements.

  • Only blocking (=) or only non-blocking (<=), do not mix the two for a particular reg.

  • Within an always block, if a signal is assigned more than once, whichever assignment executes last in the flow of control is what the reg ends up holding.
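These rules can be sketched as follows. The register with `load` and `clear` inputs is a hypothetical example; the commented-out lines show the multiple-driver mistake.

```verilog
// Wrong: q is driven from two different always blocks
//   always @(posedge clk) if (load)  q <= d;
//   always @(posedge clk) if (clear) q <= 8'd0;

// Right: one always block, one assignment type, several statements
module load_clear_reg (
    input  wire       clk,
    input  wire       clear,
    input  wire       load,
    input  wire [7:0] d,
    output reg  [7:0] q
);
    always @(posedge clk) begin
        if (load)
            q <= d;
        if (clear)
            q <= 8'd0;   // assigned last, so clear wins when both are high
    end
endmodule
```

When neither `load` nor `clear` is high, no assignment executes and `q` holds its value.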

Initializations of reg are ignored by synthesis

If you use either an initial block or a reg declaration (e.g., reg q = 1'b0;) to initialize a register, the initialization is useful in simulation, but do not rely on it in hardware: ASIC synthesis ignores it, and although FPGA tools can often apply initial values to registers and memories, this is tool- and device-dependent.

1

Wires cannot be initialized

wires cannot be meaningfully initialized as they don't store anything. (Go back and review the working principle of a wire if you have forgotten it.)

Initialization to 0 or 1 will connect the wire to a constant 0 or 1 respectively. Further assignment using assign will cause it to have multiple drivers. This is dangerous and not recommended!

2

Not all regs can be initialized

regs that are synthesized as combinational circuits cannot be meaningfully initialized.

Here, z is not a flip-flop — it’s just a combinational output of a & b. The = 1'b1 initialization only affects simulation; in real hardware it’s ignored. So on power-up, z won’t magically start at 1. It will simply depend on a & b.
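A sketch of the kind of code this describes (an assumption, since the original snippet is not shown here; the module name and the output `y` are made up):

```verilog
module comb_init_example (
    input  wire a,
    input  wire b,
    output wire y
);
    reg z = 1'b1;      // initial value: has an effect in simulation only

    always @(*)
        z = a & b;     // z is a reg, but synthesizes to a plain AND gate

    assign y = z;
endmodule
```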

3

Do not assume registers and memory have 0s as initial value

Every always block should strictly follow one of the three templates given below

This part is very similar to the content in Harris & Harris. But with more tips added.

1

Purely combinational (always_comb in SystemVerilog)

To write RTL for a more complex combinational circuit, use

  1. always @(*)

  2. blocking statements (=)

These are the same as in Harris & Harris. Besides that, we also recommend the following:

  1. Every reg must be assigned a meaningful value (not something like Z = Z;) for every possible combination of inputs (e.g., in all branches of if/case statements). Otherwise, that reg has to remember its old value, so a latch is inferred and the circuit is no longer combinational. For example, the following is correct

  2. Two cases with blocking and non-blocking statements in always @(*)

    1. Blocking executes immediately, in order. So if a reg is used on both left-hand side (LHS) and right-hand side (RHS), you should assign it first before you read it. (that reg should appear on LHS before RHS)

    2. Non-blocking assignments are evaluated in parallel and only take effect at the end of the current time step, so within the same block you’ll never see the updated value on the RHS. For example, in the following code snippet, Z in Line 5 will hold the old value of Z.

In short, rule 1 and rule 2(a) are important! For the rule 2(b), just never use non-blocking statement in always @(*).
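A minimal sketch of the combinational template that follows rule 1 and rule 2(a). The module, its ports, and the internal variable `t` are assumptions for illustration.

```verilog
module comb_template (
    input  wire [1:0] sel,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] y
);
    reg [7:0] t;   // internal variable, not a physical register

    always @(*) begin
        // Rule 1: default values first, so every reg is assigned on every path
        y = 8'd0;
        // Rule 2(a): t is written (LHS) before it is read (RHS)
        t = a + b;
        case (sel)
            2'b00: y = a;
            2'b01: y = b;
            2'b10: y = t;
            // sel == 2'b11 keeps the default y = 8'd0
        endcase
    end
endmodule
```

Assigning defaults at the top of the block is a common way to guarantee full coverage without writing an else or default branch for every condition.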

2

Synchronous (always_ff)

Synchronous means that the output changes only on the rising or falling edge of a single clock. And in Harris & Harris, we have seen two rules of thumb:

  1. use always @(posedge clk)

  2. non-blocking assignments (<=) only

By keeping the above two rules, you should be able to avoid 99% of problems. But the remaining 1% will probably be in the paper exam. 😂

  1. Use non-blocking assignments for the outputs of the always block (signals), as well as for any internal physical registers. Note that the updated values are not available for use at the same clock edge. (See Step 1, rule 2(b) example)

  2. If you insist on using blocking assignments for internal combinational parts (variables), the variable should appear on the LHS before it appears on the RHS.

    1. However, it is recommended to move combinational parts into a separate always block or assign statement so that the original block just becomes a register.
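A sketch of both styles, using a hypothetical module whose names are made up for this example. The first version keeps a blocking internal variable inside the clocked block; the commented version at the end is the recommended split.

```verilog
module sync_template (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] q
);
    reg [7:0] t;   // internal combinational variable

    always @(posedge clk) begin
        t = a + b;        // blocking: t appears on the LHS before the RHS
        q <= t + 8'd1;    // non-blocking: q is the registered output
    end
endmodule

// Recommended split of the same logic:
//   wire [7:0] t;
//   assign t = a + b;                      // combinational part
//   always @(posedge clk) q <= t + 8'd1;   // the block is now just a register
```

In both versions only `q` becomes a physical register, provided `t` is not read anywhere outside the always block.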

3

Synchronous with asynchronous set/reset

This usually involves a resettable register which we've seen a lot in Harris & Harris. To design such resettable register, we should keep the following rules in mind

  1. Inside the if statement (which resets the value in the register), the output should be assigned a constant vector of 0's and 1's, and nothing more should be done.

  2. All other code (e.g., the synchronous portion) should be inside the else (between the begin and end of the else). There should be no additional outer if/else if branches. Inner if/else if statements are allowed, and they need not have an else. Why?

    1. It’s because it’s synchronous. In clocked logic, missing else = “hold value,” which is fine. But in combinational logic, missing else = “need to remember,” which infers unintended latches. (This is super important!)
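A minimal sketch of this template. The active-low reset `rst_n` and the enable input `en` are assumptions chosen for illustration.

```verilog
module async_reset_reg (
    input  wire       clk,
    input  wire       rst_n,   // asynchronous, active-low reset (assumed polarity)
    input  wire       en,
    input  wire [7:0] d,
    output reg  [7:0] q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            q <= 8'h00;        // rule 1: a constant only, nothing else
        end else begin
            // rule 2: all synchronous code lives inside the else
            if (en)            // inner if without else: q holds its value
                q <= d;
        end
    end
endmodule
```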

When are physical regs inferred

In Verilog, reg doesn't mean that the signal is a physical register. The following rules summarize when physical registers are actually inferred

  1. non-blocking assignments of regs in a synchronous always block

  2. In an always @(posedge clk), if you use blocking assignment (=) to read a reg (RHS) before writing it (LHS), that reg is inferred as a physical register.

Inferred registers may still be simplified away by the synthesis tool in the following two cases

  1. If there are two registers storing identical content, they could be merged by the synthesis tool as a part of optimization (configurable)

  2. If a register content is not used in a parent module, it is optimized away, along with the combinational circuits exclusively feeding it.

For example, given the following Verilog code, draw the schematic,

To solve this kind of drawing schematic questions, we do it systematically

  1. Find the registers: Based on the two rules listed above, here D and A are registers.

  2. Start building/drawing the circuit in order: We start from the first line, which is the assign statement, to build the circuit

    1. Line 1 is an AND gate

    2. Line 2 just connects the output of the register, which is A, to B

    3. Line C is just an XOR gate

    4. Line D connects the C to a register and its output is D

    5. E is just C

    6. So, C is directly connected to the input to the register whose output is A

The final schematic looks like below,

Finite State Machines

FSMs are introduced in DDCA in considerable detail; this part is from Prof's slides.

A standard state machine architecture consists of three distinct components:

  1. combinational next-state logic,

  2. sequential state memory, and

  3. combinational output logic.

For ASIC designs, a reset input is essential to initialize the state memory upon power-up; however, this is unnecessary in FPGA designs because dedicated Global Set/Reset (GSR) circuitry automatically initializes registers to their default values.

Basic FSM

To implement the above FSM, we can use the Verilog template as follows:

There are two types of FSMs:

  1. Moore Machines

  2. Mealy Machines

The distinction between these models lies in their output dependencies.

  • In a Mealy Machine, the output is a function of both the current state and the current input.

  • Conversely, in a Moore Machine, the output depends exclusively on the current state.

For example, the following are the next state and output example tables for the Moore and Mealy machines:

Moore and Mealy machines example

Moore Machine

To implement a Moore machine, we can use the following two approaches.

1

Three-block approach

This approach clearly modularizes the design into three separate blocks:

  1. a combinational block for next-state generation,

  2. a second combinational block for output generation, and

  3. a sequential block for the state memory.

This separation ensures that the output logic remains independent of the input signals, strictly adhering to the Moore definition.
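A minimal sketch of the three-block structure. The FSM itself is a hypothetical example, not the one from the tables above: it asserts `z` once the input `x` has been 1 for two or more consecutive cycles. The state names and the synchronous reset are assumptions.

```verilog
module moore_three_block (
    input  wire clk,
    input  wire rst,    // synchronous reset (assumed)
    input  wire x,
    output reg  z
);
    localparam S0 = 2'd0, S1 = 2'd1, S2 = 2'd2;
    reg [1:0] state, next_state;

    // 1. Combinational next-state logic
    always @(*) begin
        next_state = S0;                 // default: x == 0 returns to S0
        case (state)
            S0: if (x) next_state = S1;
            S1: if (x) next_state = S2;
            S2: if (x) next_state = S2;
            default: next_state = S0;    // unused encoding
        endcase
    end

    // 2. Combinational output logic: a function of state only (Moore)
    always @(*) begin
        z = (state == S2);
    end

    // 3. Sequential state memory
    always @(posedge clk) begin
        if (rst)
            state <= S0;
        else
            state <= next_state;
    end
endmodule
```

Because `x` never appears in block 2, the output can only change after a clock edge updates `state`.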

Moore Machine in Verilog

2

Combined Combinational Logic

Alternatively, the next-state generation and output generation can be merged into a single combinational block. This method can potentially lead to hardware savings if the synthesis tool optimizes the shared combinational logic between the two functions.
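A sketch of the combined style for a hypothetical Moore FSM that asserts `z` once `x` has been 1 for two or more consecutive cycles (this FSM, its state names, and the synchronous reset are assumptions, not taken from the tables above).

```verilog
module moore_combined (
    input  wire clk,
    input  wire rst,    // synchronous reset (assumed)
    input  wire x,
    output reg  z
);
    localparam S0 = 2'd0, S1 = 2'd1, S2 = 2'd2;
    reg [1:0] state, next_state;

    // Next-state and output logic in one combinational block
    always @(*) begin
        next_state = S0;     // defaults cover every path
        z          = 1'b0;
        case (state)
            S0: if (x) next_state = S1;
            S1: if (x) next_state = S2;
            S2: begin
                z = 1'b1;                  // outside any check on x: still Moore
                if (x) next_state = S2;
            end
            default: ;                     // unused encoding keeps the defaults
        endcase
    end

    // Sequential state memory
    always @(posedge clk) begin
        if (rst)
            state <= S0;
        else
            state <= next_state;
    end
endmodule
```

The output assignment sits directly under the state label and never inside an `if (x)`, which keeps the machine a Moore machine even though both functions share one block.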

Moore machine in Verilog (combined logic)

Mealy Machine

For a Mealy machine, the output assignment must be placed inside the input conditional checks within the combinational block. Since the output is dependent on the input, any change in input signals will immediately propagate to the output asynchronously, unlike the registered behavior of the state memory.
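A minimal sketch of a Mealy machine, using a hypothetical FSM (not the one from the tables above) that asserts `z` in the same cycle in which a second consecutive 1 arrives on `x`. The state names and the synchronous reset are assumptions.

```verilog
module mealy_example (
    input  wire clk,
    input  wire rst,    // synchronous reset (assumed)
    input  wire x,
    output reg  z
);
    localparam S0 = 1'b0, S1 = 1'b1;
    reg state, next_state;

    // Combinational next-state and output logic
    always @(*) begin
        next_state = S0;     // defaults cover every path
        z          = 1'b0;
        case (state)
            S0: if (x) next_state = S1;
            S1: if (x) begin
                    next_state = S1;
                    z = 1'b1;    // output assigned inside the input check
                end
            default: ;
        endcase
    end

    // Sequential state memory
    always @(posedge clk) begin
        if (rst)
            state <= S0;
        else
            state <= next_state;
    end
endmodule
```

Here `z` can change in the middle of a clock cycle whenever `x` changes while the machine is in `S1`, which is the combinational input-to-output path the paragraph above describes.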
