Lec 02 - Digital System Design and Verilog
Digital System Design
Levels of Abstraction
Unlike the abstraction levels we have seen in Harris & Harris, here we talk about the levels of abstraction in digital system design. Below is a simple illustration,

Let's look at each level in detail.
Algorithm/System Level (Untimed)
At the highest level, the design is expressed as algorithms or functional behavior without worrying about timing.
For example, describing output = (A+B) + (C+D) in a flowchart or C-like pseudocode.
Focus: This level is only about functionality, not about how many cycles the computation takes or how it is implemented in hardware.
Register Transfer Level (RTL, Timed)
It is the macroscopic hardware view, described in terms of data transfers between registers and operations performed by functional units (ALUs, multiplexers, etc.) under clock control. This macroscopic view is implemented using RTL code, so it is timed and cycle-accurate, but still abstract (macroscopic).
For example, the code we wrote in CG3207 Lab01 is RTL code, and so is the following simple Verilog code.
Focus: This is where the macroscopic blocks appear (ALUs, adders, multiplexers, etc.), and RTL code is the implementation of these macroscopic blocks.
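Below is a minimal sketch of what such RTL code can look like. It is not the lab's code; the module name reg_adder and the 4-bit operand width are assumptions. A combinational adder (the macroscopic block) produces a sum, and that sum is transferred into a register on every rising clock edge.

```verilog
// Hypothetical RTL example: a combinational 4-bit adder whose result is
// captured in a register on every rising edge of clk.
module reg_adder (
    input  wire       clk,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg  [4:0] sum_q    // registered sum (5 bits keeps the carry)
);
    wire [4:0] sum = a + b;    // macroscopic block: combinational adder

    always @(posedge clk)
        sum_q <= sum;          // register transfer: adder output -> sum_q
endmodule
```

A new sum_q value appears one clock cycle after a and b are applied, which is the cycle-accurate behavior that RTL describes.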
Gate Level
It is the microscopic hardware view. The RTL code is synthesized into logic gates (AND, OR, NOT, flip-flops).
For example, (A+B) becomes a ripple-carry adder built out of AND/OR/XOR gates. This level consists of Boolean equations and gates, with no transistor-level details.
Focus: This is your microscopic implementation of RTL macros.
Circuit Level
It is the actual electronic implementation of logic gates using CMOS transistors.
For example, an inverter (NOT gate) is realized using one PMOS and one NMOS transistor.
Focus: Device-level representation; electrical properties such as delay, power, and capacitance are considered.
Simplified FPGA/ASIC Design Flow

Behavioral Modelling
Behavioral modeling defines the "what" of your FPGA/ASIC design (its high-level logic and algorithms) without hardware details. It is used to verify functionality early.
Purpose: Ensures algorithmic correctness via simulations.
Tools:
Initial tests with Python, C, Java, or Matlab (fast, sequential), e.g., a Python script that simulates a filter with `output = input * 0.5 if input > 0 else 0` to test the logic.
Deeper simulations with VHDL, Verilog, or SystemC (these handle concurrency and timing, but are not cycle-accurate).
Not Directly Synthesized: Meant for validation, not hardware generation.
HLS Trend: High-Level Synthesis tools (e.g., Vivado HLS) can convert behavioral code to RTL, but their effectiveness varies by tool and domain, and the results often need manual tweaks.
The behavioral code/algorithm is then fed into architectural synthesis in the design flow, where iteration makes complex problems manageable.
Architectural Synthesis
Architectural synthesis turns a high-level functional/behavioral (architectural) model into a macroscopic structural (microarchitectural) model for FPGAs/ASICs. It is mostly manual but is becoming more automated.
In short, architectural synthesis means writing the RTL code or drawing the schematic (macroscopic model). But to write good RTL code, we first need a clear macroscopic model/diagram (see the RISC-V microarchitecture in Lec 03 as an example).
Purpose: Converts abstract logic into cycle-accurate, synthesizable RTL code, typically with structural and behavioral elements.
Output: A block-level model in which operations are timed and assigned to hardware blocks. Example: from Z = (A+B) * (C+D) * E, it creates a plan with adders and multipliers.
Key Steps:
Scheduling (time, i.e., when an operation is done): Assigns each operation to a clock cycle (an integer). Finding all of these clock cycles is called solving the scheduling problem. e.g., (A+B) in cycle #1, (C+D) in cycle #2.
Binding (space, i.e., where an operation is done): Maps operations to specific hardware resources, such as functional units, memories, or interconnects. e.g., (A+B) done by ALU #1, (C+D) by ALU #2.
Flexibility: An adder can be reused for different operand pairs by placing multiplexers at its inputs. e.g., ALU #1 adds A+B in cycle #1, then E+F in cycle #2 once its inputs are switched.
Tools: RTL synthesis infers register transfers and generates a netlist, provided the coding guidelines are followed.
The following image shows the difference between behavioral modelling and architectural synthesis.

Binding directly determines how many functional units (FUs) are actually instantiated in the final design, which in turn determines the area cost. Thus, architectural synthesis has the largest impact on the cost of building a chip.
By now, we should be able to use hardware thinking to write RTL code. Let's recap the necessary steps by walking through a very simple example!
Behavioral Modelling
This describes what the computation is, but not when or where it happens, as we saw in Behavioral Modelling. For example, we want to calculate the following formula:
Z = (A+B) * (C+D) * E
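As a hedged illustration, the untimed description of Z = (A+B) * (C+D) * E can be written in Verilog as a single continuous assignment. The module name z_behavioral and the 8-bit input widths are assumptions. The 26-bit output is just wide enough for the largest possible result (510 × 510 × 255).

```verilog
// Hypothetical behavioral-style description: WHAT is computed, with no
// decision about which cycle or which hardware unit performs each operation.
module z_behavioral (
    input  wire [7:0]  A, B, C, D, E,
    output wire [25:0] Z             // max (255+255)*(255+255)*255 fits in 26 bits
);
    assign Z = (A + B) * (C + D) * E;
endmodule
```

Nothing here says when each operation happens or which unit performs it. If this were synthesized directly, a tool would typically infer a separate adder or multiplier for every operator.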
Build a macroscopic model
Now we start architectural synthesis. We first decide what kinds of resources will be in the datapath (e.g., 2 adders, 1 multiplier, 1 register file). There is still no cycle-by-cycle schedule, just "these are the building blocks." This is the high-level datapath architecture.
Scheduling
This is done manually by deciding when each operation executes, i.e., in which clock cycle. For example, we want
Cycle 1: compute A+B
Cycle 2: compute C+D
Cycle 3: multiply results → (A+B)∗(C+D)
Cycle 4: multiply with E
Binding
This is still done manually. Once we know when things happen, we decide where they happen, i.e., which hardware resource executes each operation. For example,
Cycle 1: (A+B) → Adder #1
Cycle 2: (C+D) → Adder #1 again (reused)
Cycle 3: sum1 * sum2 → Multiplier #1
Cycle 4: prod * E → Multiplier #1 again
RTL Coding
The final step, which is also the last step of architectural synthesis, is to write the RTL code. Once scheduling and binding are decided, cycle-accurate RTL code can be written. For example,
These 5 steps are all important. People often skip the macroscopic model step (thinking in hardware), and this is dangerous: the macroscopic model is what we compare against the synthesis report, to check whether the synthesized hardware is the same as what we want to build.
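Below is a minimal sketch of cycle-accurate RTL for the schedule and binding above (Cycle 1: A+B, Cycle 2: C+D, Cycle 3: sum1*sum2, Cycle 4: prod*E). The start/done handshake, the 8-bit inputs held stable during the computation, and the module name z_rtl are all assumptions. Only one adder and one multiplier are instantiated. Multiplexers on their inputs let each unit be reused across cycles, as decided during binding.

```verilog
// Hypothetical sketch: Z = (A+B)*(C+D)*E in 4 cycles with 1 adder + 1 multiplier.
module z_rtl (
    input  wire        clk,
    input  wire        start,          // assumed: 1-cycle pulse that begins a computation
    input  wire [7:0]  A, B, C, D, E,  // assumed: held stable while computing
    output reg  [25:0] Z,
    output reg         done            // high for one cycle when Z is valid
);
    reg [2:0]  step;        // 0 = idle, 1..4 = current cycle of the schedule
    reg [8:0]  sum1, sum2;  // adder results (A+B, C+D)
    reg [17:0] prod;        // sum1 * sum2

    // Binding: ONE adder; its operands are multiplexed by the schedule step.
    wire [7:0]  add_a   = (step == 3'd1) ? A : C;
    wire [7:0]  add_b   = (step == 3'd1) ? B : D;
    wire [8:0]  add_out = add_a + add_b;

    // Binding: ONE multiplier; its operands are multiplexed the same way.
    wire [17:0] mul_a   = (step == 3'd3) ? {9'd0, sum1} : prod;
    wire [8:0]  mul_b   = (step == 3'd3) ? sum2 : {1'b0, E};
    wire [26:0] mul_out = mul_a * mul_b;

    // Scheduling: each case item is one clock cycle of the plan.
    always @(posedge clk) begin
        done <= 1'b0;
        if (start)
            step <= 3'd1;
        else
            case (step)
                3'd1: begin sum1 <= add_out;       step <= 3'd2; end // cycle 1: A+B on adder #1
                3'd2: begin sum2 <= add_out;       step <= 3'd3; end // cycle 2: C+D on adder #1 again
                3'd3: begin prod <= mul_out[17:0]; step <= 3'd4; end // cycle 3: sum1*sum2 on multiplier #1
                3'd4: begin Z    <= mul_out[25:0];                   // cycle 4: prod*E on multiplier #1 again
                            done <= 1'b1;          step <= 3'd0; end
                default: step <= 3'd0;                               // idle
            endcase
    end
endmodule
```

The case statement is the schedule (one item per cycle). The multiplexed operands (add_a/add_b, mul_a/mul_b) are the binding. A purely behavioral description would instead leave the number of adders and multipliers up to the tool.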
Logical Synthesis
Workflow
As shown in the diagram below, logic synthesis takes in three things (HDL code, constraints, and a technology library) and outputs one thing (a mapped schematic).

Technology Library: The cells / microscopic building blocks we are allowed to use. Its purpose is to enable logic synthesis tools to map a design into the physical hardware efficiently while respecting the process technology constraints (timing, power, area).
For ASICs, cells are usually gates or gate combinations (e.g., flip-flops, latches, and buffers). They are custom-designed and carefully characterized by the foundry, respecting the physical limitations of the specific process technology.
For FPGAs, the technology library is composed of higher-level CLB functions (such as adders, multipliers, and LUTs), which are still considered basic elements for synthesis.
Mapped Schematic
Optimized schematic realizing the HDL code, using building blocks from the technology library.
Usually a netlist that textually describes the interconnection between cells/building blocks
Constraints
Location: Logical port to physical pin mapping, etc. (see CS2100DE Lab 01).
Timing specifications (optimization goals): Different schematics can be obtained from the same HDL code depending on these goals.
Substeps
Logic optimization: minimize undesirable redundancies (think Karnaugh maps), and hence the cost and complexity of the design.

Technology mapping (library binding): map the logic/hardware resources in the macroscopic diagram to cells/building blocks in the library. This answers the question "Which actual silicon component from the library will implement this block?".

RTL and Registers
RTL (Register Transfer Level): Describes how data moves between registers on each clock cycle.
Registers are like “brick walls” — they hold state and define the boundaries of combinational logic.
Everything between registers is combinational logic (no memory, just logic operations).
RTL Synthesis/Logical synthesis tools: Convert your RTL into technology-mapped gates/cells (ASIC) or LUTs/CLBs (FPGA).
What gets optimized:
Only the combinational logic between registers.
Optimization can focus on:
Speed: minimize propagation delay
Area/cost: minimize number of gates or LUTs
Registers themselves are not “optimized” in terms of logic — they just store values.
Here we add one point from Harris & Harris:
Critical path = combinational path with maximum delay.
From point 3 above, we see that there is only combinational logic between registers. Hence, the critical path determines the maximum clock frequency of our circuit! The following image contains two examples,

If the clock is too fast and the longest "road" (critical path) hasn’t been traversed yet, the data won’t arrive in time, causing incorrect data at the next register. So, the maximum clock frequency is limited by the critical path delay.
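The limit can be written as a worked equation, following the timing analysis in Harris & Harris and ignoring clock skew. The clock period must cover the launching register's clock-to-Q delay, the critical-path delay, and the capturing register's setup time. The delay values in the second line are assumed purely for illustration:

$$T_c \ \ge\ t_{pcq} + t_{pd} + t_{\mathrm{setup}} \qquad\Longrightarrow\qquad f_{\max} = \frac{1}{t_{pcq} + t_{pd} + t_{\mathrm{setup}}}$$

$$t_{pcq} = 0.5\ \mathrm{ns},\quad t_{pd} = 3.5\ \mathrm{ns},\quad t_{\mathrm{setup}} = 1\ \mathrm{ns} \ \Longrightarrow\ T_c \ge 5\ \mathrm{ns},\quad f_{\max} = 200\ \mathrm{MHz}$$

Here $t_{pd}$ is the critical-path delay. Shortening that combinational path is what raises $f_{\max}$.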
Physical Design
Placement
It is deciding where to put each cell (gate, flip-flop, or higher-level module) on the physical chip layout.
Goal: Place cells that communicate a lot closer together, because shorter interconnects mean lower delay and lower power.
Tradeoff: If you bring two blocks closer, some other blocks must move farther. This is a global optimization problem.
Placement minimizes some function of a coarse approximation of wirelengths.
Why not exact wire lengths?
Because exact wirelength calculation requires routing every connection, which is very computationally expensive.
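One widely used coarse approximation is the half-perimeter wirelength (HPWL). The lecture does not say which approximation a given placer uses, so treat this as one example. For each net $n$, HPWL is half the perimeter of the bounding box around the net's pins at $(x_i, y_i)$. It needs only the pin coordinates, not routed wires:

$$\mathrm{HPWL}(n) = \Big(\max_{i \in n} x_i - \min_{i \in n} x_i\Big) + \Big(\max_{i \in n} y_i - \min_{i \in n} y_i\Big), \qquad W \approx \sum_{n} \mathrm{HPWL}(n)$$

For example, a net with pins at $(0,0)$, $(3,1)$, and $(1,4)$ has $\mathrm{HPWL} = (3-0) + (4-0) = 7$.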
Routing
It is the mapping of logical connections between cells to physical interconnects, i.e., physically connecting all the cells with wires according to the netlist.
Goal: Make wires as short as possible, as this reduces delay, power, and congestion.
Constraint: You need placement first to know where to draw the wires.
For more on routing, please take NUS EE4218!
— From Prof. Rajesh
Some Notes
Always Think Hardware — have the topology of the circuit in mind (the macroscopic model), and write RTL code which will imply/infer the topology from the code.
Simulation and synthesis tools work very differently (Add more notes in Harris & Harris)
Simulation "executes" HDL code following the semantics of the HDL
Synthesis "infers the hardware structure" described by HDL (RTL) code
Read this awesome note from sunburst!
FPGA
We learned quite a lot about FPGAs in Harris & Harris; please go back and review how FPGAs work if needed. The purpose of this section is to compare ASICs and FPGAs and their use cases. In short,
FPGA: flexible, fast to deploy, good for prototyping and low-volume/highly-custom tasks.
ASIC: high-performance, power/area-efficient, cost-effective only for large-scale production.
Verilog for Synthesis
Note that the Verilog and the rules discussed here are for writing RTL code, not for testbenches/simulation.
General Rules for Synthesizability
In this section,
reg means only that the variable type is reg. It does not mean that the signal is a register. If a reg signal is inferred as a physical register, we will mention it explicitly.
Do NOT use delays (#delay)
Combinational (propagation) delays are hardware-dependent; they are not something the synthesis tool can insert based on HDL code. So the following code, and this use of delays, is not recommended,
In RTL Verilog code, we insert clock-cycle delays explicitly by introducing a physical register (i.e., it is a part of your design). So, the following is the correct code for the above,
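Below is a minimal sketch of both versions. The module name delay_one_cycle and the 1-bit signals are assumptions. The #delay form appears only as a comment. The synthesizable form delays a by exactly one clock cycle through a flip-flop.

```verilog
// Not recommended in RTL (simulation-only; synthesis ignores the #5):
//     assign #5 y = a;
//
// Recommended: delay 'a' by exactly one clock cycle with a physical register.
module delay_one_cycle (
    input  wire clk,
    input  wire a,
    output reg  y
);
    always @(posedge clk)
        y <= a;    // y is the value 'a' had at the previous rising edge
endmodule
```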
So, the key takeaway is: #delay is for testbenches only. In real FPGA/ASIC designs, delays are achieved via registers (clock cycles) and timing constraints (.xdc).
Use one clock for the entire design
Connect the clock input of every sequential element to this single clock.
Run different parts of the design at different speeds using clock enables, instead of using different clocks (one example is the Clock_Enable module from CG3207 Lab01).
The clock should not come from a combinational circuit, as a combinational circuit can have glitches in its output.

In the above image, the glitch happens because the NOT gate has a propagation delay.
Another example: do not use something like @(posedge button) to detect a transition.
Use a synchronous edge detection scheme instead, e.g., by comparing the current value with the previous value stored in a register.
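Below is a hedged sketch of "one clock, different speeds." It is a hypothetical module, not the lab's Clock_Enable implementation, and the parameter DIV and all names are assumptions. A divider counter produces a one-cycle enable pulse. The slow logic checks that enable instead of being clocked by a derived (and possibly glitchy) signal.

```verilog
// Hypothetical sketch: a single clock (clk) for everything; 'count' advances
// only on cycles where the enable pulse 'en' is high.
module slow_counter #(
    parameter DIV = 4                       // assumed: en pulses once every DIV cycles (DIV <= 256)
) (
    input  wire       clk,
    output reg  [3:0] count
);
    reg  [7:0] div_cnt;
    wire       en = (div_cnt == DIV - 1);   // one-cycle pulse; NOT used as a clock

    initial begin                           // simulation (and typically FPGA power-up) values
        div_cnt = 8'd0;
        count   = 4'd0;
    end

    always @(posedge clk) begin
        div_cnt <= en ? 8'd0 : div_cnt + 8'd1;
        if (en)
            count <= count + 4'd1;          // "slow" logic, still clocked by clk
    end
endmodule
```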
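Below is a minimal sketch of such a synchronous edge detector; the names are hypothetical. The button is sampled on every clock edge and compared with the sample from the previous edge. A physical push button would also need synchronization and debouncing, which this sketch omits.

```verilog
// Hypothetical synchronous rising-edge detector (no @(posedge button)).
module edge_detect (
    input  wire clk,
    input  wire button,
    output wire pressed              // high for exactly one clock cycle per 0->1 change
);
    reg btn_q    = 1'b0;             // button value sampled at this edge
    reg btn_prev = 1'b0;             // value sampled at the previous edge

    always @(posedge clk) begin
        btn_q    <= button;
        btn_prev <= btn_q;
    end

    assign pressed = btn_q & ~btn_prev;   // "now 1, previously 0"
endmodule
```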

Do not have combinational feedback path
Every circular assignment should be broken by a register (an assignment in a synchronous always block). This is what we have seen in Harris and Harris.
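Below is a minimal illustration using a hypothetical 4-bit counter. The commented-out assign forms a combinational loop. The clocked version expresses the same intent, with the loop broken by flip-flops.

```verilog
// Combinational loop (do NOT write this): x depends on itself with no register.
//     assign x = x + 4'd1;
//
// The same intent with the loop broken by a register: x advances once per clock.
module counter_no_loop (
    input  wire       clk,
    output reg  [3:0] x
);
    initial x = 4'd0;        // simulation (and typically FPGA power-up) value
    always @(posedge clk)
        x <= x + 4'd1;       // the feedback path now passes through flip-flops
endmodule
```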

regs and wires should not have multiple drivers
A wire should appear on the LHS of only one assign statement.
A reg should appear on the LHS of only one always block, i.e., it cannot be assigned in more than one always block.
For example, the following is wrong
But it is OK for a reg to appear on the LHS of multiple statements within the same always block, as long as the same type of assignment is used (e.g., in different branches of if statements).
Use only blocking (=) or only non-blocking (<=) assignments for a particular reg; do not mix the two. Within an always block, if a signal is assigned more than once, the reg ends up holding whichever assignment executes last in the flow of control.
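Below is a sketch of both the wrong and the right pattern; the signal names (load, clr, d, q) are hypothetical. The wrong version is kept in a comment. The right version keeps every assignment to q in one always block and uses only non-blocking assignments, so the assignment executed last wins.

```verilog
// WRONG: 'q' is driven from two always blocks (multiple drivers):
//     always @(posedge clk) if (load) q <= d;
//     always @(posedge clk) if (clr)  q <= 8'd0;
//
// RIGHT: one always block owns 'q'; several non-blocking assignments to q
// inside it are fine, and the one executed last in the flow of control wins.
module single_driver (
    input  wire       clk,
    input  wire       load,
    input  wire       clr,
    input  wire [7:0] d,
    output reg  [7:0] q
);
    initial q = 8'd0;          // simulation (and typically FPGA power-up) value
    always @(posedge clk) begin
        if (load) q <= d;
        if (clr)  q <= 8'd0;   // executes last, so clr has priority over load
    end
endmodule
```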
Initializations of reg are ignored by synthesis
If you initialize a register using either an initial block or the reg declaration (e.g., reg q = 1'b0;), the initialization is useful in simulation and for registers and memories on FPGAs, but it is ignored by ASIC synthesis tools.
Wires cannot be initialized
wires cannot be meaningfully initialized, as they do not store anything. (Go back and review how wires work if you have forgotten.)
Initialization to 0 or 1 will connect the wire to a constant 0 or 1 respectively.
Further assignment using assign will cause it to have multiple drivers. This is dangerous and not recommended!
Not all regs can be initialized
regs that are synthesized as combinational circuits cannot be meaningfully initialized.
Here, z is not a flip-flop; it is just a combinational output of a & b. The = 1'b1 initialization only affects simulation and is ignored in real hardware. So on power-up, z will not magically start at 1; it simply depends on a & b.
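The situation described above can be sketched as follows. This is a hypothetical reconstruction; the module and port names are assumptions.

```verilog
// Hypothetical example: an initialized reg that synthesizes to plain logic.
module init_ignored (
    input  wire a,
    input  wire b,
    output reg  z = 1'b1   // initialization: only the simulator ever sees this 1
);
    always @(*)
        z = a & b;         // purely combinational: z is just an AND gate
endmodule
```

In simulation, z reads 1 only until the always block first runs. After that, and always in hardware, z is simply a & b.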
Not all regs will be inferred as physical registers.
Every always block should strictly follow one of the three templates given below.
This part is very similar to the content in Harris & Harris, but with more tips added.
Purely combinational (always_comb in SystemVerilog)
To write RTL for more complex combinational circuits,
use always @(*)
use blocking statements (=)
These are the same as in Harris & Harris. In addition, we recommend the following:
1. Every reg must be assigned a meaningful value (not something like Z <= Z;) for every possible combination of inputs (e.g., in all branches of if/case statements). Otherwise, that reg has to remember its old value, so it is inferred as a storage element (a latch) and the circuit is no longer combinational. For example, the following is correct
2. Two cases with blocking and non-blocking statements in always @(*):
(a) Blocking assignments execute immediately, in order. So if a reg is used on both the left-hand side (LHS) and the right-hand side (RHS), you should assign it before you read it (that reg should appear on the LHS before the RHS).
(b) Non-blocking assignments only take effect after all the statements in the block have been evaluated, so within the same block you will never see the updated value on the RHS. For example, in the following code snippet, Z in Line 5 will hold the old value of Z.
In short, rules 1 and 2(a) are important! As for rule 2(b), just never use non-blocking statements in always @(*).
Synchronous (always_ff)
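Below is a minimal sketch that follows these rules; the names are hypothetical. tmp is assigned before it is read. y receives a default value before the case statement, so every input combination assigns it and no latch is inferred. Only blocking assignments are used.

```verilog
// Hypothetical combinational block written to the rules above.
module comb_rules (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire [1:0] sel,
    output reg  [3:0] y
);
    reg [3:0] tmp;

    always @(*) begin
        tmp = a ^ b;          // written first (LHS) ...
        y   = 4'd0;           // default: covers any 'sel' value not listed below
        case (sel)
            2'b00: y = a;
            2'b01: y = b;
            2'b10: y = tmp;   // ... then read (RHS): tmp stays combinational
        endcase
    end
endmodule
```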
Synchronous means that the output changes only on the rising or falling edge of a single clock. In Harris & Harris, we saw two rules of thumb:
use always @(posedge clk)
use non-blocking assignments (<=) only
By keeping the above two rules, you should be able to avoid 99% of problems. But the remaining 1%, will probably be in the paper exam. 😂
Use non-blocking assignments for the outputs of the always block (signals), as well as for any internal physical registers. But the updated values are not available for use at the same clock edge. (See Step 1, rule 2(b) example)
If you insist on using blocking assignments for internal combinational parts (variables), each such variable should appear on the LHS before the RHS.
However, it is recommended to move combinational parts into a separate always block or assign statement, so that the original block just describes registers.
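Below is a hedged sketch of the recommended style; the names are hypothetical. The combinational increment lives in an assign, and the always block only describes registers updated with non-blocking assignments. Because q2 reads q1 at the same edge at which q1 is updated, q2 receives q1's old value, which forms a two-stage pipeline.

```verilog
// Hypothetical synchronous block: combinational part kept outside,
// the always block is just registers with non-blocking assignments.
module sync_rules (
    input  wire       clk,
    input  wire [3:0] d,
    output reg  [3:0] q1,
    output reg  [3:0] q2
);
    wire [3:0] d_inc = d + 4'd1;   // combinational part

    initial begin                  // simulation (and typically FPGA power-up) values
        q1 = 4'd0;
        q2 = 4'd0;
    end

    always @(posedge clk) begin
        q1 <= d_inc;               // register stage 1
        q2 <= q1;                  // reads q1's OLD value: new q1 is not visible at this edge
    end
endmodule
```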
Synchronous with asynchronous set/reset
This usually involves a resettable register, which we've seen a lot in Harris & Harris. To design such a resettable register, we should keep the following rules in mind:
Inside the if statement (which resets the value in the register), the output should be assigned a constant vector of 0's and 1's, and nothing more should be done.
All other code (e.g., the synchronous portion) should be inside the else (between the begin and end of the else). There should not be additional outer if/else ifs, but there can be inner if/else ifs, and these inner ifs need not have an else. Why? Because this portion is synchronous. In clocked logic, a missing else means "hold the value," which is fine. But in combinational logic, a missing else means "need to remember," which infers unintended latches. (This is super important!)
In short, Prof. Rajesh recommends never using resettable registers in this course.
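For reference only, here is a sketch of a register with an asynchronous active-high reset, written to the rules above. The names are hypothetical, and the recommendation above to avoid resettable registers in this course still applies.

```verilog
// Hypothetical register with asynchronous active-high reset.
module async_reset_reg (
    input  wire       clk,
    input  wire       rst,      // asynchronous, active high
    input  wire       en,
    input  wire [7:0] d,
    output reg  [7:0] q
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            q <= 8'h00;         // reset branch: assign a constant, nothing else
        else begin
            if (en)             // inner if without else: q simply holds its value
                q <= d;
        end
    end
endmodule
```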
When are physical regs inferred
In Verilog, reg does not mean a physical register. The following rules summarize when physical registers are actually inferred:
Non-blocking assignments to regs in a synchronous always block.
In an always @(posedge clk), if you use a blocking assignment (=) and read a reg (RHS) before writing it (LHS), that reg is inferred as a physical register.
Inferring using the second way is not recommended!
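Below is a small hypothetical example of both rules, plus the contrasting write-before-read case. The acc style (rule 2) is shown only to illustrate the inference rule; as stated above, it is not recommended.

```verilog
// Hypothetical example: which regs become physical registers.
module reg_inference (
    input  wire       clk,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg  [3:0] q,    // rule 1: non-blocking in a clocked block -> register
    output reg  [3:0] acc   // rule 2: blocking, read before written -> register
);
    reg [3:0] tmp;          // blocking, written before read -> NOT a register

    initial acc = 4'd0;     // simulation (and typically FPGA power-up) value

    always @(posedge clk) begin
        tmp = a + b;        // written first: tmp is just the adder's output
        q  <= tmp;          // q stores the sum: a physical register
        acc = acc + a;      // read before written: acc must remember -> register
    end
endmodule
```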
In addition, registers may be simplified away by the synthesis tool if they fall into either of the following two cases:
If there are two registers storing identical content, they could be merged by the synthesis tool as a part of optimization (configurable)
If a register content is not used in a parent module, it is optimized away, along with the combinational circuits exclusively feeding it.
For example, given the following Verilog code, draw the schematic,
To solve this kind of schematic-drawing question, we work systematically:
Find the registers: based on the two rules listed above, D and A are registers here.
Start building/drawing the circuit in order, beginning with the first line, which is the assign statement:
Line 1 is an AND gate.
Line 2 just connects the output of the register, A, to B.
C is just an XOR gate.
D connects C to a register whose output is D.
E is just C.
So, C is directly connected to the input of the register whose output is A.
The final schematic looks like below,

Finite State Machines
A standard state machine architecture consists of three distinct components:
combinational next-state logic,
sequential state memory, and
combinational output logic.
For ASIC designs, a reset input is essential to initialize the state memory upon power-up; however, this is unnecessary in FPGA designs because dedicated Global Set/Reset (GSR) circuitry automatically initializes registers to their default values.

To implement the above FSM, we can use the Verilog template as follows:
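Below is a minimal sketch of the three-part template. It uses a hypothetical Moore machine (not the one in the figure) that asserts y while the last two sampled inputs were both 1. The synchronous reset reflects the ASIC note above and is an assumption; on an FPGA it could be dropped.

```verilog
// Hypothetical Moore FSM in the three-part template:
// y = 1 while the last two sampled inputs were both 1.
module fsm_moore3 (
    input  wire clk,
    input  wire rst,   // synchronous reset (assumed; essential for ASICs)
    input  wire x,
    output reg  y
);
    localparam S0 = 2'd0,   // last input was 0 (or just reset)
               S1 = 2'd1,   // exactly one 1 seen
               S2 = 2'd2;   // two or more consecutive 1s seen

    reg [1:0] state, next_state;

    // (1) Combinational next-state logic
    always @(*) begin
        case (state)
            S0:      next_state = x ? S1 : S0;
            S1:      next_state = x ? S2 : S0;
            S2:      next_state = x ? S2 : S0;
            default: next_state = S0;          // unused encoding: recover to S0
        endcase
    end

    // (2) Sequential state memory
    always @(posedge clk) begin
        if (rst) state <= S0;
        else     state <= next_state;
    end

    // (3) Combinational output logic: depends only on the state (Moore)
    always @(*) begin
        y = (state == S2);
    end
endmodule
```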
There are two types of FSMs:
Moore Machines
Mealy Machines
The distinction between these models lies in their output dependencies.
In a Mealy Machine, the output is a function of both the current state and the current input. Conversely,
in a Moore Machine, the output depends exclusively on the current state.
For example, the following are example next-state and output tables for the Moore and Mealy machines:

The two machines in the figure above are not functionally equivalent in terms of timing and transition behavior.
Moore Machine
To implement a Moore machine, we can use the following two approaches.
Three-block approach
This approach clearly modularizes the design into three separate blocks:
a combinational block for next-state generation,
a second combinational block for output generation, and
a sequential block for the state memory.
This separation ensures that the output logic remains independent of the input signals, strictly adhering to the Moore definition.

Combined Combinational Logic
Alternatively, the next-state generation and output generation can be merged into a single combinational block. This method can potentially lead to hardware savings if the synthesis tool optimizes the shared combinational logic between the two functions.
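Below is a sketch of the combined approach for a hypothetical Moore machine that asserts y while the last two sampled inputs were both 1; the module and state names are assumptions. y is assigned from the state alone, outside any input check, so the machine is still Moore. Default assignments at the top keep the merged block free of latches.

```verilog
// Hypothetical Moore FSM with next-state and output logic in ONE block.
module fsm_moore2 (
    input  wire clk,
    input  wire rst,   // synchronous reset (assumed)
    input  wire x,
    output reg  y
);
    localparam S0 = 2'd0, S1 = 2'd1, S2 = 2'd2;

    reg [1:0] state, next_state;

    // Merged combinational block: next state AND output
    always @(*) begin
        next_state = S0;     // defaults: every path assigns both signals
        y          = 1'b0;
        case (state)
            S0: next_state = x ? S1 : S0;
            S1: next_state = x ? S2 : S0;
            S2: begin
                    y          = 1'b1;           // set by the state alone: still Moore
                    next_state = x ? S2 : S0;
                end
        endcase
    end

    // Sequential state memory
    always @(posedge clk) begin
        if (rst) state <= S0;
        else     state <= next_state;
    end
endmodule
```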

Mealy Machine
For a Mealy machine, the output assignment must be placed inside the input conditional checks within the combinational block. Since the output is dependent on the input, any change in input signals will immediately propagate to the output asynchronously, unlike the registered behavior of the state memory.
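Below is a minimal sketch of a Mealy "two consecutive 1s" detector; the names are hypothetical. It needs only two states. y is assigned inside the if (x) check, so y changes as soon as x changes, without waiting for a clock edge.

```verilog
// Hypothetical Mealy FSM: y = 1 when the current input and the previously
// sampled input are both 1.
module fsm_mealy (
    input  wire clk,
    input  wire rst,   // synchronous reset (assumed)
    input  wire x,
    output reg  y
);
    localparam S0 = 1'b0,   // previously sampled input was 0 (or just reset)
               S1 = 1'b1;   // previously sampled input was 1

    reg state, next_state;

    // Next-state and output logic: the output is assigned INSIDE the input check
    always @(*) begin
        next_state = S0;    // defaults avoid latches
        y          = 1'b0;
        case (state)
            S0: if (x) next_state = S1;
            S1: if (x) begin
                    next_state = S1;
                    y          = 1'b1;   // depends on state AND current input: Mealy
                end
        endcase
    end

    // Sequential state memory
    always @(posedge clk) begin
        if (rst) state <= S0;
        else     state <= next_state;
    end
endmodule
```

Compared with a Moore detector of the same pattern, y here rises in the same cycle as the second 1 rather than one cycle later. The trade-off is that y also follows any glitch on x.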
Worth to Read