Lec 02 - Digital System Design and Verilog

Digital System Design

Levels of Abstraction

Unlike the abstraction levels we have seen in Harris & Harris, here we talk about abstraction in digital system design. These five abstraction levels show how different people (e.g., software engineers, digital design front-end engineers, etc.) view a digital circuit.

Let's take a detailed look at each level.

Algorithm/System Level (Untimed)

At the highest level, the design is expressed as algorithms or functional behavior without worrying about timing.

For example, describing output = (A+B) + (C+D) in a flowchart or C-like pseudocode.

Focus: Only the functionality, not how many cycles it takes or how it’s implemented in hardware.

Register Transfer Level (RTL, Timed)

It is the macroscopic hardware view and is described in terms of data transfers between registers and operations performed by functional units (ALUs, multiplexers, etc.) under clock control. In other words, data is

stored in registers and
computed in the combinational blocks

This macroscopic hardware view is implemented using the RTL code. Thus, it is timed, cycle-accurate, but still abstract (macroscopic).

For example, we have written RTL code in CG3207 Lab01, and the following simple Verilog code is also an example of RTL code:

always @(posedge clk)
    out <= (A+B) + (C+D);   // named out because output is a reserved word in Verilog

Focus: This is where our macroscopic blocks appear — ALUs, adders, multiplexers, etc. And RTL code is the implementation of these macroscopic blocks.

Gate Level

It is the microscopic hardware view. The RTL code is synthesized into logic gates (AND, OR, NOT, flip-flops).

For example, (A+B) becomes a ripple-carry adder built out of AND/OR/XOR gates. It is boolean equations + gates, but no transistor-level details.

Focus: This is our microscopic implementation of RTL macros.

Circuit Level

It is the actual electronic implementation of logic gates using CMOS Transistors.

For example, an inverter (NOT gate) is realized using one PMOS and one NMOS transistor.

Focus: Device-level representation, electrical properties like delay, power, capacitance are considered.

Layout Level

It is the physical representation of the circuit on silicon. Masks for fabrication are designed here. And it is the final physical placement/routing of transistors and wires.

Simplified FPGA/ASIC Design Flow

The FPGA/ASIC design flow is highly complex. In an ideal world, a single click would transform a functional specification directly into an FPGA bitstream or a fabricated ASIC. While this vision is attractive, it is not yet practical. Instead, the design process is broken into smaller, manageable stages, as shown in the diagram below.

Today, the steps from functional specification to architectural synthesis are still largely performed manually. In contrast, the stages from logic synthesis to FPGA programming or ASIC fabrication are highly automated by modern EDA tools.

Notes

The words to the right of each arrow are the output of the step above it. For example, the output of "Logic Synthesis" is a netlist.
In EE4218, "Architectural Synthesis" is renamed "High-Level Synthesis (HLS)/Microarchitecture Design".
- It is usually called HLS when it is automated, and microarchitecture design when it is manual.
This design flow follows a divide-and-conquer approach. However, this also means that the final system, assembled from many independently optimized components, is not guaranteed to be globally optimal.
Generally, EE4218 focuses on the steps from Functional Specification to Architectural Synthesis, while EE4415 focuses on the steps from Logic Synthesis to ASIC Fabrication.

Behavioral Modelling

Behavioral modeling defines the "what" of our FPGA/ASIC design — its high-level logic and algorithms — without hardware details. It’s for verifying functionality early.

Purpose: Ensures algorithmic correctness via simulations.
Tools:
- Initial tests with Python, C, Java, Matlab (fast, sequential). e.g., A Python script to simulate a filter: output = input * 0.5 if input > 0 else 0 to test logic.
- Deeper simulations with VHDL, Verilog, SystemC (handles concurrency and timing, but no cycle accuracy).
Not Directly Synthesized: Meant for validation, not hardware generation.
HLS Trend: High-Level Synthesis tools (e.g., Vivado HLS) can convert behavioral code to RTL, but effectiveness varies by tool, domain, and needs manual tweaks.

Then, the behavioral code/algo will be fed into Architectural Synthesis in the design flow, making complex problems manageable through iteration.

Architectural Synthesis

Architectural synthesis turns a high-level functional/behavioral (architectural) model into a macroscopic structural (microarchitectural) model for FPGAs/ASICs. It’s mostly manual but becoming more automated.

In short, architectural synthesis amounts to writing the RTL code or drawing the schematic (macroscopic model). But to write good RTL code, we first need a clear macroscopic model/diagram. (See the RISC-V microarchitecture in Lec 03 for an example.)

Purpose: Converts abstract logic into a cycle-accurate, synthesizable RTL code, typically with structural and behavioral elements.
Output: A block-level model where operations are timed and assigned to hardware blocks. Example: From Z = (A+B) * (C+D) * E, it creates a plan with adders and multipliers.
Key Steps:
- Scheduling (time, or when do we do the operation): Assigns operations to clock cycles (an integer). Finding all the clock cycles (integers) is called solving the scheduling problem. e.g., (A+B) in cycle #1, (C+D) in cycle #2.
- Binding (space, or where do we do the operation): Maps operations to specific hardware resources, like function units, memories or interconnects. e.g., (A+B) done by ALU #1, (C+D) by ALU #2.
- Flexibility: An adder can be reused for different operand pairs by placing multiplexers at its inputs. e.g., ALU #1 computes A+B in cycle #1, then E+F in cycle #2 once the multiplexers switch its inputs.
Tools: RTL synthesis infers register transfers and generates a netlist if guidelines are followed.

The following images show the difference between behavioural modelling and architectural synthesis.

Binding directly determines how many actual FUs are instantiated in the final design, which in turn determines the area cost. Thus, architectural synthesis has the largest impact on the cost of building a chip.

By now, we should be able to use hardware thinking to write RTL code. Let's recap the necessary steps by walking through a very simple example!

Behavioral Modelling

Describes what the computation is, but not when or where it happens. This is what we have seen in Behavioral Modelling. For example, we want to calculate the following formula,

$$Z=(A+B)\times(C+D)\times E$$

Build a macroscopic model

Now, we start the Architectural Synthesis, where we first decide what kinds of resources will be in the datapath (e.g., 2 adders, 1 multiplier, 1 register file). Still no cycle-by-cycle schedule, just “these are the building blocks.” And this is the high-level datapath architecture.

Scheduling

This is done manually by deciding when each operation executes — e.g., in which clock cycle. For example, we want

Cycle 1: compute A+B
Cycle 2: compute C+D
Cycle 3: multiply results → (A+B)∗(C+D)
Cycle 4: multiply with E

Attention

Scheduling here is equivalent to solving an $NP$-hard problem.
Without scheduling, RTL can’t be written, because RTL requires explicit registers and clocked behavior.

Binding

This is still done manually. Once we know when things happen, we decide where they happen — e.g., which hardware resource executes each operation. For example,

Cycle 1: (A+B) → Adder #1
Cycle 2: (C+D) → Adder #1 again (reused)
Cycle 3: sum1 * sum2 → Multiplier #1
Cycle 4: prod * E → Multiplier #1 again

Attention

Similar to scheduling, binding is also equivalent to solving an $NP$-hard problem.
Binding decides the resource allocation vs reuse tradeoff (performance vs area).

RTL Coding

The final step is to write the RTL Code, which is also the last step of Architectural Synthesis. Once scheduling + binding are decided, the RTL code can be written, cycle-accurate. For example,

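reg [31:0] sum1, sum2, prod, Z;
reg [1:0]  step;   // position in the schedule (0..3), assumed to start at 0 via a start/reset signal not shown
always @(posedge clk) begin
    step <= step + 2'd1;            // wraps from 3 back to 0
    case (step)
        2'd0: sum1 <= A + B;        // cycle 1
        2'd1: sum2 <= C + D;        // cycle 2 (the adder is free again, so it can be reused)
        2'd2: prod <= sum1 * sum2;  // cycle 3
        2'd3: Z    <= prod * E;     // cycle 4 (the multiplier can be reused)
    endcase
end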
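The schedule and binding above can also be written with explicit operand multiplexers, so that exactly one adder and one multiplier appear in the code instead of leaving the sharing to the synthesis tool. The following is only a sketch under assumptions: the module name shared_datapath, the start/done handshake, and the 32-bit width are hypothetical and not from the lecture, and the inputs are assumed to stay stable while a computation is running.

```verilog
// Sketch: Z = (A+B)*(C+D)*E using one adder and one multiplier.
// Module name, start/done handshake and widths are illustrative assumptions.
module shared_datapath (
    input             clk,
    input             start,          // one-cycle pulse that begins a computation
    input      [31:0] A, B, C, D, E,  // assumed stable while busy
    output reg [31:0] Z,
    output reg        done            // high for one cycle when Z is valid
);
    reg  [1:0]  step;                 // position in the 4-cycle schedule
    reg         busy;                 // no reset here: may run one dummy pass after power-up
    reg  [31:0] sum1, sum2, prod;
    wire [31:0] add_a, add_b, add_y, mul_a, mul_b, mul_y;

    // Binding: multiplexers choose the operands of the shared units
    assign add_a = (step == 2'd0) ? A    : C;
    assign add_b = (step == 2'd0) ? B    : D;
    assign mul_a = (step == 2'd2) ? sum1 : prod;
    assign mul_b = (step == 2'd2) ? sum2 : E;
    assign add_y = add_a + add_b;     // Adder #1 (the only adder)
    assign mul_y = mul_a * mul_b;     // Multiplier #1 (the only multiplier)

    // Scheduling: one operation per clock cycle
    always @(posedge clk) begin
        done <= 1'b0;
        if (start) begin
            step <= 2'd0;
            busy <= 1'b1;
        end else if (busy) begin
            step <= step + 2'd1;
            case (step)
                2'd0: sum1 <= add_y;  // cycle 1: A + B
                2'd1: sum2 <= add_y;  // cycle 2: C + D
                2'd2: prod <= mul_y;  // cycle 3: sum1 * sum2
                2'd3: begin           // cycle 4: prod * E
                    Z    <= mul_y;
                    busy <= 1'b0;
                    done <= 1'b1;
                end
            endcase
        end
    end
endmodule
```

Compared with a fully parallel datapath, this version trades latency (four cycles per result) for area (one adder and one multiplier plus multiplexers), which is the performance vs area tradeoff that binding decides.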

These five steps are all important, yet people often skip the macroscopic block step, which is where we think in hardware. Skipping it is dangerous, because the macroscopic block diagram is what we compare the synthesis report against to see whether the synthesized hardware matches what we intended to build.

Logic Synthesis

Workflow

As shown in the diagram below, logic synthesis takes in three things (HDL code, constraints, and a technology library) and outputs one thing (a mapped schematic).

Technology Library: The cells / microscopic building blocks we are allowed to use. Its purpose is to enable logic synthesis tools to map a design into the physical hardware efficiently while respecting the process technology constraints (timing, power, area).
- For ASICs, cells are usually gates or gate combinations (e.g., flip-flops, latches, and buffers). They are custom designed and carefully characterized by the foundry within the physical limitations of the specific process technology.
- For FPGAs, the technology library is composed of higher-level CLB functions (like adders, multipliers, LUTs, etc.), which are still treated as basic elements for synthesis.
Mapped Schematic
- Optimized schematic realizing the HDL code, using building blocks from the technology library.
- Usually a netlist that textually describes the interconnection between cells/building blocks.
Constraints
- Location: Logical port to physical pin mapping etc. (See CS2100DE Lab 01)
- Timing specifications (optimization goals): Different schematics can be obtained from the same HDL code.

Substeps

Logic optimization: minimize undesirable redundancies (think Karnaugh maps), and hence the cost and complexity of the design.

Technology mapping (library binding): map logic/hardware resources in macroscopic diagram to cells/building blocks in the library. This actually answers "Which actual silicon component from the library will implement this block?".

RTL and Registers

RTL (Register Transfer Level): Describes how data moves between registers on each clock cycle.

Registers are like “brick walls” — they hold state and define the boundaries of combinational logic.
Everything between registers is combinational logic (no memory, just logic operations).

RTL Synthesis/Logical synthesis tools: Convert our RTL into technology-mapped gates/cells (ASIC) or LUTs/CLBs (FPGA).

What gets optimized:
- Only the combinational logic between registers.
- Optimization can focus on:
  - Speed: minimize propagation delay
  - Area/cost: minimize number of gates or LUTs
Registers themselves are not “optimized” in terms of logic — they just store values.

Critical Path

Here, we add one point from Harris & Harris, that is,

Critical path = combinational path with maximum delay.

As noted above, there is only combinational logic between registers. Hence, the critical path determines the maximum clock frequency of our circuit! The following image contains two examples,

If the clock is too fast and the longest "road" (critical path) hasn’t been traversed yet, the data won’t arrive in time, causing incorrect data at the next register. So, the maximum clock frequency is limited by the critical path delay.
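This limit can be written down with the setup-time constraint from Harris & Harris. As a worked sketch (the numbers are made up for illustration and are not from the lecture), let $t_{pcq}$ be the clock-to-Q delay of the launching register, $t_{pd}$ the propagation delay of the critical path, and $t_{setup}$ the setup time of the capturing register:

$$T_c \ge t_{pcq} + t_{pd} + t_{setup}, \qquad f_{max} = \frac{1}{t_{pcq} + t_{pd} + t_{setup}}$$

With assumed values $t_{pcq} = 0.5\,\text{ns}$, $t_{pd} = 8\,\text{ns}$ and $t_{setup} = 0.5\,\text{ns}$, we get $T_c \ge 9\,\text{ns}$, so $f_{max} \approx 111\,\text{MHz}$. If a register is inserted so that the longest remaining path has $t_{pd} = 4\,\text{ns}$, then $T_c \ge 5\,\text{ns}$ and $f_{max} = 200\,\text{MHz}$, at the price of one extra cycle of latency.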

"Engineering is about trade offs. So, most of the time the answer to a certain question is 'it depends'."

— Prof. Rajesh

So, neither of the two designs shown in the image above is strictly better or worse; each has its own use case.

Technology Library vs. Intellectual Property (IP)

In short, a technology library is much more low-level. An IP, in contrast, is

a pre-designed, reusable functional block or module that performs a specific higher-level function. It can be soft (synthesizable RTL) or hard (physical implementation).

IP is usually much higher-level than a technology library, often an entire subsystem such as a CPU core, USB controller, or memory controller. For example,

CPU cores (RISC-V, ARM Cortex-M)
PCIe or Ethernet controllers
DSP blocks for signal processing
Floating-point arithmetic units

Analogy:

Think of technology libraries as bricks and mortar.
IP blocks are like pre-built rooms or furniture made from those bricks. We can assemble our building (chip) faster if we have ready-made rooms rather than making everything brick by brick.

Physical Design

Placement

It is deciding where to put each cell (gate, flip-flop, or higher-level module) on the physical chip layout.

Goal: Place cells that communicate a lot closer together, as shorter interconnects lead to lower delay and lower power.
Tradeoff: If we bring two blocks closer, some other blocks must move farther. This is a global optimization problem.

Placement minimizes some function of a coarse approximation of wirelengths.

Why not exact wire lengths?

Because exact wirelength calculation requires routing every connection, which is very computationally expensive.
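One widely used coarse approximation is the half-perimeter wirelength (HPWL) of each net's bounding box. Whether the tools used in this course rely on exactly this estimate is an assumption; it is shown only to illustrate what "coarse approximation" can mean. For a net $n$ connecting pins at coordinates $(x_i, y_i)$:

$$\mathrm{HPWL}(n) = \left(\max_i x_i - \min_i x_i\right) + \left(\max_i y_i - \min_i y_i\right)$$

For example, a net with pins at $(0,0)$, $(3,1)$ and $(1,4)$ has $\mathrm{HPWL} = (3-0) + (4-0) = 7$. This needs only the pin locations, so it can be evaluated for every candidate placement without routing anything.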

Routing

It is the mapping of the logical connections between cells to physical interconnects, i.e., physically connecting all the cells with wires according to the netlist.

Goal: Make wires as short as possible, as this reduces delay, power, and congestion.
Constraint: We need placement first to know where to draw the wires.

For more on routing, please take NUS EE4218!

— From Prof. Rajesh

Why are placement and routing a chicken-and-egg problem?

This is because

Optimal placement depends on routing (because wire lengths affect timing).
Optimal routing depends on placement (because you need cell locations to route).

And our solution in practice is to use iterative approaches:

Place cells using approximate wirelengths.
Route the design.
Evaluate timing and congestion.
Adjust placement and repeat.

Some Notes

Always Think Hardware — have the topology of the circuit in mind (the macroscopic model), and write RTL code which will imply/infer the topology from the code.
Simulation and synthesis tools work very differently (From Harris & Harris DDCA)
1. Simulation "executes" HDL code following the semantics of the HDL
2. Synthesis "infers the hardware structure" described by HDL (RTL) code
Read this awesome note from sunburst!

FPGA

We have learned quite a lot about FPGAs in Harris & Harris; please go back and review their working principle if needed. The purpose of this section is to compare ASICs and FPGAs and the use cases for each. In short,

FPGA: flexible, fast to deploy, good for prototyping and low-volume/highly-custom tasks.
ASIC: high-performance, power/area-efficient, cost-effective only for large-scale production.

Verilog for Synthesis

Note that the Verilog and the rules discussed here are for writing RTL code, not for testbenches/simulation.

HDL Coding Tips

For best results, use templates from the synthesis manual of the EDA tool you are using, at the risk that the code may not work as well with another tool.
1. For example, the synthesis manual for Vivado is here. (Very useful)

General Rules for Synthesizability

In this section, reg means the variable type is reg only. It doesn't mean that signal is a register. If a reg signal is inferred as a physical register, we will mention it explicitly.

Do NOT use delays (`#delay`)

Combinational (propagation) delays are hardware dependent; not something the synthesis tool can insert based on HDL code. So, the following code and the use of delay mentioned here are not recommended,

always @(posedge clk) begin
  data_out <= #10 data_in;   // #10 only works in simulation
end

In RTL Verilog code, we insert clock cycle delays explicitly by introducing physical registers (i.e., they are part of our design). So, the following is the correct way to delay a signal (here by two clock cycles),

reg [7:0] data_d1, data_d2;

always @(posedge clk) begin
  data_d1 <= data_in;   // 1 clock cycle delay
  data_d2 <= data_d1;   // 2 clock cycle delay
end

assign data_out = data_d2;

Using one such always block as above gives us a way to insert any number of registers between two combinational blocks.

So, the key takeaway is

#delay is for testbenches only.
In real FPGA/ASIC design, delays are achieved via registers (clock cycles) and timing constraints (combinational delay).
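The register-based delay above can be generalized to any number of cycles. The following is a minimal sketch, assuming a hypothetical module name delay_line and parameters WIDTH and N that are not part of the lecture material:

```verilog
// Sketch: delay data_in by N clock cycles using N registers in a chain.
// Module and parameter names are illustrative assumptions.
module delay_line #(
    parameter WIDTH = 8,
    parameter N     = 3               // number of clock cycles of delay, N >= 1
) (
    input              clk,
    input  [WIDTH-1:0] data_in,
    output [WIDTH-1:0] data_out
);
    reg [WIDTH-1:0] stage [0:N-1];    // stage[i] holds data_in delayed by i+1 cycles
    integer i;

    always @(posedge clk) begin
        stage[0] <= data_in;
        for (i = 1; i < N; i = i + 1)
            stage[i] <= stage[i-1];   // non-blocking: every stage shifts by one per clock
    end

    assign data_out = stage[N-1];
endmodule
```

Because all assignments are non-blocking, each stage samples the old value of the previous stage, so the loop describes N separate registers rather than a single one.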

Use one clock for the entire design

Connect the clock input of every sequential element to this single clock.
1. To run different parts at different speeds, use clock enables instead of separate clocks. (One example is the Clock_Enable module from CG3207 Lab01.)
The clock should not come from a combinational circuit, as a combinational circuit can have glitches in its output.

In the above image, the glitch happens because NOT gate has a propagation delay.

Another example: do not use something like @(posedge button) to detect a transition.

Use a synchronous edge detection scheme instead — e.g., by comparing the current value with the previous value stored in a register.

assign button_edge = button & ~button_prev;

always @(posedge CLK)
begin
    button_prev <= button;
end
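The same single-clock idea applies to running logic at a lower rate: instead of creating a slower clock, keep clk everywhere and gate the update with an enable pulse. The following is a sketch only; it is not the Clock_Enable module from the lab, and the module name slow_counter, the DIVIDE parameter, and the synchronous rst port are assumptions made for illustration.

```verilog
// Sketch: a counter that advances once every DIVIDE cycles of the single clock.
// Names, widths and the reset port are illustrative assumptions.
module slow_counter #(
    parameter DIVIDE = 4              // update once every DIVIDE clock cycles
) (
    input            clk,
    input            rst,             // synchronous reset
    output reg [7:0] count
);
    reg  [15:0] tick;
    wire        enable;

    assign enable = (tick == DIVIDE - 1);   // high for one clk cycle out of DIVIDE

    always @(posedge clk) begin
        if (rst) begin
            tick  <= 16'd0;
            count <= 8'd0;
        end else begin
            tick <= enable ? 16'd0 : tick + 16'd1;
            if (enable)
                count <= count + 8'd1;      // slow logic is still clocked by clk
        end
    end
endmodule
```

Every flip-flop here is clocked by clk, so no clock is derived from combinational logic, and the enable is just ordinary data.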

Do not have combinational feedback path

Every circular assignment should be broken by a register (an assignment in a synchronous always block). This is what we have seen in Harris and Harris.
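As a small illustration of breaking a circular assignment with a register, consider an accumulator. This is a sketch with an assumed module name, port names, and a synchronous reset that are not from the lecture:

```verilog
// Sketch: acc depends on itself, so the loop must pass through a register.
// Module name, ports and reset are illustrative assumptions.
module accumulator (
    input            clk,
    input            rst,        // synchronous reset
    input      [7:0] in,
    output reg [7:0] acc
);
    // assign acc = acc + in;    // NOT allowed: acc would feed itself through the adder alone

    always @(posedge clk) begin
        if (rst)
            acc <= 8'd0;
        else
            acc <= acc + in;     // the register breaks the loop: new acc uses the old acc
    end
endmodule
```

The adder output only reaches the adder input again after a clock edge, so the path is no longer purely combinational.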

`reg`s and `wire`s should not have multiple drivers

A wire should appear on the LHS of only one assign statement.
A reg should appear on the LHS in only one always block, i.e., it cannot be assigned in more than one always block.

For example, the following is wrong

assign Z = A & B;
assign Z = X | Y;

But it's OK to have a reg on the LHS of multiple statements within the same always block, as long as the same type of assignment is used, e.g., in the branches of if statements.

Only blocking (=) or only non-blocking (<=), do not mix the two for a particular reg.
Within an always block, if a signal is assigned more than once, whichever assignment executes last in the flow of control is what the reg ends up holding.
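The "last assignment wins" behaviour is what makes the common default-then-override style work. A minimal sketch, with a hypothetical module name and ports:

```verilog
// Sketch: y is assigned twice in one always block; the later assignment wins.
// Module and port names are illustrative assumptions.
module last_assignment_wins (
    input      a,
    input      sel,
    output reg y
);
    always @(*) begin
        y = 1'b0;        // default value, assigned first
        if (sel)
            y = a;       // when sel is 1, this later assignment overrides the default
    end
endmodule
```

Both statements use blocking assignment and sit in the same always block, so y has a single driver and the result is simply y = sel & a.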

Initializations of reg are ignored by synthesis

If we use either an initial block or the reg declaration (e.g., reg r = 1'b0;) to initialize a register, the initialization is useful in simulation and, on FPGAs, for registers and memories, but it is otherwise ignored by synthesis tools (e.g., for ASICs), so it should not be relied on.

Wires cannot be initialized

wires cannot be meaningfully initialized as they don't store anything. (Review the working principle of a wire if needed.)

Initialization to 0 or 1 will connect the wire to a constant 0 or 1 respectively. Further assignment using assign will cause it to have multiple drivers. This is dangerous and not recommended!

Not all regs can be initialized

regs that are synthesized as combinational circuits cannot be meaningfully initialized.

reg z = 1'b1;   // initialization ignored in synthesis

always @(*) begin
  z = a & b;    // purely combinational assignment
end

Here, z is not a flip-flop — it’s just a combinational output of a & b. The = 1'b1 initialization only affects simulation; in real hardware it’s ignored. So on power-up, z won’t magically start at 1. It will simply depend on a & b.

Not all regs will be inferred as physical registers.

Do not assume registers and memory have 0s as initial value

In ASIC, below is a simplified circuitry to initialize register content to 0.

In FPGAs, similar functionality is built in and asserted when the bitstream is programmed. However, if we don't write explicit reset code, a register may come up as either 0 or 1, and we should not rely on which.

This problem appears when we design a memory; more can be found in the branch history table and branch target buffer parts of my Mach-V project, as both required me to code a RAM.

Every always should follow one of the three templates given below strictly

This part is very similar to the content in Harris & Harris. But with more tips added.

Purely combinational (always_comb in SystemVerilog)

To write RTL for a more complex combinational circuit, use

always @(*)
blocking statements (=)

These are the same as in Harris & Harris. Besides that, we also recommend the following rules:

1. Every reg must be assigned a meaningful value (not something like Z = Z;) for every possible combination of inputs (e.g., in all branches of if/case statements). Otherwise, the reg has to remember its old value, so a latch is inferred and the block is no longer purely combinational (this is very dangerous!). For example, the following code is correct
```
always @(*)
begin
    if (x)
        Z = 1'b0;
    else
        Z = Y;
end
```

2. Blocking and non-blocking statements behave differently in always @(*):
(a) Blocking assignments take effect immediately, in order. So if a reg is used on both the left-hand side (LHS) and the right-hand side (RHS), we should assign it before we read it (the reg should appear on the LHS before the RHS).
(b) Non-blocking assignments are all applied together after the statements of the always block have been evaluated, so within the same block the RHS never sees the updated value. For example, in the following code snippet, the Z read by A <= Z; still holds the old value of Z.
  always @ (*)
  begin
      notX = ~X;
      Z <= notX & Y;
      A <= Z;
  end
3. Using blocking assignment (=) for internal calculations (e.g., variables used only inside the always block) and non-blocking assignment (<=) for the outputs of the always block (e.g., signals that are read in other blocks and not within the same block) also works.
```
always @(*)
begin
    notX = ~X;
    Z <= notX & Y;
end
```
The code above will be synthesized into an inverter on X feeding an AND gate with Y, i.e., Z = ~X & Y.

Rule 3 will make your code more VHDL-like but lint tools may complain.

In short, rule 1 and rule 2(a) are important! As for rule 2(b) and rule 3, simply never use non-blocking statements in always @(*).

Synchronous (always_ff)

Synchronous means that the output changes only on the rising or falling edge of a single clock. And in Harris & Harris, we have seen two rules of thumb:

use always @(posedge clk)
non-blocking assignments (<=) only

By keeping the above two rules, we should be able to avoid 99% of problems. But the remaining 1%, will probably be in the paper exam. 😂

Use non-blocking assignments (<=) for the outputs of the always block (signals), as well as for any internal physical registers. Note that the updated values are not available for use at the same clock edge. (See the rule 2(b) example in the purely combinational template above.)
If we insist on using blocking assignments (=) for internal combinational parts (variables), the variable should appear on the LHS before it appears on the RHS.
```
always @ (posedge CLK)
begin
    tmp = X | Y; // tmp is a combinational variable
    Z <= tmp; //Z is an output (signal)
    // Z <= X | Y; is fine as well
end
```
1. However, it is recommended to move combinational parts into a separate always block or assign statement so that the original block just becomes a register.

Synchronous with asynchronous set/reset

This usually involves a resettable register, which we've seen a lot in Harris & Harris. To design such a resettable register, we should keep the following rules in mind

Inside the if statement (which resets the value in the register), the output should be assigned a constant (a vector of 0's and 1's), and nothing more should be done.
All other code (i.e., the synchronous portion) should be inside the else (between the begin and end of the else). There should be no additional outer if/else ifs. Inner if/else ifs are allowed, and they need not have an else. Why?
1. It’s because the block is synchronous. In clocked logic, a missing else means “hold value,” which is fine. But in combinational logic, a missing else means “need to remember,” which infers unintended latches. (This is super important!)

In short, Prof. Rajesh recommends never using resettable registers in this course.
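For reference, the rules above correspond to the following shape. This is a minimal sketch, assuming an active-high asynchronous reset and hypothetical names (dff_async_reset, rst, en, d, q); it is not a template taken from the lecture slides.

```verilog
// Sketch: synchronous register with asynchronous, active-high reset.
// Module and port names are illustrative assumptions.
module dff_async_reset (
    input            clk,
    input            rst,       // asynchronous reset
    input            en,
    input      [7:0] d,
    output reg [7:0] q
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            q <= 8'h00;         // reset branch: assign a constant and nothing else
        else begin
            if (en)             // inner if without else: q simply holds its value
                q <= d;
        end
    end
endmodule
```

The only outer structure is if (rst) ... else ..., and the inner if has no else because the register keeps its value when en is 0.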

When are physical regs inferred

In Verilog, reg doesn't mean a physical register. The following rules summarize when physical registers are actually inferred

Non-blocking assignments to regs in a synchronous always block.
Blocking assignments (=) in an always @(posedge clk) block, in two cases:
1. If a reg is read (RHS) before it is written (LHS), that reg is inferred as a physical register.
2. If a reg is assigned using blocking assignment (=) and is then used on the RHS in some other always block (either @(*) or @(posedge CLK)), that reg is inferred as a physical register. In the following example, Z is such a reg.
  always @(posedge CLK)
  begin
      tmp = X | Y;
      Z = tmp;
  end
  always @(posedge CLK)
  begin
      A <= Z;
  end

Inferring registers the second way is not recommended!

Registers can also be merged or removed by the synthesis tool in the following two cases

If two registers store identical content, they can be merged by the synthesis tool as part of optimization (configurable).
If a register's content is not used in a parent module, it is optimized away, along with the combinational circuits exclusively feeding it.

For example, given the following Verilog code, draw the schematic,

assign Z = X & Y;
always @ (posedge CLK)
begin
    B = A;
    C = B ^ Z;
    D <= C;
    E = C;
    A = E;
end

To solve this kind of drawing schematic questions, we do it systematically

Find the registers: Based on the rules listed above, D and A are registers here (D is assigned with <=, and A is read before it is written).
Start building/drawing the circuit in order, beginning with the first line, which is the assign statement
1. Z = X & Y is an AND gate
2. B = A just connects the output of the register A to B
3. C = B ^ Z is just an XOR gate
4. D <= C connects C to the input of a register whose output is D
5. E = C means E is just C
6. A = E means C is directly connected to the input of the register whose output is A

The final schematic looks like below,
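Following the steps above, the same circuit can be written in the recommended style, with the combinational logic in assign statements and only non-blocking assignments in the clocked block. This is a sketch under the assumption that A and D are the only signals observed outside; the module name explicit_regs is hypothetical.

```verilog
// Sketch: the example circuit rewritten so that each register is explicit.
// Module name and port list are illustrative assumptions.
module explicit_regs (
    input      CLK, X, Y,
    output reg A, D
);
    wire Z, C;

    assign Z = X & Y;            // AND gate
    assign C = A ^ Z;            // XOR gate (B is just the output of register A)

    always @(posedge CLK) begin
        D <= C;                  // register D
        A <= C;                  // register A (E is just C)
    end
endmodule
```

Written this way, it is easy to see that A and D store identical content, so a synthesis tool may merge them into one register.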

Finite State Machines

FSMs are covered in considerable detail in DDCA; this part is from Prof's slides.

A standard state machine architecture consists of three distinct components:

combinational next-state logic,
sequential state memory, and
combinational output logic.

For ASIC designs, a reset input is essential to initialize the state memory upon power-up; however, this is unnecessary in FPGA designs because dedicated Global Set/Reset (GSR) circuitry automatically initializes registers to their default values.

To implement the above FSM, we can use the Verilog template as follows:

module SM(
    input clk,
    input Inp,
    output reg Outp // no reg if assigned using assign
);
    // Only 1 bit required for state in this specific example
    reg [0:0] state, n_state;
    parameter S1 = 1'b0, S2 = 1'b1;
    // Implementation follows...
endmodule

There are two types of FSMs:

Moore Machines
Mealy Machines

The distinction between these models lies in their output dependencies.

In a Mealy Machine, the output is a function of both the current state and the current input. Conversely,
in a Moore Machine, the output depends exclusively on the current state.

For example, the following are example next-state and output tables for the Moore and Mealy machines:

The two machines in the figure above are not functionally equivalent in terms of timing and transition behavior.

Moore Machine

To implement a Moore machine, we can use the following two approaches.

Three-block approach

This approach clearly modularizes the design into three separate blocks:

a combinational block for next-state generation,
a second combinational block for output generation, and
a sequential block for the state memory.

This separation ensures that the output logic remains independent of the input signals, strictly adhering to the Moore definition.

// 1. Next State Logic
always @ (*) begin
    case(state)
        S1: if(Inp == 1'b0) n_state = S1;
            else            n_state = S2;
        S2: if(Inp == 1'b0) n_state = S2;
            else            n_state = S1;
    endcase
end

// 2. Output Logic (Independent of Input)
always @ (*) begin
    case(state)
        S1: Outp = 1'b1;
        S2: Outp = 1'b0;
    endcase
end

// 3. State Memory
always @ (posedge clk) begin
    state <= n_state;
end

Combined Combinational Logic

Alternatively, the next-state generation and output generation can be merged into a single combinational block. This method can potentially lead to hardware savings if the synthesis tool optimizes the shared combinational logic between the two functions.

always @ (*) begin
    case(state)
        S1: begin
            Outp = 1'b1; // Output defined by state
            if(Inp == 1'b0) n_state = S1;
            else            n_state = S2;
        end
        S2: begin
            Outp = 1'b0;
            if(Inp == 1'b0) n_state = S2;
            else            n_state = S1;
        end
    endcase
end

Mealy Machine

For a Mealy machine, the output assignment must be placed inside the input conditional checks within the combinational block. Since the output is dependent on the input, any change in input signals will immediately propagate to the output asynchronously, unlike the registered behavior of the state memory.

// --- Clock Block (Sequential Logic) ---
always @(posedge clk) begin
    state <= n_state;
end

// --- Combinational Block (Next State & Output Logic) ---
always @(*) begin
    case(state)
        // Outp *is* dependent on Inp (Mealy Machine)
        S1: if(Inp == 1'b0) begin
            n_state = S1;
            Outp = 1'b1;
        end
        else begin
            n_state = S2;
            Outp = 1'b0;
        end

        S2: if(Inp == 1'b0) begin
            n_state = S2;
            Outp = 1'b0;
        end
        else begin
            n_state = S1;
            Outp = 1'b0;
        end
        
        default: begin
            n_state = S1;
            Outp = 1'b0;
        end
    endcase
end

Worth to Read

Verilog LifeSaver
