CUSTOM SPEC UT Dallas · CMPE Verilog · iVerilog Two programs · both PASSED

A custom 24-bit multi-cycle CPU,
built ground-up in Verilog.

A full Harvard-architecture processor designed under a unique set of embedded constraints: a 29-bit instruction width, a 24-bit data bus, an 8-entry register file, reverse-endian array allocations, and a 3-state multi-cycle FSM. It defines its own ISA, datapath, and memory subsystem, and executes two custom assembly programs end-to-end on iVerilog.

Architecture
Harvard · multi-cycle
Instruction width
29 bits
Data bus
24 bits
Address space
1024 words · 10-bit address
01

System overview

§ Custom spec
// What it is

A ground-up CPU and ISA designed against a strict, unique constraint sheet. Memory is strictly Harvard — instructions and data live in completely separate address spaces. The program loads at 0x200, the data bus is 24 bits wide, and every array allocation in the source uses the reverse-endian [0:N] form while the address math underneath still has to behave as little-endian.
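To make the reverse-endian point concrete, here is a minimal sketch of how a [0:N] declaration interacts with ordinary address math. The module name and the addr signal are hypothetical; the only detail taken from the design is that the low 10 bits of the 16-bit immediate are selected as imm[6:15].

```verilog
// Hypothetical illustration, not the project's source.
module bit_order_sketch;
    reg [0:15] imm;    // reverse-endian range: imm[0] is the MSB, imm[15] the LSB
    reg [0:9]  addr;   // 10-bit address, same convention

    initial begin
        imm  = 16'h0030;    // the numeric value is still 0x30
        addr = imm[6:15];   // the low 10 bits sit at the high indices
        if (addr == 10'h030)
            $display("imm[6:15] carries address 0x%h", addr);
        else
            $display("unexpected address 0x%h", addr);
    end
endmodule
```

The declaration order changes only how bits are indexed, not the value a vector holds, so arithmetic and comparisons behave as usual while every part-select has to count from the MSB.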

Execution is multi-cycle rather than pipelined: a 3-state FSM (FETCH → EXECUTE → HALT) walks each instruction through the register file, the ALU, and whichever of the two memories it needs. There is no branch prediction and no forwarding, just a clean, observable datapath with a Program Counter, a register file, and two BRAM-style memory blocks.

// MOD-01 · ALU

24-bit combinational ALU

Add, subtract, and Set-Less-Than over the full 24-bit data bus. Exposes Zero / Carry / Negative status flags so future versions can fold them into branch conditions.
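A combinational ALU of this shape might be sketched as follows. The module and port names are assumptions, as is the choice of an unsigned comparison for Set-Less-Than; only the three operations, their opcodes, the 24-bit width, and the three flags come from the description above.

```verilog
// Hypothetical sketch; names and the unsigned SLT are assumptions.
module alu_sketch (
    input  wire [0:3]  opcode,
    input  wire [0:23] a,
    input  wire [0:23] b,
    output reg  [0:23] result,
    output wire        zero,
    output reg         carry,
    output wire        negative
);
    localparam [0:3] ALU_ADD = 4'b0000,
                     ALU_SUB = 4'b0001,
                     ALU_SLT = 4'b0101;

    always @(*) begin
        carry  = 1'b0;
        result = 24'd0;
        case (opcode)
            // Widen to 25 bits so the carry (or borrow) falls out of the sum.
            ALU_ADD: {carry, result} = {1'b0, a} + {1'b0, b};
            ALU_SUB: {carry, result} = {1'b0, a} - {1'b0, b};
            ALU_SLT: result = (a < b) ? 24'd1 : 24'd0;
            default: result = 24'd0;
        endcase
    end

    assign zero     = (result == 24'd0);
    assign negative = result[0];   // index 0 is the MSB of a [0:23] vector
endmodule
```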

// MOD-02 · REGFILE

8-register synchronous file

Eight 24-bit registers addressed by 3 bits. R0 is hard-wired to zero for cheap clears and zero-comparison. Writes commit on the positive clock edge only when write_enable is high.
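One way to express that behavior is sketched below. Port names other than write_enable are hypothetical, and the combinational read ports are an assumption; the hard-wired R0 and the posedge write gated by write_enable follow the description.

```verilog
// Hypothetical sketch; read-port style and most names are assumptions.
module regfile_sketch (
    input  wire        clk,
    input  wire        write_enable,
    input  wire [0:2]  write_addr,
    input  wire [0:23] write_data,
    input  wire [0:2]  read_addr1,
    input  wire [0:2]  read_addr2,
    output wire [0:23] read_data1,
    output wire [0:23] read_data2
);
    reg [0:23] regs [0:7];

    // R0 always reads as zero, whatever the underlying storage holds.
    assign read_data1 = (read_addr1 == 3'd0) ? 24'd0 : regs[read_addr1];
    assign read_data2 = (read_addr2 == 3'd0) ? 24'd0 : regs[read_addr2];

    // Writes commit on the rising edge only; writes aimed at R0 are dropped.
    always @(posedge clk) begin
        if (write_enable && write_addr != 3'd0)
            regs[write_addr] <= write_data;
    end
endmodule
```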

// MOD-03 · IMEM

Instruction ROM

1024-deep, 29-bit-wide ROM. Pre-loaded with two assembly programs at 0x200 (memory ops) and 0x300 (looping). Combinational read addressed by the PC.

// MOD-04 · DMEM

Data RAM

1024-deep, 24-bit-wide RAM with synchronous write and combinational read. Pre-seeded with A=20 at 0x10 and B=22 at 0x20 so the test programs have something to chew on.
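A memory with those properties could look roughly like this. The port names mirror the data_memory instantiation in cpu.v; the module name, the zero-fill of unused cells, and the use of an initial block for seeding are assumptions.

```verilog
// Hypothetical sketch; the zero-fill and seeding method are assumptions.
module data_memory_sketch (
    input  wire        clk,
    input  wire        write_enable,
    input  wire [0:9]  addr,
    input  wire [0:23] write_data,
    output wire [0:23] read_data
);
    reg [0:23] mem [0:1023];
    integer i;

    initial begin
        for (i = 0; i < 1024; i = i + 1)
            mem[i] = 24'd0;
        mem[10'h010] = 24'd20;   // A
        mem[10'h020] = 24'd22;   // B
    end

    // Combinational read: data follows the address with no clock involved.
    assign read_data = mem[addr];

    // Synchronous write: commits on the rising edge while write_enable is high.
    always @(posedge clk) begin
        if (write_enable)
            mem[addr] <= write_data;
    end
endmodule
```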

// MOD-05 · DECODER

29-bit instruction decoder

Slices the instruction into opcode[0:3], rd[4:6], rs1[7:9], rs2[10:12], and the 16-bit immediate imm[13:28]. Pure combinational.
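Because the fields are fixed-position, the decoder reduces to five part-selects. The sketch below uses the field boundaries listed above; the module and port names are hypothetical.

```verilog
// Hypothetical sketch; only the bit ranges come from the description.
module decoder_sketch (
    input  wire [0:28] instr,
    output wire [0:3]  opcode,
    output wire [0:2]  rd,
    output wire [0:2]  rs1,
    output wire [0:2]  rs2,
    output wire [0:15] imm
);
    assign opcode = instr[0:3];     // 4-bit operation, at the MSB end
    assign rd     = instr[4:6];     // destination register
    assign rs1    = instr[7:9];     // first source register
    assign rs2    = instr[10:12];   // second source register
    assign imm    = instr[13:28];   // 16-bit constant / address / offset
endmodule
```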

// MOD-06 · CONTROL

Multi-cycle FSM

3-state controller — FETCH, EXECUTE, HALT — driving the PC, write-back muxing, and memory enables. Async reset drops the PC at the configured entry point.

02

Instruction Set Architecture

§ Custom ISA
// 29-bit layout

The instruction format trades a wide immediate for a slim register-field budget: 4 bits of opcode, three 3-bit register specifiers, and a generous 16-bit immediate that's wide enough to address any cell in the data memory directly.

[0:3] · 4b · opcode · operation
[4:6] · 3b · rd · dest
[7:9] · 3b · rs1 · src 1
[10:12] · 3b · rs2 · src 2
[13:28] · 16b · immediate · constant / address / offset
// Total · 29 bits · MSB-first reverse-endian allocation

Opcode set

Binary  Mnemonic  Behavior
0000    ALU_ADD   Rd ← Rs1 + Rs2
0001    ALU_SUB   Rd ← Rs1 − Rs2
0101    ALU_SLT   Rd ← (Rs1 < Rs2) ? 1 : 0
1001    OP_LI     Rd ← {8'b0, imm}
1010    OP_BEQZ   if (Rs1 == 0) PC ← PC + imm
1011    OP_JUMP   PC ← PC − imm (relative back-jump)
1100    OP_LW     Rd ← DMEM[imm]
1101    OP_SW     DMEM[imm] ← Rd
1111    OP_HALT   Stop execution (FSM → HALT)
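As a worked example of the layout, the sketch below packs one instruction by hand. The encode function and the module name are hypothetical helpers, and the choice of R1 as the destination is arbitrary; the opcode value and field order come from the tables above.

```verilog
// Hypothetical helper for illustration; not part of the CPU itself.
module encode_sketch;
    localparam [0:3] OP_LW = 4'b1100;

    // Packs the five fields MSB-first: opcode lands in [0:3], imm in [13:28].
    function [0:28] encode;
        input [0:3]  opcode;
        input [0:2]  rd;
        input [0:2]  rs1;
        input [0:2]  rs2;
        input [0:15] imm;
        begin
            encode = {opcode, rd, rs1, rs2, imm};
        end
    endfunction

    initial begin
        // LW R1, 0x10  ->  1100 001 000 000 0000000000010000
        $display("%b", encode(OP_LW, 3'd1, 3'd0, 3'd0, 16'h0010));
    end
endmodule
```

The field widths sum to 4 + 3 + 3 + 3 + 16 = 29 bits, so the concatenation fills the instruction word exactly with no padding.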
03

Multi-cycle datapath

§ FSM
// 3-state controller

The control unit walks every instruction through a tight three-state loop. FETCH latches the instruction off the ROM bus and arms the decoder; EXECUTE routes data through the ALU, register file, or data memory and advances the PC; HALT traps the machine in place when the program ends.

STATE 00
FETCH
PC → IMEM · latch instruction
STATE 01
EXECUTE
decode · ALU / RF / DMEM · PC+1
STATE 02
HALT
stable trap on OP_HALT
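The three states above can be sketched as a small clocked state machine. This is a simplified illustration under several assumptions: the module, port, and parameter names are hypothetical, reset is taken to be active-high, and branch and jump handling is left out so that the PC only ever steps by one.

```verilog
// Hypothetical sketch; BEQZ/JUMP handling and datapath enables are omitted.
module control_sketch #(
    parameter [0:9] ENTRY_POINT = 10'h200   // program load address
) (
    input  wire        clk,
    input  wire        reset,      // asynchronous, active-high (assumed)
    input  wire [0:28] imem_out,   // instruction ROM output for the current PC
    output reg  [0:28] instr,      // instruction latched during FETCH
    output reg  [0:1]  state,
    output reg  [0:9]  pc
);
    localparam [0:1] FETCH = 2'd0, EXECUTE = 2'd1, HALT = 2'd2;
    localparam [0:3] OP_HALT = 4'b1111;

    always @(posedge clk or posedge reset) begin
        if (reset) begin
            state <= FETCH;
            pc    <= ENTRY_POINT;
            instr <= 29'd0;
        end else begin
            case (state)
                FETCH: begin
                    instr <= imem_out;       // latch the instruction
                    state <= EXECUTE;
                end
                EXECUTE: begin
                    if (instr[0:3] == OP_HALT) begin
                        state <= HALT;       // trap; PC stays where it is
                    end else begin
                        pc    <= pc + 10'd1; // sequential advance only
                        state <= FETCH;
                    end
                end
                default: state <= HALT;      // HALT holds until reset
            endcase
        end
    end
endmodule
```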
04

Engineering challenges

§ Hazards · solved
// Two race conditions

Multi-cycle CPUs look clean on paper, but the moment write-enables, addresses, and the PC all live on the same clock edge, the datapath turns into a race. Two hazards in particular cost real time on the bench before they were pinned down.

⚠ HAZARD-01 · REGISTER PIPELINE RACE

The destination register changed before the clock could latch the result.

// PROBLEM

During EXECUTE, the CPU signaled the register file to commit an ALU result. The write needed a clock edge to lock in, but on that same edge the FSM also incremented the Program Counter. Because write_addr was wired straight to the decoder, the destination field flipped to the next instruction's rd before the latch fired — silently writing the result into the wrong register and poisoning every downstream loop iteration.

// FIX

Added a dedicated 3-bit pipeline register, write_addr_reg, that latches the destination address inside the EXECUTE cycle. The register file now sees a stable, pre-clocked target regardless of how fast the PC advances behind it.

⚠ HAZARD-02 · MEMORY STORE TIMING

The data RAM was writing one cycle late — after HALT had wiped the address.

// PROBLEM

The Store Word path used a registered write_enable flag. By the time it propagated, the PC had already jumped to OP_HALT, which forced the address bus to zero. The RAM dutifully wrote — but to the wrong cell — destroying the final result the program had just computed.

// FIX

Re-engineered the DMEM write-enable from a registered signal into a real-time combinational wire: wire dmem_we = (state == EXECUTE && opcode == OP_SW);. The RAM now commits in the same cycle the EXECUTE state is asserted, before the FSM has any chance to advance the PC into the next instruction.

cpu.v · dmem write-enable · verilog
// Combinational write-enable — locks DMEM on the EXECUTE edge,
// before the PC has a chance to advance into the next instruction.
wire dmem_we = (state == EXECUTE && opcode == OP_SW);

data_memory dmem (
    .clk          (clk),
    .write_enable (dmem_we),
    .addr         (imm[6:15]),
    .write_data   (read_data1),
    .read_data    (dmem_out)
);
05

Verification

§ iVerilog · two programs
// Both PASSED

The CPU is signed off by two custom assembly programs running on iVerilog. The first exercises every data-memory path; the second exercises every control-flow path. Both finish with a clean PASS in the testbench monitor.

PROG-01 · MEMORY OPS

C = A + B

// LW · ADD · SW · HALT — exercises both memories
FINAL VALUE · Mem[0x30] = 42

Loads A=20 from 0x10, loads B=22 from 0x20, sums them through the ALU, stores the result to 0x30, halts. Validates the Store-Word race fix end-to-end — the result has to survive the HALT transition.
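For illustration, the program could be laid out in instruction memory along these lines. This is a reconstruction from the description, not the original listing: the register assignments (R1, R2, R3), the unused-field values, and the module name are assumptions, and the SW source register is placed in the rd field to match the opcode table.

```verilog
// Hypothetical reconstruction of PROG-01; register choices are assumptions.
module prog01_sketch;
    localparam [0:3] ALU_ADD = 4'b0000,
                     OP_LW   = 4'b1100,
                     OP_SW   = 4'b1101,
                     OP_HALT = 4'b1111;

    reg [0:28] imem [0:1023];

    initial begin
        //               opcode   rd    rs1   rs2   imm
        imem[10'h200] = {OP_LW,   3'd1, 3'd0, 3'd0, 16'h0010};  // R1 <- DMEM[0x10] (A)
        imem[10'h201] = {OP_LW,   3'd2, 3'd0, 3'd0, 16'h0020};  // R2 <- DMEM[0x20] (B)
        imem[10'h202] = {ALU_ADD, 3'd3, 3'd1, 3'd2, 16'h0000};  // R3 <- R1 + R2
        imem[10'h203] = {OP_SW,   3'd3, 3'd0, 3'd0, 16'h0030};  // DMEM[0x30] <- R3
        imem[10'h204] = {OP_HALT, 3'd0, 3'd0, 3'd0, 16'h0000};  // stop
    end
endmodule
```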

→ MINIMAL PROGRAM · PASSED
PROG-02 · CONTROL FLOW

sum += i, for i = 0 → 10

// LI · SLT · BEQZ · ADD · JUMP · HALT — full loop
FINAL VALUE · R1 (Sum) = 55

A Gauss-sum loop: SLT against an upper bound of 11, BEQZ to exit, ADD to accumulate, JUMP back to the top. Validates the register pipeline fix — every loop iteration writes to the correct register, even as the PC advances on the same edge.
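A behavioral reference model is a cheap way to derive the value a testbench should expect. The sketch below mirrors the loop's structure step by step in plain Verilog; the variable names are hypothetical and it models the arithmetic only, not the CPU's timing or encoding.

```verilog
// Hypothetical reference model; mirrors the loop's logic, not the hardware.
module gauss_model_sketch;
    integer i;
    integer sum;
    integer lt;    // stands in for the SLT result register

    initial begin
        i   = 0;                 // LI: loop counter
        sum = 0;                 // LI: accumulator
        lt  = (i < 11);          // SLT against the upper bound
        while (lt != 0) begin    // BEQZ leaves the loop once lt == 0
            sum = sum + i;       // ADD: accumulate
            i   = i + 1;         // ADD: advance the counter
            lt  = (i < 11);      // SLT again after the JUMP back
        end
        $display("sum = %0d", sum);   // 0 + 1 + ... + 10 = 55
    end
endmodule
```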

→ LOOPING PROGRAM · PASSED
Functional pass
2 / 2
iVerilog · testbench monitor
Hazards solved
2
Regfile race · DMEM write race
Modules
6
ALU · RF · IMEM · DMEM · Decoder · Control
ISA depth
9
opcodes · ALU · branch · jump · LW · SW
06

Toolchain

§ Stack
07

Takeaways

§ Reflection
// What stuck

The most useful lesson out of this project was just how aggressively a multi-cycle CPU punishes you for assuming a signal is “synchronous enough.” Both hazards looked like they should have worked — the write_enables were asserted, the addresses were correct on paper, the simulation transcript looked clean until I read it carefully. The fix in each case was to pull the timing one layer closer to the present — either by latching the address into a dedicated pipeline register, or by promoting a registered enable into a real-time combinational wire.

The other thing that stuck: a tiny, carefully chosen ISA is genuinely fun to design. Nine opcodes, one immediate field, one data bus, and you can still write a complete for loop with control flow that runs on simulated hardware logic. The constraint sheet felt restrictive on day one and ended up being the most interesting part of the project.
