Accumulator-Based CPU Design

Srimanth Tenneti
Nov 27, 2022


Introduction

This article describes a simple accumulator-based Von Neumann CPU designed in Verilog HDL. It gives readers a basic framework that they can extend into their own CPUs with more robust, well-defined, and efficient instructions. The modeled instructions are register-to-register only. There are no branch or jump instructions.

This is a simple weekend project that takes about 3–4 hours to build and is a fun way to get started with CPU design.

Accumulator Based CPUs

In the early computing era, many computers were accumulator-based. These machines could have several registers for storing data. One special register, the accumulator, held the result of each operation and also served as one of its inputs.

Notable computers and processors that used accumulators include the ENIAC, IBM 701, HP 2100, Intel 4004, Intel 8008, and Intel 8085.

Intel 4004 — Courtesy Wikipedia
IBM 701 — Courtesy IBM
ENIAC — Courtesy Wikipedia

A typical accumulator-based CPU architecture looks something like the figure below.

Typical Accumulator based CPU architecture

The ALU (Arithmetic and Logic Unit) is the heart of all computation, since it performs the main operations of the CPU. The ACC (Accumulator) is the common operand for every operation and also receives the result. The IR (Instruction Register) holds the instruction fetched from memory and supplies it to the decode and execute stages. Once decoding is complete, the ALU has two things: the memory location of the operand and the opcode of the operation. It uses them to execute the instruction. After execution, the PC (Program Counter) is incremented to point to the next location in memory.

Design Approach

The first vital step in designing any CPU is to decide on its ISA (Instruction Set Architecture). To start, I model the instruction format for the CPU.

Instruction Format for the X1CPU

For the scope of this simple design, we cover eight register-register instructions:
- addition
- subtraction
- multiplication
- logical right shift
- arithmetic right shift
- logical left shift
- null (clear the accumulator)
- load

Instruction set for the X1CPU
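The instruction format and instruction set above are images, but the field layout can be read back from the decoder in this article:
- SA0 (source address) is in bits [15:11].
- DA0 (destination address) is in bits [10:6].
- The opcode is in bits [5:3].
- Sa (shift amount) is in bits [2:0].

The opcodes follow the decoder's case statement: 000 ADD, 001 SUB, 010 MUL, 011 SRL, 100 SRA, 101 SLL, 110 NUL, 111 LD. Hand-assembling programs is easier with a helper. Below is a minimal sketch of a hypothetical module, x1_encode. It is not part of the original design; it only concatenates the fields in that order.

```verilog
// Hypothetical helper, not part of the original X1 design.
// Packs one X1 instruction word using the bit positions the decoder reads:
// SA0 = [15:11], DA0 = [10:6], opcode = [5:3], Sa = [2:0].
module x1_encode (
  input  [2:0]  opcode, // 000 ADD, 001 SUB, 010 MUL, 011 SRL,
                        // 100 SRA, 101 SLL, 110 NUL, 111 LD
  input  [4:0]  sa0,    // source address of the operand
  input  [4:0]  da0,    // destination address (decoded but unused by X1)
  input  [2:0]  sa,     // shift amount for SRL/SRA/SLL
  output [15:0] instr   // packed 16-bit instruction word
);
  // 5 + 5 + 3 + 3 = 16 bits, most significant field first
  assign instr = {sa0, da0, opcode, sa};
endmodule
```

For example, opcode 3'b111 with sa0 = 1, da0 = 3, and sa = 0 packs to 16'b0000100011111000. That is the first word of the .bin file used in the verification section.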

System Design

To implement the design, we start with three major components: the declarations, the instruction decoder, and the ALU. Each of the remaining registers belongs to one of these three parts.

1. Module and Variable Declaration

/*
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Design : Accumulator Based CPU - X1
Engineer : Srimanth Tenneti
Date: 27th November 2022
Description:
1. ADD, SUB, MUL, SRL, SRA, SLL, NUL, LD - instructions supported
2. No branch | jump
3. Register - Register mode only
4. Internal Memory
Version: 0.01
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
*/

module CPUACC #(parameter W = 4)(
// Global Signals
input cpuClk,
input cpuRst,
input wm,
// Output
output [W-1 : 0] cpuOut
);

// CPU Register instantiation
// 32 locations deep 16 bits wide
reg [15:0] mem [0:31];
reg [3:0] ACC; // Accumulator register
reg [15:0] IR; // Instruction Register (one 16-bit memory word)
reg [3:0] A; // Operand fetched from memory (lower 4 bits)

reg OF; // To handle Overflow (Simple Flag)
reg [4:0] PC; // Program Counter


// Instruction Decode Stage
reg [2:0] opcode; // Opcode
reg [4:0] SA0; // Source Address
reg [4:0] DA0; // Destination Address
reg [2:0] Sa; // Shift amount

// Decode Signals
reg add, sub, mul, srl, sra, sll, nul, ld;

In the part above, we declared all the variables needed for the decode and execute phases. To hold the program, we also declared a memory that is 16 bits wide and 32 locations deep. Because this is a Von Neumann design, the same memory also holds the data operands. The reset in this implementation is active low and asynchronous. With 32 locations, the Program Counter needs 5 bits (2^5 = 32), so PC is declared as a 5-bit register.
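The 5-bit PC follows from the memory depth: a counter that must address 32 words needs log2(32) = 5 bits. If you later resize the memory, you could derive the PC width with the Verilog-2005 system function $clog2 instead of hard-coding it. The module below is a hypothetical sketch. The names x1_mem_sizing, DEPTH, and WIDTH are illustrative and do not come from the original design.

```verilog
// Hypothetical sketch, not the original CPUACC module.
// Derives the PC width from the memory depth instead of hard-coding 5 bits.
module x1_mem_sizing #(
  parameter DEPTH = 32, // number of memory words
  parameter WIDTH = 16  // bits per memory word
)(
  output [31:0] pc_width // reports the derived PC width
);
  localparam PCW = $clog2(DEPTH); // 32 words -> 5 bits

  reg [WIDTH-1:0] mem [0:DEPTH-1]; // shared instruction/data memory
  reg [PCW-1:0]   PC;              // wide enough to address every word

  assign pc_width = PCW;
endmodule
```

With the defaults, the module reports a 5-bit PC. Overriding DEPTH to 64 would give 6 bits.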

2. Instruction Decode

 
always @ (*)
begin
// Default every decode signal to 0 so that exactly one is set below
// and no latches are inferred. PC, ACC, and OF are reset only in the
// clocked execute block, so each register has a single driver.
add = 0;
sub = 0;
mul = 0;
srl = 0;
sra = 0;
sll = 0;
nul = 0;
ld = 0;

// Fetch the instruction pointed to by the PC
IR = mem[PC];

// Decode the instruction fields
SA0 = IR[15:11]; // Source address
DA0 = IR[10:6]; // Destination address (not used by this version)
opcode = IR[5:3]; // Operation
Sa = IR[2:0]; // Shift amount

// Fetch the operand (the ALU uses only the lower 4 bits)
A = mem[SA0][3:0];

// Raise exactly one decode signal for the current opcode
case (opcode)
3'b000 : add = 1;
3'b001 : sub = 1;
3'b010 : mul = 1;
3'b011 : srl = 1;
3'b100 : sra = 1;
3'b101 : sll = 1;
3'b110 : nul = 1;
3'b111 : ld = 1;
endcase
end

The code above implements the instruction decode logic of the CPU. Based on the opcode, exactly one decode signal is set to 1. That signal tells the ALU which operation to perform in the current cycle. The IR holds the instruction word read from memory, and the decoder splits it into SA0, DA0, opcode, and Sa. These fields are then used to execute the requested operation. DA0 is decoded but not used by this version, because every result goes to the accumulator.
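The decode signals form a one-hot code whose bit position is fixed by the opcode, so the eight-way case statement can also be written as a single shift. The sketch below is a hypothetical standalone module, x1_onehot_decode, and is not part of the original design. It assumes the same bit order, {add, sub, mul, srl, sra, sll, nul, ld}, that the execute stage matches on.

```verilog
// Hypothetical alternative to the decoder's case statement.
// Shifting a single 1 right by the opcode value produces the one-hot
// pattern {add, sub, mul, srl, sra, sll, nul, ld}.
module x1_onehot_decode (
  input  [2:0] opcode,
  output add, sub, mul, srl, sra, sll, nul, ld
);
  // opcode 3'b000 -> 8'b1000_0000 (add) ... opcode 3'b111 -> 8'b0000_0001 (ld)
  assign {add, sub, mul, srl, sra, sll, nul, ld} = 8'b1000_0000 >> opcode;
endmodule
```

Opcode 3'b000 shifts nothing and sets add. Opcode 3'b111 moves the 1 into the least significant position and sets ld. Whether this reads better than the explicit case statement is a matter of taste, since both describe the same 3-to-8 decoder.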

3. Execute Phase

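// Output Logic - Execute

always @ (posedge cpuClk or negedge cpuRst)
begin
if (~cpuRst)
begin
ACC <= 0;
OF <= 0;
PC <= 0;
end
else
begin
// Non-blocking update: the instruction decoded from the current PC
// executes in this cycle, and the PC advances for the next one
PC <= PC + 1;
case({add, sub, mul, srl, sra, sll, nul, ld})
8'b1000_0000 : {OF, ACC} <= ACC + A[3:0]; // OF captures the carry
8'b0100_0000 : {OF, ACC} <= ACC - A[3:0]; // OF captures the borrow
8'b0010_0000 : ACC <= ACC * A[3:0]; // Product truncated to 4 bits
8'b0001_0000 : ACC <= A[3:0] >> Sa;
8'b0000_1000 : ACC <= $signed(A[3:0]) >>> Sa; // $signed makes >>> copy the sign bit
8'b0000_0100 : ACC <= A[3:0] << Sa;
8'b0000_0010 : ACC <= 0;
8'b0000_0001 : ACC <= A[3:0];
default : ACC <= 0;
endcase
end
end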

// Drive

assign cpuOut = ACC; // CPU output logic

In this phase, the signals generated by the instruction decoder select which operation is applied to the accumulator and the operand fetched from memory. Exactly one decode signal is high at a time (one-hot encoding), so the case statement can match the concatenated signals directly, which keeps the design simple. Finally, the accumulator value drives the CPU output.
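The execute stage sits inside the CPU, which makes it hard to test the arithmetic on its own. One way around this is to pull the same case statement into a standalone combinational module so it can be unit tested. The sketch below assumes a hypothetical module, x1_alu, that takes the one-hot select vector, the accumulator, the operand, and the shift amount. It differs from the CPU in one deliberate way: it clears the flag for the non-arithmetic operations, whereas the CPU leaves OF unchanged. It also shows why SRA needs $signed. The >>> operator only replicates the sign bit when its left operand is signed; on an unsigned vector it behaves exactly like >>.

```verilog
// Hypothetical standalone ALU (not part of the original CPUACC).
// sel uses the same one-hot order as the CPU: {add, sub, mul, srl, sra, sll, nul, ld}.
module x1_alu (
  input      [7:0] sel,
  input      [3:0] acc,    // current accumulator value
  input      [3:0] a,      // operand fetched from memory
  input      [2:0] sa,     // shift amount
  output reg [3:0] result, // next accumulator value
  output reg       of      // carry/borrow for ADD/SUB, 0 otherwise
);
  always @(*) begin
    of = 1'b0;
    case (sel)
      8'b1000_0000: {of, result} = acc + a;       // 5-bit sum keeps the carry
      8'b0100_0000: {of, result} = acc - a;       // 5-bit difference keeps the borrow
      8'b0010_0000: result = acc * a;             // truncated to 4 bits
      8'b0001_0000: result = a >> sa;             // logical right shift
      8'b0000_1000: result = $signed(a) >>> sa;   // arithmetic right shift
      8'b0000_0100: result = a << sa;             // logical left shift
      8'b0000_0010: result = 4'd0;                // NUL clears the accumulator
      8'b0000_0001: result = a;                   // LD copies the operand
      default:      result = 4'd0;
    endcase
  end
endmodule
```

For example, with a = 4'b1000 and sa = 1, SRL gives 4'b0100 while SRA gives 4'b1100.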

Verification

To verify the design, we need two main pieces: a testbench and a test memory image. The memory is internal to the CPU, so we load it from a .bin file and then let the CPU execute the instructions.

module CPU_TB (); 

// Test ports

reg cpuClk;
reg cpuRst;
reg wm;

// Variable for iteration
integer i;

wire [3:0] cpuOut;

The above code snippet sets up the basic variables required for the CPU’s verification.

// Clocking and System Initialization 

initial
begin
cpuClk = 0;
cpuRst = 0;
wm = 0;
forever #4 cpuClk = ~cpuClk;
end

The above block implements the initialization and clocking logic for the test bench.


// Test instance

CPUACC cpu0 (
.cpuClk(cpuClk),
.cpuRst(cpuRst),
.wm(wm),
.cpuOut(cpuOut)
);

The code above instantiates the implemented CPU for testing. The instance name cpu0 is vital to this verification, because hierarchical references such as cpu0.mem use it to reach the memory inside the CPU.

// Memory write task

task memwrite (input [15:0] data, input [4:0] addr);
begin
cpu0.mem[addr] = data;
end
endtask

// Memory Read Task

task memread (input [4:0] raddr);
begin
$display("The Addr : %d has Value : %b", raddr, cpu0.mem[raddr]);
end
endtask

To simplify the verification effort, I wrote two simple tasks that read and write the memory inside the CPU instance.

// Write all zero to locations of the Cpu memory 

initial
begin
for (i = 0 ; i < 32 ; i = i + 1)
begin
memwrite(0, i);
end
#8;
for (i = 0 ; i < 32 ; i = i + 1)
begin
memread(i);
end
// $finish;
end

The above code snippet writes zeros to all the locations in the memory. This acts as a clear sequence and helps us properly initialize the CPU memory.

Now we need to define the instruction memory for the CPU. To do this, we write a simple .bin file containing one 16-bit binary word per line, which is the format $readmemb expects.

0000100011111000
1111111111111111
0010000000110000
0000100000011011
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
1000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000

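This .bin file exercises several instructions. Decoded with the field layout used by the decoder (SA0 in bits [15:11], opcode in bits [5:3], Sa in bits [2:0]), the words break down as follows:
- Word 0 is an LD from address 1.
- Word 2 is a NUL.
- Word 3 is an SRL by 3 of the value at address 1.
- The all-zero words that follow decode as ADDs of the value at address 0.

Word 1 (all ones) is the data operand. Because instructions and data share one memory in this Von Neumann design, the PC also fetches word 1 as an instruction, where it decodes as an LD from address 31. The file is loaded into the CPU for execution, and the memory read task confirms the contents of the memory.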
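Every word in the shared memory can end up being fetched as an instruction, so it helps to check what each line of the .bin file decodes to. The sketch below assumes a hypothetical debugging module, x1_disasm, that is not part of the original testbench. It splits a 16-bit word into its fields and returns a three-character mnemonic using the opcode mapping from the decoder.

```verilog
// Hypothetical debugging helper (not part of the original testbench).
// Splits a memory word into X1 fields and names its opcode.
module x1_disasm (
  input  [15:0]    word,
  output [8*3-1:0] mnem, // three ASCII characters, printable with %s
  output [4:0]     sa0,  // source address, bits [15:11]
  output [4:0]     da0,  // destination address, bits [10:6]
  output [2:0]     sa    // shift amount, bits [2:0]
);
  // Opcode-to-mnemonic table matching the decoder's case statement
  function [8*3-1:0] mnemonic (input [2:0] op);
    case (op)
      3'b000:  mnemonic = "ADD";
      3'b001:  mnemonic = "SUB";
      3'b010:  mnemonic = "MUL";
      3'b011:  mnemonic = "SRL";
      3'b100:  mnemonic = "SRA";
      3'b101:  mnemonic = "SLL";
      3'b110:  mnemonic = "NUL";
      default: mnemonic = " LD"; // 3'b111, padded to three characters
    endcase
  endfunction

  assign mnem = mnemonic(word[5:3]);
  assign sa0  = word[15:11];
  assign da0  = word[10:6];
  assign sa   = word[2:0];
endmodule
```

In a testbench, you could drive word from cpu0.mem[i] inside the existing read loop and print mnem with %s next to each address.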

// Read Instructions from a binary file 

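initial
begin
// Wait until the clearing loop above has run (it starts at time 0);
// otherwise the zeros could overwrite the loaded program
#10;
$readmemb("instr_mem.bin", cpu0.mem);
end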

// Bin file read test loop

initial
begin
#12; // Print only after $readmemb has loaded the program
for (i = 0 ; i < 32 ; i = i + 1)
begin
memread(i);
end
$display ("Execution Loop ... \n");
// $finish;
end
cpu0 instance memory contents

The printout confirms that the .bin file was loaded into the CPU memory. Now we release the reset and allow the CPU to execute the instructions in memory.

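initial
begin
$dumpfile("cpu.vcd");
$dumpvars(0, CPU_TB);
// Register the monitor before running so every output change is printed
$monitor ("CPU output : %b", cpu0.cpuOut);
#18; // Release reset between clock edges to avoid a race with posedge cpuClk
cpuRst = 1;
#100;
$finish;
end
endmodule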

Simulation Result

Final Simulation of the CPU Design
