Accumulator-Based CPU Design
Introduction
This article describes a simple accumulator-based von Neumann CPU design in Verilog HDL. It gives readers a small framework from which to build their own CPUs with more robust, well-defined, and efficient instructions. The instructions modeled here are of the register-to-register type only; there are no branch or jump instructions.
This is a simple weekend project that takes about 3–4 hours to build and is a fun way to get started with CPU design.
Accumulator Based CPUs
In the early computing era, most computers were accumulator based: the processor might have several registers for holding data, but one special register, the accumulator, stored the result of each operation and also served as one of its inputs.
Notable machines that used accumulators include the ENIAC, IBM 701, HP 2100, Intel 4004, Intel 8008, and Intel 8085.
The typical accumulator-based CPU architecture would look something like the figure below.
The ALU (Arithmetic and Logic Unit) is the heart of all computation and performs the main operations of the CPU. The ACC (Accumulator) is the common operand for every operation. The IR (Instruction Register) holds the instruction fetched from memory and supplies its fields to the decode and execute stages. Once decoding is complete, the ALU has the memory location of the data and the opcode of the operation, which together are enough to execute the instruction. Once execution is complete, the PC (Program Counter) is incremented to point to the next location in memory.
Design Approach
To design any CPU, the first vital step is to decide on an ISA. I start by modeling the instruction format for the CPU.
For the scope of this simple design, we cover eight register-register instructions: addition, subtraction, multiplication, logical right shift, arithmetic right shift, logical left shift, null, and load.
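As a rough sketch of the instruction format, the decoder in this article slices each 16-bit word into four fields: the source address in bits [15:11], the destination address in bits [10:6], the opcode in bits [5:3], and the shift amount in bits [2:0]. The block below packs those fields into a word. The module name `instr_format_demo`, the `OP_*` constants, and the `encode` function are illustrative names that are not part of the design itself.

```verilog
// Illustrative only: packs the four instruction fields into one 16-bit word.
module instr_format_demo;
  // Opcode values, matching the case statement in the instruction decoder
  localparam [2:0] OP_ADD = 3'b000,
                   OP_SUB = 3'b001,
                   OP_MUL = 3'b010,
                   OP_SRL = 3'b011,
                   OP_SRA = 3'b100,
                   OP_SLL = 3'b101,
                   OP_NUL = 3'b110,
                   OP_LD  = 3'b111;

  // Word layout: {source address, destination address, opcode, shift amount}
  function [15:0] encode;
    input [4:0] sa0;  // source address
    input [4:0] da0;  // destination address
    input [2:0] op;   // opcode
    input [2:0] sa;   // shift amount
    begin
      encode = {sa0, da0, op, sa};
    end
  endfunction

  initial begin
    // LD from memory location 1
    $display("%b", encode(5'd1, 5'd3, OP_LD, 3'd0));   // 0000100011111000
    // SRL of memory location 1 by 3 positions
    $display("%b", encode(5'd1, 5'd0, OP_SRL, 3'd3));  // 0000100000011011
  end
endmodule
```

These two calls reproduce the first and fourth words of the test program used in the verification section.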
System Design
To implement the design we start with three major components: the declarations, the instruction decoder, and the ALU. The remaining registers belong to one of these three parts.
1. Module and Variable Declaration
/*
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Design : Accumulator Based CPU - X1
Engineer : Srimanth Tenneti
Date: 27th November 2022
Description:
1. ADD, SUB, MUL, SRL, SRA, SLL, NUL, LD - instructions supported
2. No branch | jump
3. Register - Register mode only
4. Internal Memory
Version: 0.01
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
*/
module CPUACC #(parameter W = 4)(
// Global Signals
input cpuClk,
input cpuRst,
input wm,
// Output
output [W-1 : 0] cpuOut
);
// CPU storage declarations
// Memory: 32 locations deep, 16 bits wide
reg [15:0] mem [0:31];
reg [3:0] ACC; // Accumulator register
reg [15:0] IR; // Instruction Register (holds one 16-bit memory word)
reg [3:0] A; // Operand fetched from memory
reg OF; // Overflow flag (carry/borrow out of ADD/SUB)
reg [4:0] PC; // Program Counter
// Instruction Decode Stage
reg [2:0] opcode; // Opcode
reg [4:0] SA0; // Source Address
reg [4:0] DA0; // Destination Address
reg [2:0] Sa; // Shift amount
// Decode Signals
reg add, sub, mul, srl, sra, sll, nul, ld;
In the above part, we declare all the variables needed for the decode and execute phases. To store the instructions, we also declare a memory that is 16 bits wide and 32 locations deep. The reset in this implementation is active low and asynchronous. Because there are 32 locations, we need a 5-bit Program Counter, which is declared as well.
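The relation between memory depth and program counter width is width = ceil(log2(depth)). A minimal sketch of deriving the width instead of hard-coding it follows, assuming a simulator that supports the Verilog-2005 `$clog2` system function. The names `pc_width_demo`, `DEPTH`, and `PC_W` are hypothetical and do not appear in the design above.

```verilog
// Illustrative only: derive the program counter width from the memory depth.
module pc_width_demo;
  localparam DEPTH = 32;             // number of memory locations
  localparam PC_W  = $clog2(DEPTH);  // address bits needed to reach all of them

  reg [PC_W-1:0] PC;

  initial begin
    PC = DEPTH - 1;                  // highest valid address
    $display("PC width = %0d bits, last address = %0d", PC_W, PC);
    PC = PC + 1;                     // a 5-bit counter wraps back to 0
    $display("after increment: %0d", PC);
  end
endmodule
```

With 32 locations this gives a 5-bit counter, and incrementing past address 31 wraps to 0, so a program that runs off the end of memory starts again from the first location.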
2. Instruction Decode
always @ (*)
begin
    // Default: no operation selected; the case below raises exactly one signal
    {add, sub, mul, srl, sra, sll, nul, ld} = 8'b0;
    if (~cpuRst)
    begin
        IR = 0;
        opcode = 3'b111; // Defaults to load A
        // Clears the remaining values
        SA0 = 0;
        DA0 = 0;
        Sa = 0;
        A = 0;
    end
    else
    begin
        // Fetch
        IR = mem[PC];
        // Decode
        opcode = IR[5:3];
        Sa = IR[2:0];
        SA0 = IR[15:11];
        DA0 = IR[10:6];
        // Operand fetch (the ALU uses the low 4 bits of the word)
        A = mem[SA0][3:0];
        case (opcode)
            3'b000 : add = 1;
            3'b001 : sub = 1;
            3'b010 : mul = 1;
            3'b011 : srl = 1;
            3'b100 : sra = 1;
            3'b101 : sll = 1;
            3'b110 : nul = 1;
            3'b111 : ld = 1;
        endcase
    end
end
The above code snippet depicts the instruction decode logic of the CPU. Based on the opcode, exactly one of the decode signals is set to 1, which tells the ALU which operation to perform in that cycle. The IR holds the word read from memory, which the decoder splits into SA0, DA0, opcode, and Sa; these fields drive the execution of the requested operation. Registers that hold state (PC, ACC, and OF) are reset and updated only in the clocked execute block, so that each register has a single driver.
3. Execute Phase
// Output Logic - Execute
always @ (posedge cpuClk or negedge cpuRst)
begin
    if (~cpuRst)
    begin
        ACC <= 0;
        OF <= 0;
        PC <= 0;
    end
    else
    begin
        PC <= PC + 1; // Non-blocking, like every other register update here
        case({add, sub, mul, srl, sra, sll, nul, ld})
            8'b1000_0000 : {OF, ACC} <= ACC + A[3:0]; // OF captures the carry
            8'b0100_0000 : {OF, ACC} <= ACC - A[3:0]; // OF captures the borrow
            8'b0010_0000 : ACC <= ACC * A[3:0]; // Product truncated to 4 bits
            8'b0001_0000 : ACC <= A[3:0] >> Sa;
            8'b0000_1000 : ACC <= $signed(A) >>> Sa; // Signed operand, so the sign bit is replicated
            8'b0000_0100 : ACC <= A[3:0] << Sa;
            8'b0000_0010 : ACC <= 0;
            8'b0000_0001 : ACC <= A[3:0];
            default : ACC <= 0;
        endcase
    end
end
// Drive
assign cpuOut = ACC; // CPU output logic
In this phase, we use the signals generated by the instruction decoder to execute a specific operation on a specific set of data. The logic uses a one-hot encoding, in which exactly one decode signal is high per instruction, to keep the design simple. Finally, the accumulator value drives the output.
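Two Verilog expression rules are worth checking in isolation when writing an execute stage like this one: a wider left-hand side such as `{OF, ACC}` captures the carry of a 4-bit addition, and `>>>` replicates the sign bit only when its left operand is signed. The standalone sketch below demonstrates both. The module and signal names (`expr_rules_demo`, `acc`, `ovf`, and so on) are made up for illustration and are not part of the CPU.

```verilog
// Illustrative only: carry capture and signed versus unsigned right shift.
module expr_rules_demo;
  reg [3:0] acc, a;
  reg       ovf;
  reg [3:0] y_logical, y_arith;

  initial begin
    // 1. Carry capture: the 5-bit left-hand side makes the addition 5 bits wide
    acc = 4'd9;
    a   = 4'd8;
    {ovf, acc} = acc + a;                   // 9 + 8 = 17 = 5'b1_0001
    $display("ovf=%b acc=%b", ovf, acc);    // ovf=1 acc=0001

    // 2. Arithmetic shift: only a signed operand is sign-extended
    a         = 4'b1000;
    y_logical = a >>> 1;                    // unsigned operand, zero fill
    y_arith   = $signed(a) >>> 1;           // signed operand, sign fill
    $display("%b %b", y_logical, y_arith);  // 0100 1100
  end
endmodule
```

A part-select such as `A[3:0]` is always unsigned in Verilog, so an arithmetic right shift of a part-select behaves like a logical one unless the operand is wrapped in `$signed`.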
Verification
To verify the design we need two major pieces: a test bench and a test memory image. As the memory is internal to the CPU, we use a bin file to load the data into the memory and then execute the instructions.
module CPU_TB ();
// Test ports
reg cpuClk;
reg cpuRst;
reg wm;
// Loop variable for iteration
integer i;
wire [3:0] cpuOut;
The above code snippet sets up the basic variables required for the CPU’s verification.
// Clocking and System Initialization
initial
begin
cpuClk = 0;
cpuRst = 0;
wm = 0;
forever #4 cpuClk = ~cpuClk;
end
The above block implements the initialization and clocking logic for the test bench.
// Test instance
CPUACC cpu0 (
.cpuClk(cpuClk),
.cpuRst(cpuRst),
.wm(wm),
.cpuOut(cpuOut)
);
The above code snippet instantiates the CPU for testing. The instance name cpu0 matters for this verification, because the test bench uses it in hierarchical references such as cpu0.mem to load the memory inside the CPU.
// Memory write task
task memwrite (input [15:0] data, input [4:0] addr);
begin
cpu0.mem[addr] = data;
end
endtask
// Memory Read Task
task memread (input [4:0] raddr);
begin
$display("The Addr : %d has Value : %b", raddr, cpu0.mem[raddr]);
end
endtask
To simplify the verification effort, I wrote two simple tasks that read from and write to the memory instance.
// Write all zero to locations of the Cpu memory
initial
begin
for (i = 0 ; i < 32 ; i = i + 1)
begin
memwrite(0, i);
end
#8;
for (i = 0 ; i < 32 ; i = i + 1)
begin
memread(i);
end
// $finish;
end
The above code snippet writes zeros to every location in the memory. This acts as a clear sequence and gives the CPU memory a known initial state.
Now we need to define the instruction memory for the CPU. To do this we write a simple .bin file, a text file with one 16-bit binary word per line, that holds the instructions for the CPU.
0000100011111000
1111111111111111
0010000000110000
0000100000011011
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
1000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
This .bin file exercises four operations: LD, NUL, SRL (the fourth word carries opcode 011), and ADD (every all-zero word decodes as opcode 000). It is now loaded into the CPU for execution, and the memory read task is used to verify the contents of the memory.
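One way to double-check which operations a program image exercises is to decode its words with the same field layout the instruction decoder uses. The sketch below does that for the first four words of the listing above. The module name `disasm_demo` and the `mnemonic` function are hypothetical helpers, not part of the CPU or its test bench.

```verilog
// Illustrative only: print the decoded fields of the first four program words.
module disasm_demo;
  reg [15:0] prog [0:3];
  integer k;

  // Map a 3-bit opcode to a 3-character mnemonic
  function [23:0] mnemonic;
    input [2:0] op;
    begin
      case (op)
        3'b000  : mnemonic = "ADD";
        3'b001  : mnemonic = "SUB";
        3'b010  : mnemonic = "MUL";
        3'b011  : mnemonic = "SRL";
        3'b100  : mnemonic = "SRA";
        3'b101  : mnemonic = "SLL";
        3'b110  : mnemonic = "NUL";
        default : mnemonic = "LD ";
      endcase
    end
  endfunction

  initial begin
    // First four words of the program image
    prog[0] = 16'b0000100011111000;
    prog[1] = 16'b1111111111111111;
    prog[2] = 16'b0010000000110000;
    prog[3] = 16'b0000100000011011;
    for (k = 0; k < 4; k = k + 1)
      $display("%0d: %s SA0=%0d DA0=%0d Sa=%0d", k,
               mnemonic(prog[k][5:3]),  // opcode field
               prog[k][15:11],          // source address field
               prog[k][10:6],           // destination address field
               prog[k][2:0]);           // shift amount field
  end
endmodule
```

Because instructions and data share one memory in this design, a word can serve as both: a word fetched as an instruction may also be read as an operand by another instruction.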
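// Read Instructions from a binary file
// Delayed by one time unit so the load always follows the zero fill at time 0
initial
begin
    #1;
    $readmemb("instr_mem.bin", cpu0.mem);
end
// Bin file read test loop
// Runs one time unit after the load so it always prints the loaded contents
initial
begin
    #2;
    for (i = 0 ; i < 32 ; i = i + 1)
    begin
        memread(i);
    end
    $display ("Execution Loop ... \n");
    // $finish;
end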
The printed contents confirm that the .bin file was loaded into the CPU memory. Now we release the reset and allow the CPU to execute the instructions in the memory.
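initial
begin
    $dumpfile("cpu.vcd");
    $dumpvars();
    // Register the monitor first so that every change of the output is printed
    $monitor ("CPU output : %b", cpu0.cpuOut);
    #22;
    cpuRst = 1; // Release reset between clock edges (rising edges occur at 4, 12, 20, 28, ...)
    #100;
    $finish;
end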
endmodule
Simulation Result
Code Base
If you encounter any problems with the RTL please do let me know through the comments.
Media
https://www.linkedin.com/in/srimanth-tenneti-662b7117b/