Accumulator-Based CPU Design

Nov 27, 2022

Introduction

This article describes a simple accumulator-based von Neumann CPU design written in Verilog HDL. It provides a small framework that readers can extend into their own CPUs with more robust, well-defined, and efficient instructions. The instructions modeled here are register-to-register only; there are no branch or jump instructions.

This is a simple weekend project that takes about 3–4 hours to build and is a fun way to get started with CPU design.

Accumulator Based CPUs

In the early computing era, most computers were accumulator based: a machine could have many registers for storing data, but one special register, the accumulator, served as one of the inputs to every operation and also stored the result.

Some notable computers that used accumulators are the ENIAC, IBM 701, HP2100, Intel 4004, Intel 8008, Intel 8085, etc.

The typical accumulator-based CPU architecture would look something like the figure below.

The ALU (Arithmetic and Logic Unit) is the heart of all computation and performs the main operations of the CPU. The ACC (Accumulator) is the common operand for all of those operations. The IR (Instruction Register) holds the instruction fetched from memory and supplies the fields used by the decode and execute stages. Once decoding is complete, the ALU has the memory location of the data and the opcode of the operation, which together are enough to execute the instruction. Once execution is complete, the PC (Program Counter) is incremented to point to the next location in memory.

Design Approach

To design any CPU, the first vital step is to decide on an ISA (instruction set architecture). I start by modeling the instruction format for the CPU.

Instruction Format for the X1CPU

For the scope of this simple design, we cover a few register-register instructions: addition, subtraction, multiplication, logical right shift, arithmetic right shift, logical left shift, null (clear the accumulator), and load.
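As a rough sketch of the instruction format, the field positions below are taken from the bit slices used by the decoder later in this article (SA0 in bits 15:11, DA0 in 10:6, opcode in 5:3, Sa in 2:0). The module name `x1_format_demo`, the `OP_*` constants, and the `encode` function are hypothetical helpers for illustration and are not part of the X1 code base.

```verilog
// Illustration only: packs the four instruction fields into one 16-bit word.
// Layout (from the decoder): [15:11] SA0 | [10:6] DA0 | [5:3] opcode | [2:0] Sa
module x1_format_demo;

  // Opcode values as used in the decoder's case statement
  localparam [2:0] OP_ADD = 3'b000,
                   OP_SUB = 3'b001,
                   OP_MUL = 3'b010,
                   OP_SRL = 3'b011,
                   OP_SRA = 3'b100,
                   OP_SLL = 3'b101,
                   OP_NUL = 3'b110,
                   OP_LD  = 3'b111;

  // Hypothetical helper: build an instruction word from its fields
  function [15:0] encode (input [4:0] sa0, input [4:0] da0,
                          input [2:0] op,  input [2:0] sa);
    begin
      encode = {sa0, da0, op, sa};
    end
  endfunction

  initial
    begin
      // Logical right shift of the value at address 1 by 3 positions
      $display("%b", encode(5'd1, 5'd0, OP_SRL, 3'd3)); // prints 0000100000011011
    end

endmodule
```

Writing instructions this way makes it easier to check a hand-written .bin file against the intended program.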

System Design

To implement the design we start with three major components: the declarations, the instruction decoder, and the ALU. The remaining registers belong to one of these three parts.

1. Module and Variable Declaration

/*
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Design : Accumulator Based CPU - X1
Engineer : Srimanth Tenneti
Date: 27th November 2022
Description: 
   1. ADD, SUB, MUL, SRL, SRA, SLL, NUL, LD - instructions supported
   2. No branch | jump
   3. Register - Register mode only 
   4. Internal Memory 
Version: 0.01
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
*/

module CPUACC #(parameter W = 4)(
  // Global Signals
  input cpuClk,
  input cpuRst,
  input wm,
  // Output 
  output [W-1 : 0] cpuOut
); 
  
// CPU register declarations 
  // Memory: 32 locations deep, 16 bits wide (holds instructions and data)
  reg [15:0] mem [0:31]; 
  reg [3:0] ACC; // Accumulator register 
  reg [15:0] IR; // Instruction Register (one 16-bit memory word)
  reg [3:0] A; // Operand: low 4 bits of mem[SA0]
  
  reg OF;  // Overflow flag (carry or borrow out of ADD/SUB)
  reg [4:0] PC; // Program Counter
 
  
  // Instruction Decode Stage
  reg [2:0] opcode; // Opcode
  reg [4:0] SA0; // Source Address
  reg [4:0] DA0; // Destination Address
  reg [2:0] Sa; // Shift amount
  
  // Decode Signals
  reg add, sub, mul, srl, sra, sll, nul, ld;

In the above part, we declare all the variables needed for the decode and execute phases. To store the instructions, we also instantiate a memory that is 16 bits wide and 32 locations deep. The reset in this implementation is active low and asynchronous. Because there are 32 locations, we need a 5-bit Program Counter, which is declared as well.
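To make the reset style concrete, here is a minimal standalone sketch of a 5-bit counter with an active-low asynchronous reset, the same pattern a program counter for a 32-word memory needs. The module and port names (`pc_counter_sketch`, `clk`, `rst_n`, `pc`) are assumptions made for this example and do not appear in the X1 source.

```verilog
// Illustration only: a 5-bit program counter with active-low asynchronous reset.
module pc_counter_sketch (
  input            clk,    // rising-edge clock
  input            rst_n,  // active-low asynchronous reset
  output reg [4:0] pc      // 5 bits address 32 memory locations
);

  // "negedge rst_n" in the sensitivity list is what makes the reset
  // asynchronous: pc clears as soon as rst_n falls, without waiting for clk.
  always @ (posedge clk or negedge rst_n)
    begin
      if (!rst_n)
        pc <= 5'd0;
      else
        pc <= pc + 5'd1; // wraps from 31 back to 0
    end

endmodule
```

Keeping every assignment to a register inside one always block like this also avoids driving the same register from two places.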

2. Instruction Decode

 
  // Decode is purely combinational: every signal assigned here gets a value
  // on every path, so no latches are inferred. The state registers
  // (PC, ACC, OF) are written only by the clocked block in the execute phase.
  always @ (*)
    begin
      // Defaults, also used while reset is asserted
      IR     = 0;
      opcode = 3'b111; // Defaults to load A 
      SA0    = 0; 
      DA0    = 0;
      Sa     = 0; 
      A      = 0; 
      add = 0; sub = 0; mul = 0; srl = 0; 
      sra = 0; sll = 0; nul = 0; ld  = 0; 

      if (cpuRst) 
        begin
          // Decode
          IR     = mem[PC]; 
          opcode = IR[5:3]; 
          Sa     = IR[2:0]; 
          SA0    = IR[15:11]; 
          DA0    = IR[10:6];      
          
          // Fetch (A keeps the low 4 bits of the 16-bit word)
          A = mem[SA0]; 
          
          // Raise exactly one decode signal per opcode
          case (opcode) 
            3'b000 : add = 1;
            3'b001 : sub = 1; 
            3'b010 : mul = 1; 
            3'b011 : srl = 1; 
            3'b100 : sra = 1; 
            3'b101 : sll = 1; 
            3'b110 : nul = 1; 
            3'b111 : ld  = 1; 
          endcase
        end
    end

The above code snippet depicts the instruction decode logic of the CPU. Based on the opcode, exactly one decode signal is raised, which tells the ALU which operation to perform in that cycle; a value of 1 marks the selected operation. The IR holds the word read from memory, the decoder splits it into SA0, DA0, opcode, and Sa, and these fields drive the execution of the requested operation.

3. Execute Phase

// Output Logic - Execute
  
  always @ (posedge cpuClk or negedge cpuRst)
    begin
      if (~cpuRst)
        begin
          // All state registers are reset here and nowhere else
          PC  <= 0; 
          ACC <= 0;
          OF  <= 0; 
        end 
      else 
        begin
          PC <= PC + 1; // Advance to the next instruction
          case({add, sub, mul, srl, sra, sll, nul, ld})
            8'b1000_0000 : {OF, ACC} <= ACC + A[3:0]; 
            8'b0100_0000 : {OF, ACC} <= ACC - A[3:0]; 
            8'b0010_0000 : ACC <= ACC * A[3:0]; 
            8'b0001_0000 : ACC <= A[3:0] >> Sa;
            // >>> sign-extends only when its left operand is signed
            8'b0000_1000 : ACC <= $signed(A[3:0]) >>> Sa; 
            8'b0000_0100 : ACC <= A[3:0] << Sa; 
            8'b0000_0010 : ACC <= 0; 
            8'b0000_0001 : ACC <= A[3:0]; 
            default : ACC <= 0; 
          endcase
        end
    end
  
  // Drive
  
  assign cpuOut = ACC; // CPU output logic

endmodule

In this phase, we use the signals generated by the instruction decoder to execute a specific operation on a specific set of data. The decode signals form a one-hot code, with exactly one bit set per instruction, which keeps the execute logic simple. Finally, the output is driven from the accumulator value.
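One detail worth checking in any arithmetic-right-shift implementation is operand signedness: in Verilog, `>>>` replicates the sign bit only when its left operand is signed, and a plain `reg` vector is unsigned. The standalone sketch below illustrates the difference on a 4-bit value; the module name `shift_demo` and its variable names are made up for this example.

```verilog
// Illustration only: logical vs. arithmetic right shift on a 4-bit value.
module shift_demo;

  reg [3:0] a;
  reg [3:0] logical;        // a >> 1
  reg [3:0] arith_unsigned; // a >>> 1 with an unsigned operand
  reg [3:0] arith_signed;   // $signed(a) >>> 1

  initial
    begin
      a = 4'b1000;                     // -8 when read as two's complement

      logical        = a >> 1;         // 0100: zero shifted in
      arith_unsigned = a >>> 1;        // 0100: behaves like >> for unsigned
      arith_signed   = $signed(a) >>> 1; // 1100: sign bit replicated

      $display("%b %b %b", logical, arith_unsigned, arith_signed);
      // prints: 0100 0100 1100
    end

endmodule
```

For an SRA instruction the third form is the one that matches the usual definition of an arithmetic shift.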

Verification

To verify the design we need two major pieces: a test bench and a test memory image. Because the memory is internal to the CPU, we use a .bin file to load data into it and then execute the instructions.

module CPU_TB (); 
  
  // Test ports 
  
  reg cpuClk;
  reg cpuRst;
  reg wm; 
  
  // Variable for iteration 
  integer i; 
  
  wire [3:0] cpuOut;

The above code snippet sets up the basic variables required for the CPU’s verification.

// Clocking and System Initialization 
  
  initial 
    begin
       cpuClk = 0; 
       cpuRst = 0; 
       wm = 0; 
       forever #4 cpuClk = ~cpuClk; 
    end

The above block implements the initialization and clocking logic for the test bench.


  // Test instance
  
  CPUACC cpu0 (
    .cpuClk(cpuClk),
    .cpuRst(cpuRst),
    .wm(wm),
    .cpuOut(cpuOut)
  );

The above code snippet instantiates the CPU for testing. The instance name cpu0 matters for this verification because it is used in hierarchical references (such as cpu0.mem) to load the memory inside the CPU.

// Memory write task
  
  task memwrite (input [15:0] data, input [4:0] addr); 
    begin
      cpu0.mem[addr] = data;
    end
  endtask
  
  // Memory Read Task 
  
  task memread (input [4:0] raddr); 
    begin
      $display("The Addr : %d has Value : %b", raddr, cpu0.mem[raddr]); 
    end
  endtask

To simplify the verification effort, I wrote two simple tasks that read and write the memory instance.

// Write all zero to locations of the Cpu memory 
  
  initial 
    begin
      for (i = 0 ; i < 32 ; i = i + 1)
        begin
          memwrite(0, i); 
        end
      #8; 
      for (i = 0 ; i < 32 ; i = i + 1)
        begin
          memread(i);
        end
      // $finish; 
    end

The above code snippet writes zeros to all the locations in the memory. This acts as a clear sequence and helps us properly initialize the CPU memory.

Now we need to define the instruction memory for the CPU. To do this we write a simple .bin file that would hold the instructions for the CPU.

0000100011111000
1111111111111111
0010000000110000
0000100000011011
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
1000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000

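Decoded with the field layout used by the decoder (SA0 in bits 15:11, DA0 in bits 10:6, opcode in bits 5:3, Sa in bits 2:0), the first four words of this .bin file are LD, LD, NUL, and SRL, and the all-zero words that follow decode as ADD. This file is loaded into the CPU for execution, and the memory read task is used to verify the contents of the memory.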
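A small behavioral model can help predict what the accumulator should hold after each instruction. The sketch below is not part of the X1 test bench: it assumes the field layout and opcode table from the decoder, hard-codes the non-zero words of the listing above, and steps through the first six instructions. The module name `x1_reference_model` and its variable names are hypothetical.

```verilog
// Illustration only: a software-style model of the first six instructions.
module x1_reference_model;

  reg [15:0] mem [0:31];
  reg [15:0] ir;    // current instruction word
  reg [15:0] word;  // memory word addressed by SA0
  reg [3:0]  a;     // operand: low 4 bits of that word
  reg [3:0]  acc;   // modeled accumulator
  reg        of;    // modeled overflow flag
  integer    pc;

  initial
    begin
      // Same contents as the .bin listing (all other words are zero)
      for (pc = 0; pc < 32; pc = pc + 1)
        mem[pc] = 16'b0;
      mem[0]  = 16'b0000100011111000;
      mem[1]  = 16'b1111111111111111;
      mem[2]  = 16'b0010000000110000;
      mem[3]  = 16'b0000100000011011;
      mem[16] = 16'b1000000000000000;

      acc = 0;
      of  = 0;

      for (pc = 0; pc < 6; pc = pc + 1)
        begin
          ir   = mem[pc];
          word = mem[ir[15:11]];   // SA0 selects the operand word
          a    = word[3:0];

          case (ir[5:3])           // opcode field
            3'b000 : {of, acc} = acc + a;              // ADD
            3'b001 : {of, acc} = acc - a;              // SUB
            3'b010 : acc = acc * a;                    // MUL
            3'b011 : acc = a >> ir[2:0];               // SRL
            3'b100 : acc = $signed(a) >>> ir[2:0];     // SRA
            3'b101 : acc = a << ir[2:0];               // SLL
            3'b110 : acc = 0;                          // NUL
            3'b111 : acc = a;                          // LD
          endcase

          $display("pc=%0d opcode=%b acc=%b", pc, ir[5:3], acc);
        end
    end

endmodule
```

Under these assumptions the model prints acc values of 1111, 0000, 0000, 0001, 1001, and 0001 for pc 0 through 5, which gives a sequence to compare against the cpuOut waveform.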

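// Read instructions from a binary file 
  
  initial 
    begin 
      // Load after the zero-fill and its read-back (time 0 to 8) and before
      // reset is released, so the two memory writers do not race at time 0
      #10;
      $readmemb("instr_mem.bin", cpu0.mem); 

      // Bin file read test loop 
      for (i = 0 ; i < 32 ; i = i + 1)
        begin
          memread(i);
        end
      $display ("Execution Loop ... \n"); 
      // $finish; 
    end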

The read-back confirms that the .bin file was loaded into the CPU memory. Now we release the reset and allow the CPU to execute the instructions in memory.

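 initial 
    begin
      $dumpfile("cpu.vcd"); 
      $dumpvars(); 
      // Start monitoring before the CPU runs so every output change is printed
      $monitor ("CPU output : %b", cpuOut); 
      // Release reset between clock edges (rising edges fall at 4, 12, 20, ...)
      #22;
      cpuRst = 1; 
      #100; 
      
      $finish; 
    end
endmodule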

Simulation Result

Code Base

GitHub - srimanthtenneti/X1CPU: Simple Accumulator based Von Neumann CPU

github.com

If you encounter any problems with the RTL please do let me know through the comments.

Media

Sub_Zero_AI_Freak | HackerNoon

I am an AI developer and my areas of expertise are Deep Learning and High-Performance Computing.

hackernoon.com

https://www.linkedin.com/in/srimanth-tenneti-662b7117b/

If you found this interesting, do follow my Medium and HackerNoon pages for more content, and please like and share this article.