FPGA Game Boy Part 4: Loading immediate values and halting the CPU

14 May 2018

It’s been a little while, but in the last post I showed my SpinalHDL implementation of the Z80-ish ALU, part of the Game Boy’s LR35902 CPU. We could compile and run a program with NOP, INC, DEC, etc. However, because the microcode doesn’t support load instructions yet, building complex values is a bit difficult. In this post, I’m going to briefly show how I added the 8-bit LR35902 load-immediate instructions and the halt instruction to make simulating the CPU more friendly.

Load-immediate instructions

The LR35902’s load-immediate instructions are rather straightforward to implement, since they fetch the immediate data from the address held in the program counter (PC). This means we don’t have to support indirect memory addresses, auto-incrementing, or anything fancy yet. The decoder needs to output a signal indicating a memory read cycle, which will cause the CPU to load the temp register with the 8-bit value from memory.

In CpuDecoder, I added the memRead field to MCycle and a helper function for generating microcode:

case class MCycle(
  aluOp: SpinalEnumElement[AluOp.type],
  opBSelect: Option[Int],
  storeSelect: Option[Int],
  memRead: Boolean
)

def memReadCycle(aluOp: SpinalEnumElement[AluOp.type],
                 storeSelect: Option[Int]) = {
  MCycle(aluOp, None, storeSelect, true)
}

Taking another look at the Game Boy CPU instruction set, the LD x, d8 instructions take 8 T-cycles, which means two M-cycles of 4 T-cycles each. The first M-cycle reads the opcode without any operation or register write-back, and the second M-cycle writes the memory value to the register.

// ld B, d8
(0x06, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.B)))),
// ld C, d8
(0x0E, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.C)))),
// ld D, d8
(0x16, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.D)))),
// ld E, d8
(0x1E, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.E)))),
// ld H, d8
(0x26, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.H)))),
// ld L, d8
(0x2E, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.L)))),
// ld A, d8
(0x3E, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.A))))
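
The seven opcodes above follow a regular pattern: in the LR35902 encoding, LD r, d8 is 00 rrr 110 in binary, where rrr is the register index (B=0, C=1, D=2, E=3, H=4, L=5, A=7). Index 6 is the (HL) form, which writes to memory and is left out here. As a rough sketch, the opcodes could be computed rather than typed out. The names LdImmOpcodes and ldImmOpcode are made up for illustration and are not part of SlabBoy.

```scala
object LdImmOpcodes {
  // Register index placed in bits 5..3 of the opcode (standard LR35902 encoding).
  // Index 6 is (HL), an indirect store, so it is deliberately omitted.
  val regIndex: Seq[(String, Int)] = Seq(
    "B" -> 0, "C" -> 1, "D" -> 2, "E" -> 3, "H" -> 4, "L" -> 5, "A" -> 7
  )

  // LD r, d8 is encoded as 00 rrr 110.
  def ldImmOpcode(r: Int): Int = (r << 3) | 0x06

  // (register name, opcode) pairs in the same order as the microcode table.
  val table: Seq[(String, Int)] =
    regIndex.map { case (name, r) => name -> ldImmOpcode(r) }

  def main(args: Array[String]): Unit =
    table.foreach { case (name, op) => println(f"ld $name, d8 -> 0x$op%02X") }
}
```

A generated table like this could feed the same fetchCycle/memReadCycle pair for every register, at the cost of being a little less obvious to read than the explicit list.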

The T-cycle state machine only has to change a little bit. Instead of storing the data from memory into the instruction register, we store it into the temp register during a memory read cycle in t2State.

val tCycleFsm = new StateMachine {
  val t1State: State = new State with EntryPoint {
    onEntry {
      address := registers16(Reg16.PC)
      mreq := True
    }
    whenIsActive {
      mreq := False
      goto(t2State)
    }
  }
  val t2State = new State {
    whenIsActive {
      // handle memory read cycles
      when(decoder.io.memRead) {
        temp := io.dataIn
      }.otherwise {
        ir := io.dataIn
      }
      registers16(Reg16.PC) := registers16(Reg16.PC) + 1
      goto(t3State)
    }
  }
  val t3State = new State {
    whenIsActive {
      when(decoder.io.loadOpB) {
        temp := registers8(decoder.io.opBSelect)
      }
      goto(t4State)
    }
  }
  val t4State = new State {
    whenIsActive {
      when(decoder.io.store) {
        registers8(decoder.io.storeSelect) := alu.io.result
      }
      registers8(Reg8.F) := alu.io.flagsOut
      mCycle := decoder.io.nextMCycle
      // return to T1 to start the next M-cycle
      goto(t1State)
    }
  }
}
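
To see why a single memRead flag is enough, here is a plain-Scala behavioural model of the two M-cycles of an LD r, d8. It is only a software sketch, not SpinalHDL: the four T-states are collapsed into one step() call, there are no flags, and it assumes the ALU passes temp through unchanged for AluOp.Nop. The names LdImmModel, Step, Cpu and step are hypothetical, and the model only knows the seven load-immediate opcodes.

```scala
object LdImmModel {
  // One microcode step: is it a memory read cycle, and which register
  // (if any) is written back at the end of the M-cycle?
  final case class Step(memRead: Boolean, store: Option[String])

  // Microcode for the load-immediate opcodes only: a fetch cycle followed
  // by a memory read cycle that stores into the destination register.
  val microcode: Map[Int, Seq[Step]] = Map(
    0x06 -> "B", 0x0E -> "C", 0x16 -> "D", 0x1E -> "E",
    0x26 -> "H", 0x2E -> "L", 0x3E -> "A"
  ).map { case (op, reg) => op -> Seq(Step(false, None), Step(true, Some(reg))) }

  final class Cpu(memory: Array[Int]) {
    var pc = 0
    var ir = 0
    var temp = 0
    var mCycle = 0
    val regs = scala.collection.mutable.Map.empty[String, Int]

    // Run one M-cycle, with T1..T4 collapsed into straight-line code.
    def step(): Unit = {
      val data = memory(pc)                            // T1: address := PC
      val memRead = mCycle > 0 && microcode(ir)(mCycle).memRead
      if (memRead) temp = data else ir = data          // T2: temp or IR
      pc += 1
      val steps = microcode(ir)                        // decode with the current IR
      steps(mCycle).store.foreach(r => regs(r) = temp) // T4: write-back
      mCycle = if (mCycle + 1 < steps.length) mCycle + 1 else 0
    }
  }
}
```

For example, constructing a Cpu over Array(0x3E, 0xAA, 0x06, 0xBB) and calling step() four times leaves 0xAA in A and 0xBB in B, with the PC advanced by one byte per M-cycle.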

Halt instruction

Adding the halt instruction was simple, but valuable for ending simulation runs automatically when the loaded program is complete. First, I added another MCycle field (and also wired up the decoder output):

case class MCycle(
  aluOp: SpinalEnumElement[AluOp.type],
  opBSelect: Option[Int],
  storeSelect: Option[Int],
  memRead: Boolean,
  halt: Boolean
)

In the T-cycle state machine, we register the decoded halt signal in t3State, and then idle forever in t4State. Later on this could potentially gate the clock instead of looping endlessly, but this simple implementation will work for now. Moreover, the Game Boy’s clock frequency is so low (4.19 MHz) that I’m not too worried about the power dissipation on the Zynq FPGA fabric.

val t3State = new State {
  whenIsActive {
    when(decoder.io.loadOpB) {
      temp := registers8(decoder.io.opBSelect)
    }
    halt := decoder.io.nextHalt
    goto(t4State)
  }
}
val t4State = new State {
  whenIsActive {
    when(decoder.io.store) {
      registers8(decoder.io.storeSelect) := alu.io.result
    }
    registers8(Reg8.F) := alu.io.flagsOut
    mCycle := decoder.io.nextMCycle
    when (!halt) {
      goto(t1State)
    }
  }
}
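
The effect of the conditional goto can be checked with a small software model of the T-state sequence. This is a plain-Scala sketch rather than SpinalHDL; HaltFsmModel, next and trace are hypothetical names, and haltOnMCycle is an invented stimulus standing in for the decoder asserting its halt output during a given M-cycle.

```scala
object HaltFsmModel {
  sealed trait TState
  case object T1 extends TState
  case object T2 extends TState
  case object T3 extends TState
  case object T4 extends TState

  // Next-state function mirroring the goto() calls: T4 only returns
  // to T1 while the registered halt flag is clear.
  def next(state: TState, halt: Boolean): TState = state match {
    case T1 => T2
    case T2 => T3
    case T3 => T4
    case T4 => if (halt) T4 else T1
  }

  // Record the state seen on each clock tick. The halt flag is latched
  // in T3 of the M-cycle numbered haltOnMCycle (counting from 0).
  def trace(haltOnMCycle: Int, ticks: Int): Vector[TState] = {
    var state: TState = T1
    var halt = false
    var mCyclesDone = 0
    val seen = Vector.newBuilder[TState]
    for (_ <- 0 until ticks) {
      seen += state
      if (state == T3) halt = (mCyclesDone == haltOnMCycle)
      val following = next(state, halt)
      if (state == T4 && following == T1) mCyclesDone += 1
      state = following
    }
    seen.result()
  }
}
```

With the halt decoded in the second M-cycle, trace(1, 10) runs T1 to T4 twice and then stays in T4 for the remaining ticks, which is the idle behaviour described above.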

Simulation results

To test the new instructions, I appended the following to the end of the test program from the previous post:

ld a, $AA
ld b, $BB
ld c, $CC
ld d, $DD
ld e, $EE
ld h, $55
ld l, $77

halt
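
When reading the waveform it helps to know which bytes to expect on the data bus. The sketch below hand-assembles the snippet above using the standard LR35902 opcodes (LD r, d8 as listed in the microcode table, HALT as 0x76). TestProgramBytes and ld are hypothetical helpers, not part of RGBDS or SlabBoy. Depending on the RGBDS version and flags, rgbasm may also emit a nop after halt; that is not modelled here.

```scala
object TestProgramBytes {
  // Standard LR35902 opcodes for LD r, d8, keyed by register name.
  val ldImm: Map[String, Int] = Map(
    "a" -> 0x3E, "b" -> 0x06, "c" -> 0x0E, "d" -> 0x16,
    "e" -> 0x1E, "h" -> 0x26, "l" -> 0x2E
  )
  val Halt = 0x76

  // LD r, d8 is two bytes: the opcode followed by the immediate.
  def ld(reg: String, value: Int): Seq[Int] = Seq(ldImm(reg), value & 0xFF)

  val program: Seq[Int] =
    ld("a", 0xAA) ++ ld("b", 0xBB) ++ ld("c", 0xCC) ++ ld("d", 0xDD) ++
    ld("e", 0xEE) ++ ld("h", 0x55) ++ ld("l", 0x77) ++ Seq(Halt)

  def main(args: Array[String]): Unit =
    println(program.map(b => f"$b%02X").mkString(" "))
}
```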

It’s built with the RGBDS tool-chain and Makefile from a few posts ago, and then loaded into the SpinalHDL simulation. In the sbt console, runMain slabboy.TopLevelSim runs the simulation and outputs the wave dump file.

[Figure: GTKWave waveform showing the load-immediate instructions in action.]

I also modified the simulation test bench a bit to automatically stop the simulation after the CPU halt signal is asserted.

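import java.nio.file.{Files, Paths}

import spinal.core._
import spinal.core.sim._

object TopLevelSim {
  // read the assembled ROM image into a byte array
  def loadProgram(name: String): Array[Byte] = {
    Files.readAllBytes(Paths.get(name))
  }

  def main(args: Array[String]): Unit = {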
    SimConfig.withWave.compile(new SlabBoy).doSim { dut =>
      // perform initial reset and generate a clock signal
      dut.clockDomain.forkStimulus(period = 2)
      
      // create a concurrent thread to handle the memory accesses
      // separate from the main test bench execution
      fork {
        val memory = loadProgram("sw/test.gb")
        while (true) {
          dut.clockDomain.waitRisingEdgeWhere(dut.io.en.toBoolean == true)
          val address = dut.io.address.toInt
          dut.io.dataIn #= memory(address).toInt & 0xFF
          dut.clockDomain.waitRisingEdgeWhere(dut.io.en.toBoolean == false)
          dut.io.dataIn.randomize
        }
      }

      // wait for halt, sleep for a couple cycles so the halt is visible in
      // the waveform dump, and then stop the simulation
      dut.clockDomain.waitRisingEdgeWhere(dut.io.halt.toBoolean == true)
      sleep(2)
      simSuccess()
    }
  }
}
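
One detail in the memory thread worth spelling out is the & 0xFF mask. Scala’s Byte is signed, so a ROM byte such as 0xAA widens to a negative Int, which is not a valid value for an unsigned 8-bit input. The snippet below is a standalone illustration; ByteMask and toUnsigned are names invented for it.

```scala
object ByteMask {
  // Widen a signed JVM byte to its unsigned 0..255 value.
  def toUnsigned(b: Byte): Int = b.toInt & 0xFF

  def main(args: Array[String]): Unit = {
    val raw: Byte = 0xAA.toByte
    println(raw.toInt)       // sign-extended: -86
    println(toUnsigned(raw)) // masked: 170
  }
}
```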

[Figure: GTKWave waveform showing the simulation ending after the halt signal is asserted.]

Well, that’s it for this post. Next time I’m going to go quickly through implementing a boatload of register-to-register load instructions.

SlabBoy repo tag for this post

SlabBoy repo master
