FPGA Game Boy Part 4: Loading immediate values and halting the CPU
It’s been a little while, but in the last post I showed my SpinalHDL implementation of the Z80-ish ALU, part of the Game Boy’s LR35902 CPU. We could compile and run a program with NOP, INC, DEC, etc. However, because the microcode doesn’t support load instructions yet, building complex values is a bit difficult. In this post, I’m going to briefly show how I added the 8-bit LR35902 load-immediate instructions and the halt instruction to make simulating the CPU more friendly.
Load Immediate Instructions
The LR35902’s load-immediate instructions are rather straightforward to implement, since they fetch the immediate data from the address of the program counter (PC). This means we don’t have to support indirect memory addresses, auto-incrementing, or anything fancy yet. The decoder needs to output a signal indicating a memory read cycle, which will cause the CPU to load the temp register with the 8-bit value from memory.
In CpuDecoder, I added the memRead field to MCycle and a helper function for generating microcode:
case class MCycle(
  aluOp: SpinalEnumElement[AluOp.type],
  opBSelect: Option[Int],
  storeSelect: Option[Int],
  memRead: Boolean
)

def memReadCycle(aluOp: SpinalEnumElement[AluOp.type],
                 storeSelect: Option[Int]) = {
  MCycle(aluOp, None, storeSelect, true)
}
Taking another look at the Game Boy CPU instruction set, the LD x, d8 instructions take 8 T-cycles, which means two M-cycles. The first M-cycle reads the opcode without any operation or register write-back, and the second M-cycle reads the immediate byte from memory and writes it to the destination register.
// ld B, d8
(0x06, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.B)))),
// ld C, d8
(0x0E, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.C)))),
// ld D, d8
(0x16, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.D)))),
// ld E, d8
(0x1E, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.E)))),
// ld H, d8
(0x26, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.H)))),
// ld L, d8
(0x2E, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.L)))),
// ld A, d8
(0x3E, Seq(fetchCycle(AluOp.Nop, None, None),
           memReadCycle(AluOp.Nop, Some(Reg8.A))))
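The seven entries differ only in the opcode and the destination register. On the LR35902, the LD r, d8 opcodes follow the bit pattern 00 rrr 110, where rrr selects the register (B=0, C=1, D=2, E=3, H=4, L=5, A=7; index 6 is the (HL) form, which is not covered here). As a rough sketch, the table could be generated rather than written out by hand. The names below (Cycle, Fetch, MemRead, LdImmediateTable) are hypothetical plain-Scala stand-ins so the example runs without SpinalHDL, and the register index used here is the opcode's rrr field, which may not match the project's Reg8 constants.

```scala
// Hypothetical stand-ins for the decoder's microcode types, so this
// sketch runs without SpinalHDL. They are not the project's MCycle.
sealed trait Cycle
case object Fetch extends Cycle
final case class MemRead(storeSelect: Int) extends Cycle

object LdImmediateTable {
  // Register numbers as encoded in the opcode's rrr field.
  // Index 6 is the (HL) form and is deliberately left out.
  val regIndex: Seq[(String, Int)] = Seq(
    "B" -> 0, "C" -> 1, "D" -> 2, "E" -> 3, "H" -> 4, "L" -> 5, "A" -> 7
  )

  // LD r, d8 is encoded as 00 rrr 110.
  def opcodeFor(r: Int): Int = (r << 3) | 0x06

  // One (opcode, microcode) pair per register: an opcode fetch cycle
  // followed by a memory read cycle that stores into register r.
  val ldImmediateEntries: Seq[(Int, Seq[Cycle])] =
    regIndex.map { case (_, r) => (opcodeFor(r), Seq(Fetch, MemRead(r))) }
}
```

The generated opcodes come out as 0x06, 0x0E, 0x16, 0x1E, 0x26, 0x2E, and 0x3E, matching the hand-written table.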
The T-cycle state machine only has to change a little bit. During a memory read cycle, t2State stores the data from memory into the temp register instead of the instruction register.
val tCycleFsm = new StateMachine {
  val t1State: State = new State with EntryPoint {
    onEntry {
      address := registers16(Reg16.PC)
      mreq := True
    }
    whenIsActive {
      mreq := False
      goto(t2State)
    }
  }
  val t2State = new State {
    whenIsActive {
      // handle memory read cycles
      when(decoder.io.memRead) {
        temp := io.dataIn
      }.otherwise {
        ir := io.dataIn
      }
      registers16(Reg16.PC) := registers16(Reg16.PC) + 1
      goto(t3State)
    }
  }
  val t3State = new State {
    whenIsActive {
      when(decoder.io.loadOpB) {
        temp := registers8(decoder.io.opBSelect)
      }
      goto(t4State)
    }
  }
  val t4State = new State {
    whenIsActive {
      when(decoder.io.store) {
        registers8(decoder.io.storeSelect) := alu.io.result
      }
      registers8(Reg8.F) := alu.io.flagsOut
      mCycle := decoder.io.nextMCycle
      // return to T1 to start the next M-cycle
      goto(t1State)
    }
  }
}
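To make the data flow of the state machine above easier to follow, here is a minimal software-only model of one M-cycle. It is a sketch under simplifying assumptions, not the SpinalHDL design: TCycleModel, CpuState, and runMCycle are hypothetical names, memory is a plain array, and the T3/T4 operand load and write-back steps are left out.

```scala
// Software-only model of what T1 and T2 do in one M-cycle.
// All names here are illustrative; none come from the project.
object TCycleModel {
  final case class CpuState(pc: Int, ir: Int, temp: Int)

  // `memRead` plays the role of decoder.io.memRead: it decides where
  // the byte sampled in T2 is latched.
  def runMCycle(s: CpuState, memory: Array[Int], memRead: Boolean): CpuState = {
    val address = s.pc                 // T1: drive the address bus from PC
    val data = memory(address) & 0xFF  // T2: sample the data bus
    val latched =
      if (memRead) s.copy(temp = data) // memory read cycle: operand -> temp
      else s.copy(ir = data)           // fetch cycle: opcode -> IR
    // T2 also increments PC (wrapping at 16 bits).
    latched.copy(pc = (s.pc + 1) & 0xFFFF)
  }
}
```

Running a fetch cycle and then a memory read cycle over the bytes 0x06, 0xBB (ld b, $BB) leaves 0x06 in ir, 0xBB in temp, and PC pointing at the next instruction.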
Halt instruction
Adding the halt instruction was simple, but valuable for ending simulation runs automatically when the loaded program is complete. First, I added another MCycle field (and also wired up the decoder output):
case class MCycle(
  aluOp: SpinalEnumElement[AluOp.type],
  opBSelect: Option[Int],
  storeSelect: Option[Int],
  memRead: Boolean,
  halt: Boolean
)
In the T-cycle state machine, we register the decoded halt signal in t3State and then idle forever in t4State. Later on, the halt signal could gate the clock instead of leaving the state machine looping endlessly, but this simple implementation will work for now. Moreover, the Game Boy’s clock frequency is so low (4.19 MHz) that I’m not too worried about the power dissipation on the Zynq FPGA fabric.
val t3State = new State {
  whenIsActive {
    when(decoder.io.loadOpB) {
      temp := registers8(decoder.io.opBSelect)
    }
    halt := decoder.io.nextHalt
    goto(t4State)
  }
}
val t4State = new State {
  whenIsActive {
    when(decoder.io.store) {
      registers8(decoder.io.storeSelect) := alu.io.result
    }
    registers8(Reg8.F) := alu.io.flagsOut
    mCycle := decoder.io.nextMCycle
    when (!halt) {
      goto(t1State)
    }
  }
}
Simulation results
To test the new instructions, I appended the following to the end of the test program from before:
ld a, $AA
ld b, $BB
ld c, $CC
ld d, $DD
ld e, $EE
ld h, $55
ld l, $77
halt
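As a sanity check on what the waveform should show, the snippet above can be hand-assembled and run through a tiny reference interpreter. This is only a sketch: LdHaltModel is a hypothetical helper, it covers just the appended instructions (not the earlier part of the test program), and it understands only LD r, d8 (00 rrr 110) and HALT (0x76).

```scala
// Reference model for the appended test instructions only.
// The object and its members are illustrative, not part of the project.
object LdHaltModel {
  // Hand-assembled bytes for the snippet above.
  val program: Array[Int] = Array(
    0x3E, 0xAA, // ld a, $AA
    0x06, 0xBB, // ld b, $BB
    0x0E, 0xCC, // ld c, $CC
    0x16, 0xDD, // ld d, $DD
    0x1E, 0xEE, // ld e, $EE
    0x26, 0x55, // ld h, $55
    0x2E, 0x77, // ld l, $77
    0x76        // halt
  )

  // rrr field of the opcode -> register name; 6 is (HL), unsupported here.
  private val regNames: Map[Int, String] =
    Map(0 -> "B", 1 -> "C", 2 -> "D", 3 -> "E", 4 -> "H", 5 -> "L", 7 -> "A")

  // Execute from address 0 until HALT and return the registers written.
  def run(mem: Array[Int]): Map[String, Int] = {
    var pc = 0
    var regs = Map.empty[String, Int]
    var halted = false
    while (!halted) {
      val opcode = mem(pc) & 0xFF
      pc += 1
      val r = (opcode >> 3) & 7
      if (opcode == 0x76) {
        halted = true
      } else if ((opcode & 0xC7) == 0x06 && regNames.contains(r)) {
        regs = regs + (regNames(r) -> (mem(pc) & 0xFF)) // immediate follows the opcode
        pc += 1
      } else {
        sys.error(f"unsupported opcode 0x$opcode%02X")
      }
    }
    regs
  }
}
```

After the halt, the model holds A=$AA, B=$BB, C=$CC, D=$DD, E=$EE, H=$55, and L=$77, which are the values to look for in the register signals of the wave dump.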
It’s built with the RGBDS toolchain and Makefile from a few posts ago, and then loaded into the SpinalHDL simulation. In the sbt console, runMain slabboy.TopLevelSim runs the simulation and outputs the wave dump file.
I also modified the simulation test bench a bit to automatically stop the simulation after the CPU halt signal is asserted.
import java.nio.file.{Files, Paths}
import spinal.core._
import spinal.core.sim._

object TopLevelSim {
  def loadProgram(name: String): Array[Byte] = {
    Files.readAllBytes(Paths.get(name))
  }

  def main(args: Array[String]): Unit = {
    SimConfig.withWave.compile(new SlabBoy).doSim { dut =>
      // perform initial reset and generate a clock signal
      dut.clockDomain.forkStimulus(period = 2)

      // create a concurrent thread to handle the memory accesses
      // separate from the main test bench execution
      fork {
        val memory = loadProgram("sw/test.gb")
        while (true) {
          dut.clockDomain.waitRisingEdgeWhere(dut.io.en.toBoolean == true)
          val address = dut.io.address.toInt
          // mask to 8 bits because Scala bytes are signed
          dut.io.dataIn #= memory(address).toInt & 0xFF
          dut.clockDomain.waitRisingEdgeWhere(dut.io.en.toBoolean == false)
          dut.io.dataIn.randomize()
        }
      }

      // wait for halt, sleep for a couple cycles so the halt is visible in
      // the waveform dump, and then stop the simulation
      dut.clockDomain.waitRisingEdgeWhere(dut.io.halt.toBoolean == true)
      sleep(2)
      simSuccess()
    }
  }
}
Well, that’s it for this post. Next time, I’m going to quickly go through implementing a boatload of register-to-register load instructions.