VCF PNW 2018: Pictures!

Ray Arachelian ray at
Wed Feb 21 11:40:13 CST 2018

On 02/21/18 08:16, Peter Corlett via cctalk wrote:
> A programmer of modest ability should be able to knock up a simple
> switch()-based step-by-step CPU emulation in a few hours. This is analogous to
> a simple microcoded CPU and the performance will suck.
Yeah, please don't code this way.  Big huge case statements really suck.

Build a function that can decode opcodes and then dispatches to an array
of functions via pointers instead.
Build tables in the emulator that both help the opcode decoder (and
accelerate it) and also hold per-opcode fields such as cycle timings
and opcode sizes (so you can advance the program counter by the size
of the opcode).
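
As a rough illustration of the table idea, here is a minimal sketch in C.
Everything in it is made up for the example: the Cpu layout, the three
opcodes and their numbers, and the cycle counts do not belong to any real
CPU.  It only shows one table entry carrying the handler pointer, the
size, and the timing together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit CPU state; all names are illustrative. */
typedef struct {
    uint8_t  a;           /* accumulator */
    uint16_t pc;          /* program counter */
    uint64_t cycles;      /* running cycle count */
    uint8_t  mem[65536];  /* flat 64 KiB address space */
} Cpu;

typedef void (*OpHandler)(Cpu *cpu);

/* One table row per opcode: what to call, how long it is, what it costs. */
typedef struct {
    OpHandler handler;    /* NULL means "illegal opcode" */
    uint8_t   size;       /* bytes, including operands */
    uint8_t   cycles;     /* base cycle cost (made-up values below) */
} OpInfo;

static void op_nop(Cpu *cpu)     { (void)cpu; }
static void op_lda_imm(Cpu *cpu) { cpu->a = cpu->mem[(uint16_t)(cpu->pc + 1)]; }
static void op_inc_a(Cpu *cpu)   { cpu->a++; }

/* Invented opcode numbers; entries not listed stay zeroed. */
static const OpInfo optable[256] = {
    [0x00] = { op_nop,     1, 2 },
    [0x01] = { op_lda_imm, 2, 2 },
    [0x02] = { op_inc_a,   1, 2 },
};

/* Execute one instruction; returns 0 on success, -1 on an illegal opcode. */
static int cpu_step(Cpu *cpu)
{
    assert(cpu != NULL);
    const OpInfo *info = &optable[cpu->mem[cpu->pc]];
    if (info->handler == NULL)
        return -1;
    info->handler(cpu);
    cpu->pc = (uint16_t)(cpu->pc + info->size);  /* size comes from the table */
    cpu->cycles += info->cycles;                 /* so does the timing */
    return 0;
}
```

The main loop is then just repeated calls to cpu_step().  Handlers for
jumps would need to set the PC themselves and skip the size increment,
which this sketch leaves out.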

You could also cache the results of the opcode decoder, but you'll need
to do a bunch of memory management for that, and detect when a certain
place in memory has been overwritten with new code.
However, this can yield very useful information if you also add a bunch
of extra fields to your instruction cache such as what register values
were used before, which CPU cycle it last ran on, whether that
opcode accessed I/O or RAM, what MMU context it was in, etc.
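
A minimal sketch of such a cache in C, under loud assumptions: one slot
per address in a 64 KiB space, a toy decoder where the top two opcode
bits give the instruction length, and invented field names for the
"flight data".  A real emulator would have its own decoder and probably
a smarter invalidation scheme; this only shows the write path knocking
out stale entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE      65536u
#define MAX_INSN_SIZE 4u

/* One cache slot per address: decode result plus recorded "flight data". */
typedef struct {
    uint8_t  valid;
    uint8_t  opcode;
    uint8_t  size;        /* instruction length in bytes */
    uint8_t  touched_io;  /* set once the instruction has accessed I/O */
    uint16_t last_reg;    /* register value seen the last time it ran */
    uint64_t last_cycle;  /* CPU cycle count the last time it ran */
    uint32_t run_count;
} DecodeEntry;

typedef struct {
    uint8_t     mem[MEM_SIZE];
    DecodeEntry cache[MEM_SIZE];
} Machine;

/* Toy decoder (an assumption): top two bits of the opcode give size 1..4. */
static unsigned insn_size(uint8_t opcode) { return 1u + (opcode >> 6); }

/* Return the cache entry for pc, decoding on a miss. */
static DecodeEntry *decode_at(Machine *m, uint16_t pc)
{
    DecodeEntry *e = &m->cache[pc];
    if (!e->valid) {
        memset(e, 0, sizeof *e);  /* new code, so old flight data is dropped */
        e->opcode = m->mem[pc];
        e->size   = (uint8_t)insn_size(e->opcode);
        assert(e->size <= MAX_INSN_SIZE);
        e->valid  = 1;
    }
    return e;
}

/* Call after executing the instruction to record what happened. */
static void record_run(DecodeEntry *e, uint16_t reg, uint64_t cycle, int io)
{
    e->last_reg   = reg;
    e->last_cycle = cycle;
    if (io)
        e->touched_io = 1;
    e->run_count++;
}

/* All emulated stores go through here so overwritten code is detected. */
static void mem_write(Machine *m, uint16_t addr, uint8_t value)
{
    m->mem[addr] = value;
    /* Invalidate any cached instruction whose bytes cover addr. */
    for (unsigned k = 0; k < MAX_INSN_SIZE; k++) {
        DecodeEntry *e = &m->cache[(uint16_t)(addr - k)];
        if (e->valid && k < e->size)
            e->valid = 0;
    }
}
```

Dumping the cache entries afterwards gives the per-instruction history
for debugging.  Whether to keep the old flight data around after an
invalidation is a design choice; this sketch throws it away.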

These extra bits of recorded "flight data" are gold when debugging or
reverse engineering the running OS/apps later.  For example on CPUs
which access I/O via memory space, if you see a MOVE to a register in a
disassembly, you don't really know what it's doing unless you see the
value in that register at the time it ran, and then you can think, Aha! 
0xfc0020 - that's this I/O register on this specific device here on this
bus, so it can help you locate I/O drivers in the kernel.

Now, you can also do something else that's interesting: if you reverse
engineer the driver a bit and find its entry and exit points (basically
how it's called and what it returns), you can trap that in your
emulator.  When your emulator's CPU calls that block of code, you
don't execute the emulated code; instead you do whatever that driver
does natively on the host and return the right values on the
stack/registers/target memory/etc.  This can speed up your emulator
quite a bit.  For example, say you have some code that loads a file from
disk.  Rather than emulating several thousand opcodes, you can replace
the whole thing with a block read from a file, return to the caller,
and skip all the bit-banging.  (But do update the CPU cycle count.)
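
A minimal sketch of the trap idea in C.  The calling convention is
entirely assumed for the example: the routine lives at 0xF000, takes a
destination address in r0 and a sector number in r1, returns a status in
r2, and the return address sits little-endian on the stack.  The 20000
cycle charge is a placeholder, not a measured figure.  A real driver will
have its own convention that you'd have to reverse engineer first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical CPU; the register roles below are assumptions. */
typedef struct {
    uint16_t pc, sp;
    uint16_t r0;             /* argument: destination address */
    uint16_t r1;             /* argument: sector number */
    uint16_t r2;             /* result: 0 = ok, 1 = error */
    uint64_t cycles;
    uint8_t  mem[65536];
    const uint8_t *disk;     /* host-side disk image */
    size_t   disk_size;
} Cpu;

typedef void (*TrapFn)(Cpu *cpu);
typedef struct { uint16_t entry; TrapFn fn; } Trap;

/* Pop a little-endian 16-bit value, as the emulated return would. */
static uint16_t pop16(Cpu *cpu)
{
    uint16_t lo = cpu->mem[cpu->sp];
    uint16_t hi = cpu->mem[(uint16_t)(cpu->sp + 1)];
    cpu->sp = (uint16_t)(cpu->sp + 2);
    return (uint16_t)(lo | (hi << 8));
}

/* Host replacement for the emulated "read sector" routine. */
static void trap_read_sector(Cpu *cpu)
{
    size_t off = (size_t)cpu->r1 * SECTOR_SIZE;
    if (off + SECTOR_SIZE <= cpu->disk_size &&
        (size_t)cpu->r0 + SECTOR_SIZE <= sizeof cpu->mem) {
        memcpy(&cpu->mem[cpu->r0], cpu->disk + off, SECTOR_SIZE);
        cpu->r2 = 0;
    } else {
        cpu->r2 = 1;
    }
    cpu->cycles += 20000;    /* placeholder cost so timing still advances */
    cpu->pc = pop16(cpu);    /* return to the caller */
}

static const Trap traps[] = {
    { 0xF000, trap_read_sector },  /* assumed entry point */
};

/* Call before fetching; returns 1 if a trap ran instead of emulated code. */
static int try_trap(Cpu *cpu)
{
    assert(cpu != NULL);
    for (size_t i = 0; i < sizeof traps / sizeof traps[0]; i++) {
        if (traps[i].entry == cpu->pc) {
            traps[i].fn(cpu);
            return 1;
        }
    }
    return 0;
}
```

The main loop would call try_trap() first and only fall through to the
normal opcode dispatch when it returns 0.  With many traps, a lookup
keyed on the PC would beat the linear scan shown here.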

If, say, the firmware insists on checking the RAM by writing thousands of
different patterns over many megs of memory, you can detect that in your
emulator and skip it.  No need to make the user wait 5 minutes for the
machine to warm up.  Speed up the boot process.  (Well you can make
these things optional "hacks" that the user can enable or disable.)

> Making it *cycle-accurate* involves deep understanding of the emulated CPU's
> internal architecture. If part of the platform requires cycle-accurate timing
> for bit-banging some hardware device, you're going to need this.
Hopefully this is already documented; if not, having schematics might
help here, but it would need lots of work.  (Assuming a multi-IC CPU as
opposed to a single-chip CPU, which would likely have great docs already.)

> Making it *fast* also involves being an expert in compiler backends for the
> target architecture, because this requires decompiling and then recompiling the
> code on the fly.
> ... and that's the easy bit. Now you get to emulate the hardware.
Yeah, it's a huge job, but I think in the end, it's totally worth it. 
Just takes a lot of commitment and free time, and a love for the machine
you're trying to emulate.
