Cycle accurate emulation becomes impossible in the following circumstances:
  * Branch prediction and pipelining can cause out-of-order execution,
    and the execution path becomes data dependent.
  * Cache memory.  It can be very difficult to predict a cache flush,
    a cache miss, or a cache look-aside buffer hit.
  * Memory management can inject wait states and cause other cycle
    counting issues
  * Peripherals can inject unpredictable wait states
  * Multi-core processors, because you don't necessarily know which core
    is doing what, and one core may be waiting on another core.
  * DMA can cause some CPUs to pause because the bus is busy doing DMA
    transfers (not all processors have this as an issue).
  * Some CPUs shut down clocks and peripherals if they are not used and
    they take time to re-start.
  * Any code that waits for some kind of external input.
When I was working for a 6800 C compiler company we could simulate all
68000 CPUs before the 68020.  The 68020 with its pipelining and branch
prediction made it impossible to do cycle-accurate timing.
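
To make the DMA and wait-state items above concrete, here is a minimal
sketch in Python.  Everything in it is made up for illustration: the
opcode names, the cycle counts, and the bus_busy callback do not describe
any real CPU.  It only shows that a fixed cycle table gives the right
answer when nothing else touches the bus, and a different answer as soon
as an outside bus master steals cycles.

```python
# Hypothetical toy CPU: opcode -> base cycle count (invented values).
BASE_CYCLES = {"NOP": 4, "LOAD": 7, "STORE": 7, "ADD": 4}
MEMORY_OPS = {"LOAD", "STORE"}


def run(program, bus_busy=lambda cycle: False):
    """Return the total cycle count for a list of opcodes.

    bus_busy(cycle) is an assumed hook that reports whether some other
    bus master (for example a DMA controller) owns the bus at that cycle.
    """
    cycle = 0
    for op in program:
        if op in MEMORY_OPS:
            # Stall one cycle at a time while the bus is taken.
            while bus_busy(cycle):
                cycle += 1
        cycle += BASE_CYCLES[op]
    return cycle


program = ["LOAD", "ADD", "STORE", "NOP"]

# No outside bus activity: the total is just the sum of the table entries.
ideal = run(program)

# Hypothetical DMA burst holding the bus during cycles 10 through 14.
with_dma = run(program, bus_busy=lambda c: 10 <= c < 15)

print(ideal, with_dma)
```

The same program takes a different number of cycles depending on when
the DMA burst happens, so the emulator would have to model the DMA
controller's timing exactly, not just the CPU's.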
On 4/22/2024 1:46 PM, Paul Koning via cctalk wrote:
  On Apr 22, 2024, at 2:34 PM, Chuck Guzis via cctalk <cctalk(a)classiccmp.org> wrote:
 On 4/22/24 11:09, Bill Gunshannon via cctalk wrote:
  Following along this line of thought but also in regards all our
 other small CPUs....
 Would it not be possible to use something like a Blue Pill to make
 a small board (small enough to actually fit in the CPU socket) that
 emulated these old CPUs?  Definitely enough horse power just wondered
 if there was enough room for the microcode.
 
 Blue pills are so yesterday!  There are far more small-footprint MCUs
 out there.   More RAM than any Z80 ever had as well as lots of flash for
 the code as well as pipelined 32-bit execution at eye-watering (relative
 to the Z80) speeds.
 Could it emulate a Z80?  I don't see any insurmountable obstacles to
 that.  Could it be cycle- and timing- accurate?   That's a harder one to
 predict, but probably.
 
 Probably not.  Cycle accurate simulation is very hard.  It's only rarely
 been done for any CPU, and if done it tends to be incredibly slow.  I
 remember once using a MIPS cycle-accurate simulator (for the SB-1, the
 core inside the SB-1250, later called BCM-12500).  It was needed because
 the L2 cache flush code could not be debugged any other way, but it was
 very slow indeed.  Almost as bad as running the CPU logic model in a
 Verilog or VHDL simulator.  I don't remember the numbers but it probably
 was only a few thousand instructions per second.
 
 Then again, for the notion of a drop-in replacement for the original
 chip, you don't need a cycle accurate simulator, just one with compatible
 pin signalling.  That's not nearly so hard -- though still harder than a
 SIMH style ISA simulation.
        paul