Accelerator boards - no future? Bad business?
Maciej W. Rozycki
macro at linux-mips.org
Sat Apr 23 09:49:07 CDT 2016
On Sat, 23 Apr 2016, Sean Conner wrote:
> > > One major problem with adding a faster CPU to an SGI is the MIPS chip
> > > itself---code compiled for one MIPS CPU (say, the R3000) won't run on
> > > another MIPS CPU (say, the R4400) due to the differences in the pipeline.
> > > MIPS compilers were specific for a chip because such details were not hidden
> > > in the CPU itself, but left to the compiler to deal with.
> > Having written a bunch of R3000 and R4000/4200/4300/4400/4600 assembly
> > code in the 1990s, my (possibly faulty) recollection disagrees with
> > you. There are differences in supervisor-mode programming, but I don't
> > recall any issues with running 32-bit user-mode R3000 code on any
> > R4xxx. The programmer-visible pipeline behavior (e.g., branch delay
> > slots) was the same.
> Hmm ... I might have been misremembering. I just checked the book I have
> on the MIPS, and yes, the supervisor stuff is different between the R2000,
> R3000, R4000 and R6000. Also, the R2000, R3000 and R6000 have a five stage
> pipeline, and the R4000 has an eight stage pipeline.
Pipeline restrictions were gradually relaxed by adding more and more
interlocks as the architecture evolved. So while user mode code compiled
for a higher ISA might not necessarily work with an older one even if it
only used instructions defined in the older ISA, there was no issue the
other way round: old code was forward compatible with newer hardware (or,
depending on how you look at it, new hardware was backward compatible with
old code).
The timeline was roughly:
- MIPS II -- removed load delay slots -- for memory read instructions
targeting both general purpose and coprocessor registers,
- MIPS IV -- removed coprocessor transfer and condition code delay slots
-- for instructions used to move data between general purpose
and coprocessor registers as well as ones setting or reading
coprocessor condition codes.
The original MIPS I ISA only had an interlock on multiply-divide unit
(MDU) accumulator accesses, so all the other pipeline hazards had to be
handled in software, by inserting the right number of instructions between
the producer and the consumer of data; NOPs were used where no useful
instructions could be scheduled.
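As a rough illustration of the software hazard handling described above,
here is a toy Python sketch of a pass that pads load delay slots with NOPs.
The instruction representation and the names Insn and pad_load_delay_slots
are made up for this example; it only models the single-instruction load
delay slot and ignores every other hazard a real assembler or compiler had
to deal with.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    """Hypothetical, minimal instruction record used only for this sketch."""
    op: str                  # mnemonic, e.g. "lw", "addu", "nop"
    dst: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]    # registers read


NOP = Insn("nop", None, ())
LOADS = {"lb", "lbu", "lh", "lhu", "lw"}  # a subset, for illustration


def pad_load_delay_slots(code: List[Insn], has_load_interlock: bool) -> List[Insn]:
    """Insert a NOP after a load whose result is read by the next instruction.

    With has_load_interlock=True (MIPS II and later, per the timeline above)
    the hardware stalls instead, so the code is returned unchanged.
    """
    if has_load_interlock:
        return list(code)
    out: List[Insn] = []
    for i, insn in enumerate(code):
        out.append(insn)
        nxt = code[i + 1] if i + 1 < len(code) else None
        if insn.op in LOADS and nxt is not None and insn.dst in nxt.srcs:
            out.append(NOP)  # nothing useful to schedule in this toy model
    return out


prog = [
    Insn("lw", "$t0", ("$a0",)),
    Insn("addu", "$v0", ("$t0", "$t1")),  # reads $t0 right after the load
]
mips1 = pad_load_delay_slots(prog, has_load_interlock=False)
mips2 = pad_load_delay_slots(prog, has_load_interlock=True)
print([i.op for i in mips1])  # ['lw', 'nop', 'addu']
print([i.op for i in mips2])  # ['lw', 'addu']
```

A real toolchain would first try to move an independent instruction into
the slot and only fall back to a NOP when none was available.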
Some operations continued to require a manual resolution of pipeline
hazards even in the MIPS IV ISA, like moves to the MDU accumulator, as
well as many privileged operations (TLB writes, mode switches, etc.).
For these the SSNOP (superscalar NOP) instruction was introduced, which
was guaranteed not to be nullified with superscalar pipelines. The
encoding was chosen such that it was backwards compatible, using one of
the already existing ways to express an operation with no visible effects
other than incrementing the PC; given the design of the MIPS
instruction set there has always been a plethora of these. Consequently SSNOP
was executed as an ordinary NOP by older ISA implementations.
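To make the encoding point concrete, here is a small Python sketch that
assembles the SLL instruction by hand. It assumes the usual MIPS R-type
field layout and that SSNOP is encoded as "sll $0, $0, 1", as documented in
the MIPS32 architecture manuals; the helper name encode_sll is invented for
this example.

```python
def encode_sll(rd: int, rt: int, sa: int) -> int:
    """Encode SLL rd, rt, sa (major opcode SPECIAL = 0, function SLL = 0)."""
    for field in (rd, rt, sa):
        if not 0 <= field < 32:
            raise ValueError("register and shift fields are 5 bits wide")
    # Bits 31..26 (opcode), 25..21 (rs) and 5..0 (function) are all zero.
    return (rt << 16) | (rd << 11) | (sa << 6)


NOP = encode_sll(0, 0, 0)    # sll $0, $0, 0
SSNOP = encode_sll(0, 0, 1)  # sll $0, $0, 1

print(f"NOP   = 0x{NOP:08x}")    # NOP   = 0x00000000
print(f"SSNOP = 0x{SSNOP:08x}")  # SSNOP = 0x00000040
```

Since the destination is register $0, an older processor just performs a
shift whose result is discarded, which is why the encoding is harmless
there.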
NB despite the hardware interlocks, it has always been preferable to avoid
the pipeline stalls they trigger by scheduling the right minimum number
of instructions between data producers and the respective consumers anyway,
and compilers have had options to tune this for specific processor
implementations. The addition of hardware interlocks made the life of
compiler (and handcoded assembly) writers a little bit easier, as a missed
optimisation didn't result in broken code. Also, more compact code could
be produced where there was no useful code to schedule and a NOP would
otherwise have had to be inserted to satisfy a pipeline hazard.
I won't dive into the details of the further evolution with modern MIPS
ISAs here, for obvious reasons.