Accelerator boards - no future? Bad business?

Sat Apr 23 09:49:07 CDT 2016

On Sat, 23 Apr 2016, Sean Conner wrote:

> > >   One major problem with adding a faster CPU to an SGI is the MIPS chip
> > > itself---code compiled for one MIPS CPU (say, the R3000) won't run on
> > > another MIPS CPU (say, the R4400) due to the differences in the pipeline.
> > > MIPS compilers were specific for a chip because such details were not hidden
> > > in the CPU itself, but left to the compiler to deal with.
> > 
> > Having written a bunch of R3000 and R4000/4200/4300/4400/4600 assembly
> > code in the 1990s, my (possibly faulty) recollection disagrees with
> > you. There are differences in supervisor-mode programming, but I don't
> > recall any issues with running 32-bit user-mode R3000 code on any
> > R4xxx. The programmer-visible pipeline behavior (e.g., branch delay
> > slots) was the same.
> 
>   Hmm ... I might have been misremembering.  I just checked the book I have
> on the MIPS, and yes, the supervisor stuff is different between the R2000,
> R3000, R4000 and R6000.  Also, the R2000, R3000 and R6000 have a five stage
> pipeline, and the R4000 has an eight stage pipeline.

 Pipeline restrictions were gradually relaxed by adding more and more 
interlocks as the architecture evolved.  So user mode code compiled for a 
higher ISA might not work on an older one, even if it used only 
instructions defined in the older ISA, but there was no issue the other 
way round: old code was forward compatible with newer hardware (or, 
depending on how you look at it, new hardware was backward compatible with 
older code).

 The timeline was roughly:

- MIPS II -- removed load delay slots -- for memory read instructions 
             targeting both general purpose and coprocessor registers,

- MIPS IV -- removed coprocessor transfer and condition code delay slots 
             -- for instructions used to move data between general purpose 
             and coprocessor registers as well as ones setting or reading
             coprocessor condition codes.

The original MIPS I ISA only had an interlock on multiply-divide unit 
(MDU) accumulator accesses, so all the other pipeline hazards had to be 
handled in software, by inserting the right number of instructions between 
the producer and the consumer of data; NOPs were used where no useful 
instructions could be scheduled.
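
 To make the software side of this concrete, here is a minimal sketch of 
a pass that pads MIPS I load delay slots.  It is a toy model only: the 
tuple layout (mnemonic, destination, sources) and the function name 
fill_load_delay_slots are assumptions made for illustration, not taken 
from any real assembler or compiler, and only the load delay slot is 
modelled.

```python
# Toy model of MIPS I load delay slot handling (illustrative only).
# Each instruction is (mnemonic, destination register or None, source registers).
# The tuple layout and all names here are assumptions made for this sketch.

LOADS = ("lw", "lh", "lhu", "lb", "lbu")
NOP = ("nop", None, ())

def fill_load_delay_slots(program):
    """Insert a NOP after each load whose result is read by the very next instruction."""
    out = []
    for i, insn in enumerate(program):
        out.append(insn)
        mnemonic, dest, _ = insn
        if mnemonic in LOADS and i + 1 < len(program):
            _, _, next_sources = program[i + 1]
            if dest in next_sources:
                # No interlock on MIPS I: the consumer would see the old value.
                out.append(NOP)
    return out

program = [
    ("lw",   "$t0", ("$a0",)),        # lw   $t0, 0($a0)
    ("addu", "$t1", ("$t0", "$t2")),  # addu $t1, $t0, $t2  (reads $t0 too early)
    ("lw",   "$t3", ("$a1",)),        # lw   $t3, 0($a1)
    ("addu", "$t4", ("$t1", "$t2")),  # does not read $t3, so it fills the slot
]

scheduled = fill_load_delay_slots(program)
for mnemonic, dest, sources in scheduled:
    print(mnemonic, dest, sources)
```

 Only the first load gets a NOP after it; the second one already has an 
independent instruction in its delay slot.  From MIPS II onwards such a 
pass is not needed for correctness of loads, only for performance.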

 Some operations continued to require manual resolution of pipeline 
hazards even in the MIPS IV ISA, like moves to the MDU accumulator, as 
well as many privileged operations (TLB writes, mode switches, etc.).  
For these the SSNOP (superscalar NOP) instruction was introduced, which 
was guaranteed not to be nullified on superscalar pipelines.  The 
encoding was chosen to be backwards compatible: it reuses one of the 
existing ways to express an operation with no visible effect other than 
incrementing the PC, of which the design of the MIPS instruction set has 
always offered a plethora.  Consequently SSNOP was executed as an 
ordinary NOP by older ISA implementations.
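
 The encoding trick can be shown with a short sketch.  It assumes the 
encoding documented for later MIPS32 implementations, where the canonical 
NOP is "sll $0, $0, 0" and SSNOP is "sll $0, $0, 1"; the helper name 
encode_sll is made up for this example.

```python
# Sketch of the NOP and SSNOP encodings (helper name is hypothetical).
# SLL is an R-type instruction: SPECIAL opcode 0 in bits 31..26, rs field 0,
# rt in bits 20..16, rd in bits 15..11, shift amount in bits 10..6,
# function code 0 in bits 5..0.

def encode_sll(rd, rt, sa):
    """Return the 32-bit encoding of SLL rd, rt, sa."""
    assert 0 <= rd < 32 and 0 <= rt < 32 and 0 <= sa < 32
    return (rt << 16) | (rd << 11) | (sa << 6)

NOP = encode_sll(0, 0, 0)    # sll $0, $0, 0
SSNOP = encode_sll(0, 0, 1)  # sll $0, $0, 1

print(f"nop   = 0x{NOP:08x}")
print(f"ssnop = 0x{SSNOP:08x}")
```

 Both words are shifts whose destination is register $0, so an older 
implementation that knows nothing about SSNOP executes it as a shift with 
no architecturally visible result.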

 NB despite the hardware interlocks, it has always been preferable to 
avoid the pipeline stalls they trigger by scheduling the right minimum 
number of instructions between data producers and their respective 
consumers anyway, and compilers have had options to adapt to specific 
processor implementations here.  The addition of hardware interlocks made 
the life of compiler (and handcoded assembly) writers a little bit easier, 
as a missed optimisation no longer resulted in broken code.  Also, more 
compact code could be produced where there was no useful instruction to 
schedule to satisfy a pipeline hazard and a NOP would otherwise have had 
to be inserted.
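
 Why scheduling still matters with interlocks can be seen in a toy stall 
counter.  The model is an assumption made for illustration: single issue, 
in order, ALU results usable by the next instruction, and load results 
usable load_use_delay instructions later.  Real implementations differ 
in their latencies, which is why the tuning is per processor.

```python
# Toy interlocked pipeline: counts stall cycles caused by load-use hazards.
# Instruction format (mnemonic, destination or None, sources) and all names
# are assumptions for this sketch, not a model of any specific MIPS core.

LOADS = ("lw", "lh", "lhu", "lb", "lbu")

def count_stalls(program, load_use_delay=1):
    """Return the number of cycles the interlock holds the pipeline."""
    stalls = 0
    cycle = 0
    ready_at = {}  # register -> first cycle in which its new value can be read
    for mnemonic, dest, sources in program:
        needed = max((ready_at.get(reg, 0) for reg in sources), default=0)
        if needed > cycle:
            # The interlock stalls the consumer until the operand is ready.
            stalls += needed - cycle
            cycle = needed
        if dest is not None:
            extra = load_use_delay if mnemonic in LOADS else 0
            ready_at[dest] = cycle + 1 + extra
        cycle += 1
    return stalls

naive = [
    ("lw",   "$t0", ("$a0",)),
    ("addu", "$t1", ("$t0", "$t2")),  # uses $t0 right after the load
    ("lw",   "$t3", ("$a1",)),
    ("addu", "$t4", ("$t3", "$t2")),  # uses $t3 right after the load
]

reordered = [
    ("lw",   "$t0", ("$a0",)),
    ("lw",   "$t3", ("$a1",)),        # second load fills the first load's slot
    ("addu", "$t1", ("$t0", "$t2")),
    ("addu", "$t4", ("$t3", "$t2")),
]

print("naive stalls:    ", count_stalls(naive))
print("reordered stalls:", count_stalls(reordered))
```

 Both sequences compute the same results on interlocked hardware, but 
under this model the naive order stalls on each load while the reordered 
one does not.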

 I won't dive into the details of the further evolution with modern MIPS 
ISAs here, for obvious reasons.

  Maciej