Tandem memo (why do computers stop)

Andy Piercy andy.piercy at gmail.com
Tue Oct 30 13:00:43 CDT 2007

I worked on a UK-designed lock-step fault-tolerant computer for Sun
Microsystems, the Netra FT1800.

This was a huge beast based on the E450, with all the modules duplicated
and designed for hot swap.

You could just pull out the CPU module (each had up to 4 * 450MHz UltraSPARC
II processor modules and 4GB of RAM) and the system would continue running
without a glitch, and on reinsertion the system would reload and

You could do this with any of the modules: disk, I/O (PCI cards fitted in
hot swap carriers), fan modules, PSUs. It was a direct competitor to Tandem.


There were issues with the lock step, and sometimes they would go out of sync,
but they never just stopped...

Anyone ever seen these?



On 30/10/2007, Chuck Guzis <cclist at sydex.com> wrote:
> On 30 Oct 2007 at 9:17, Chris Kennedy wrote:
> > Yep, that was the Tandem way.  You could watch the lights blink on the
> > first processor, count two and watch the lights do precisely the same
> > thing on the second.
> Yes, but as I said "it's nothing that simple"--to say that it was
> would be completely discounting the enormous investment in software
> that Tandem made to produce their NonStop systems.
> Heck, back around then, a friend and I prototyped a system with three
> PC/XT's and a proprietary expansion card that did three-way voting
> and also performed hot replacement of failed processors.    Basically
> a garage operation and nearly sold to a then-cash-rich Everex.  Maybe
> good enough for process control, but too weak for anything more
> involved than that.  Our selling point was that it was off-the-shelf
> and cheap.  I think I still have the OrCAD files for our board
> somewhere on a 5.25" 360K floppy.
> We did nothing about what software ran on the system--and that was
> the giant weakness.  Without software, it was just another
> interesting piece of iron.
> Simple redundancy doesn't always identify which of the two systems is
> producing the error--only that there was an error--and that's where
> Tandem's genius comes in.
> Tandem was a whole world apart--not only did they have hardware
> redundancy (which would have been no great shucks back then), but
> their software was constructed along a modular transaction-based
> model, so that transactions were never lost. (Hence the popularity of
> these in the banking sector).
> Cheers,
> Chuck
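
The quoted point that simple redundancy only tells you there was an error,
not which unit produced it, can be illustrated with a small sketch. This is
not Tandem's method, the Netra's, or the design of the PC/XT voting card; the
function names compare_pair and vote are made up for the example, and unit
outputs are assumed to be plain comparable values.

```python
from collections import Counter


def compare_pair(a, b):
    """Duplex comparison: True if the two outputs agree.

    A False result says something is wrong, but not which side is wrong.
    """
    return a == b


def vote(outputs):
    """Majority vote over the outputs of several redundant units.

    Returns (agreed_value, suspects). agreed_value is None when no value
    has a strict majority; in that case every unit is a suspect.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        # No strict majority: the error is detected but cannot be located.
        return None, list(range(len(outputs)))
    # Units that disagree with the majority are the likely failed ones.
    return value, [i for i, out in enumerate(outputs) if out != value]


print(compare_pair(7, 9))   # pair: mismatch detected, culprit unknown
print(vote([7, 9]))         # two units: no majority, both are suspects
print(vote([7, 7, 9]))      # three units: unit 2 is outvoted
```

With two units a disagreement leaves both under suspicion; with three, the
odd one out can be identified and replaced while the other two carry on.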
