Tandem memo (why do computers stop)

Andy Piercy andy.piercy at gmail.com
Tue Oct 30 13:00:43 CDT 2007


I worked on a UK-designed lock-step fault-tolerant computer for Sun
Microsystems, the Netra FT1800.

This was a huge beast based on the E450, with all the modules duplicated
and a hot-swap design.

You could just pull out the CPU module (each had up to 4 * 450MHz UltraSPARC
II processor modules and 4GB of RAM) and the system would continue running
without a glitch, and on reinsertion the system would reload and
resynchronise.

You could do this with any of the modules: disk, I/O (PCI cards fitted in
hot-swap carriers), fan modules, PSUs. It was a direct competitor to Tandem.

http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/Netra_ft_1800/Netra_ft_1800

There were issues with the lock step, and sometimes they would go out of sync,
but they never just stopped...

Anyone ever seen these?

Ta,

Andy.


On 30/10/2007, Chuck Guzis <cclist at sydex.com> wrote:
>
> On 30 Oct 2007 at 9:17, Chris Kennedy wrote:
>
> > Yep, that was the Tandem way.  You could watch the lights blink on the
> > first processor, count two and watch the lights do precisely the same
> > thing on the second.
>
> Yes, but as I said "it's nothing that simple"--to say that it was
> would be completely discounting the enormous investment in software
> that Tandem made to produce their NonStop systems.
>
> Heck, back around then, a friend and I prototyped a system with three
> PC/XT's and a proprietary expansion card that did three-way voting
> and also performed hot replacement of failed processors.  Basically
> a garage operation and nearly sold to a then-cash-rich Everex.  Maybe
> good enough for process control, but too weak for anything more
> involved than that.  Our selling point was that it was off-the-shelf
> and cheap.  I think I still have the OrCAD files for our board
> somewhere on a 5.25" 360K floppy.
>
> We did nothing about what software ran on the system--and that was
> the giant weakness.  Without software, it was just another
> interesting piece of iron.
>
> Simple redundancy doesn't always identify which of the two systems is
> producing the error--only that there was an error--and that's where
> Tandem's genius comes in.
>
> Tandem was a whole world apart--not only did they have hardware
> redundancy (which would have been no great shucks back then), but
> their software was constructed along a modular transaction-based
> model, so that transactions were never lost. (Hence the popularity of
> these in the banking sector).
>
> Cheers,
> Chuck
>
>
>
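
To illustrate the quoted point that simple redundancy only shows that there
was an error, not which system produced it, here is a minimal Python sketch.
The function names (compare_pair, vote) are hypothetical and made up for this
example; it is not how Tandem, the Netra, or the PC/XT voting card actually
worked, just the bare idea of two-way comparison versus three-way voting.

```python
from collections import Counter


def compare_pair(a, b):
    """Two-way comparison: reports whether the units agree.

    A False result says an error exists, but not which unit is wrong.
    """
    return a == b


def vote(outputs):
    """Majority vote over the outputs of redundant units.

    Returns (agreed_value, indices_of_dissenting_units). If no value has a
    strict majority, returns (None, all indices), since no unit can be trusted.
    """
    counts = Counter(outputs)
    value, n = counts.most_common(1)[0]
    if n * 2 <= len(outputs):
        # No strict majority: the fault cannot be localised.
        return None, list(range(len(outputs)))
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


# Two units disagree: we know something is wrong, but not where.
print(compare_pair(42, 41))

# Three units: the odd one out (index 1) is identified and can be replaced.
print(vote([42, 41, 42]))
```

With two units the best you can do is stop or pick one; with three, the
dissenting unit can be flagged for hot replacement while the other two carry
on. None of this addresses the software side, which is the part the quoted
message credits Tandem for.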



More information about the cctalk mailing list