Tandem memo (why do computers stop)
Andy Piercy
andy.piercy at gmail.com
Tue Oct 30 13:00:43 CDT 2007
I worked on a UK-designed lock-step fault-tolerant computer for Sun
Microsystems, the Netra ft 1800.
This was a huge beast based on the E450, with every module duplicated and
hot-swappable.
You could just pull out a CPU module (each held up to four 450MHz UltraSPARC
II processors and 4GB of RAM) and the system would continue running
without a glitch; on reinsertion the system would reload and
resynchronise.
You could do this with any of the modules: disk, I/O (PCI cards fitted in
hot-swap carriers), fan modules, PSUs. It was a direct competitor to Tandem.
http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/Netra_ft_1800/Netra_ft_1800
There were issues with the lock step, and sometimes they would go out of sync,
but they never just stopped...
Anyone ever seen these?
Ta,
Andy.
On 30/10/2007, Chuck Guzis <cclist at sydex.com> wrote:
>
> On 30 Oct 2007 at 9:17, Chris Kennedy wrote:
>
> > Yep, that was the Tandem way. You could watch the lights blink on the
> > first processor, count two and watch the lights do precisely the same
> > thing on the second.
>
> Yes, but as I said "it's nothing that simple"--to say that it was
> would be completely discounting the enormous investment in software
> that Tandem made to produce their NonStop systems.
>
> Heck, back around then, a friend and I prototyped a system with three
> PC/XTs and a proprietary expansion card that did three-way voting
> and also performed hot replacement of failed processors. Basically
> a garage operation and nearly sold to a then-cash-rich Everex. Maybe
> good enough for process control, but too weak for anything more
> involved than that. Our selling point was that it was off-the-shelf
> and cheap. I think I still have the OrCAD files for our board
> somewhere on a 5.25" 360K floppy.
>
> We did nothing about what software ran on the system--and that was
> the giant weakness. Without software, it was just another
> interesting piece of iron.
>
> Simple redundancy doesn't always identify which of the two systems is
> producing the error--only that there was an error--and that's where
> Tandem's genius comes in.
>
> Tandem was a whole world apart--not only did they have hardware
> redundancy (which would have been no great shucks back then), but
> their software was constructed along a modular transaction-based
> model, so that transactions were never lost. (Hence the popularity of
> these in the banking sector).
>
> Cheers,
> Chuck
More information about the cctalk mailing list