Stratus Technologies Continuum servers offer ultra-high availability and fault tolerance.

Stratus, founded in 1980, built its Continuum line as PA-RISC-based servers engineered to provide the maximum possible uptime and availability. Running Stratus VOS, a modified version of HP-UX, or Stratus' own Unix System V variant, these servers can deliver 99.999% or better uptime, which is arguably one of the primary reasons they have such a strong presence in the telecommunications, banking, and medical sectors.
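
To put the 99.999% figure in perspective, the short sketch below converts an availability percentage into allowed downtime per year. The function name and the 365-day year are assumptions made for illustration, not anything published by Stratus.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, allowed by an availability level."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for level in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(level)
        print(f"{level}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

At "five nines" this works out to roughly five minutes of downtime per year.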

Fault tolerance is the principle around which these machines were designed, and a number of ingenious steps were taken to achieve it. In the "Pair and Spare" architecture, CPUs are installed in two pairs, each pair on an independent module, and the whole group (four CPUs in total) is seen by the OS as a single logical processor. All of the processors perform the same tasks simultaneously while the system checks them for consistency. When a discrepancy or fault is detected in a physical CPU, the pair that generated the error is taken offline while the system continues running uninterrupted on the remaining pair.
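
The behaviour described above can be modelled in software as a rough sketch. The real mechanism is implemented in hardware; everything here (the class names `LockstepPair` and `LogicalProcessor`, and the functions standing in for CPUs) is hypothetical and only illustrates the idea of comparing paired results and dropping a pair that disagrees.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-in for a physical CPU: a function from input to result.
Cpu = Callable[[int], int]


@dataclass
class LockstepPair:
    """Two CPUs on one module that run the same work and are compared."""
    name: str
    cpu_a: Cpu
    cpu_b: Cpu
    online: bool = True

    def execute(self, value: int) -> Optional[int]:
        a = self.cpu_a(value)
        b = self.cpu_b(value)
        if a != b:
            # A two-way comparison cannot say which CPU is wrong,
            # so in this sketch the whole pair is taken offline.
            self.online = False
            return None
        return a


class LogicalProcessor:
    """Presents all pairs to the caller as a single processor."""

    def __init__(self, pairs: List[LockstepPair]) -> None:
        self.pairs = pairs

    def execute(self, value: int) -> int:
        results = []
        for pair in self.pairs:
            if pair.online:
                out = pair.execute(value)
                if out is not None:
                    results.append(out)
        if not results:
            raise RuntimeError("no healthy CPU pair remains")
        # Every surviving pair ran the same work; use the first agreed result.
        return results[0]


def healthy_cpu(x: int) -> int:
    return x * 2


def flaky_cpu(x: int) -> int:
    # Simulated fault: a wrong answer for one specific input.
    return x * 2 + 1 if x == 3 else x * 2
```

Building a `LogicalProcessor` from one pair containing `flaky_cpu` and one fully healthy pair, then calling `execute(3)`, still returns the correct result: the faulty pair marks itself offline and later calls are served by the remaining pair alone.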

To date, this processor architecture has not migrated to the PC sector because there has been no demand for it. However, with consumer-level 8-way (and larger) machines just around the corner, and a push in the software community for more multithreaded applications, it is almost inevitable that some technology at least inspired by this model will be implemented to deal with failed cores.

Stratus Technologies site