An Approach of Algorithm Based Fault Tolerance for High Performance Computing Systems
Paper title: An Approach of Algorithm Based Fault Tolerance for High Performance Computing Systems
National paper ID: SASTECH05_116
Published in the Fifth International Conference on Advances in Science and Technology, 1390 (Solar Hijri calendar)
Authors:
H. Hamidi - Islamic Azad University, Doroud Branch
Abstract:
We present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. The ABFT approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. We have implemented a systematic procedure for introducing structured redundancy into ABFT. ABFT has been recommended as a cost-effective concurrent error detection scheme. It offers a novel computing paradigm for providing fault tolerance in numerical algorithms. To that end, a matrix-based model has been developed and, based on it, algorithms for both the design and analysis of ABFT systems are formulated.
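The abstract does not spell out its encoding, so the following is only a minimal sketch of the classic checksum scheme for matrix multiplication that is commonly used to illustrate ABFT. It is not necessarily the procedure of the paper. The function names (`column_checksum`, `row_checksum`, `matmul`, `locate_and_correct`) are hypothetical, and integer entries are assumed so that checksums can be compared exactly; floating-point data would need a tolerance.

```python
# Illustrative sketch only: checksum-encoded matrix multiplication with
# single-error detection and correction. All names are hypothetical.

def column_checksum(a):
    """Return a copy of a with an extra row holding each column's sum."""
    return a + [[sum(col) for col in zip(*a)]]

def row_checksum(b):
    """Return a copy of b with an extra column holding each row's sum."""
    return [row + [sum(row)] for row in b]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def locate_and_correct(c_full):
    """Check a full-checksum product; fix one wrong data element in place.

    Returns None if all checksums agree, or the (row, column) index of the
    corrected element. Raises ValueError for any other error pattern.
    """
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Rebuild the element from the row checksum and the other entries.
        others = sum(c_full[i][k] for k in range(m) if k != j)
        c_full[i][j] = c_full[i][m] - others
        return (i, j)
    raise ValueError("error pattern not correctable by this sketch")

# Usage: encode, multiply, inject a fault, then recover.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(column_checksum(a), row_checksum(b))
c_full[0][1] = 99                     # simulated transient fault
fixed_at = locate_and_correct(c_full)  # index of the repaired element
```

The redundancy here is structured in the sense that the extra row and column are themselves produced by the same multiplication, so a single corrupted data element shows up as exactly one inconsistent row and one inconsistent column, which together locate it.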
Keywords: Algorithm Based Fault Tolerance (ABFT), Checkpointing, Error Correction, Matrix operations
Paper page and full-text download: https://civilica.com/doc/157425/