CIVILICA We Respect the Science
(Specialized publisher of national conference proceedings / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

An Approach of Algorithm Based Fault Tolerance for High Performance Computing Systems

Paper title: An Approach of Algorithm Based Fault Tolerance for High Performance Computing Systems
National paper ID: SASTECH05_116
Published in: Fifth International Conference on Advances in Science and Technology, 1390 (Solar Hijri calendar)
Author details:

H. Hamidi - Islamic Azad University, Doroud Branch

Abstract:
We present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. The ABFT approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. We have implemented a systematic procedure for introducing structured redundancy into ABFT. ABFT has been recommended as a cost-effective concurrent error detection scheme, and it offers a novel computing paradigm for providing fault tolerance in numerical algorithms. To that end, a matrix-based model has been developed and, based on that model, algorithms for both the design and analysis of ABFT systems are formulated.
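To make "structured redundancy" in a matrix operation concrete, the following is a minimal sketch of the classic row/column checksum scheme for matrix multiplication that is widely used in the ABFT literature. It is not the procedure developed in this paper, whose details the abstract does not give. All function names here are hypothetical, and the sketch assumes exact integer arithmetic and at most one corrupted data element.

```python
# Illustrative sketch only: classic checksum-matrix ABFT for C = A * B.
# Assumptions (not from the paper): integer entries, at most one faulty
# data element in the product, and hypothetical helper names.

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add_column_checksum(a):
    """Append one extra row that holds the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def add_row_checksum(b):
    """Append one extra column that holds the sum of each row of b."""
    return [row + [sum(row)] for row in b]

def locate_error(cf):
    """Check a full-checksum product matrix.

    Returns None if every row and column agrees with its checksum,
    or (row, col, delta) for a single corrupted data element.
    """
    n = len(cf) - 1       # number of data rows
    m = len(cf[0]) - 1    # number of data columns
    bad_rows = [i for i in range(n) if sum(cf[i][:m]) != cf[i][m]]
    bad_cols = [j for j in range(m)
                if sum(cf[i][j] for i in range(n)) != cf[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        delta = sum(cf[i][:m]) - cf[i][m]   # amount by which cf[i][j] is off
        return i, j, delta
    raise ValueError("error pattern is not a single data-element fault")

def correct(cf):
    """Repair a single faulty data element in place; return what was found."""
    fault = locate_error(cf)
    if fault is not None:
        i, j, delta = fault
        cf[i][j] -= delta
    return fault

if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    cf = matmul(add_column_checksum(a), add_row_checksum(b))
    cf[0][1] += 5          # simulate a transient fault in one element
    print(correct(cf))     # reports the located fault
    print([row[:2] for row in cf[:2]])  # recovered data part of the product
```

The product of a column-checksum matrix and a row-checksum matrix is itself a full-checksum matrix, so a mismatch in exactly one row check and one column check pinpoints the faulty element, and the size of the mismatch gives the correction. With floating-point data the equality tests would need a tolerance (for example via `math.isclose`), which this sketch leaves out.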

Keywords:
Algorithm Based Fault Tolerance (ABFT), Checkpointing, Error Correction, Matrix operations

Paper page and full-text download: https://civilica.com/doc/157425/