



1.1 Background of the Study

Grid computing uses a computer network in which each computer's resources are shared with every other computer in the system. Computing thereby becomes pervasive: individual users (or client applications) gain access to computing resources (processors, storage, data, applications, and so on) as needed, with little or no knowledge of where those resources are located or of the underlying technologies, hardware, and operating systems. The main objective in grid scheduling is to finish a job or application as soon as possible (Harshadkumar and Vipul, 2014). Fault tolerance is an important property of large-scale computational grid systems, in which geographically distributed nodes cooperate to execute a task, and it is needed to achieve a high level of reliability and availability. A common approach to guaranteeing an acceptable level of fault tolerance in scientific computing is checkpointing. When a task fails, it can be restarted from its most recently checkpointed state rather than from the beginning, which reduces the work lost and improves reliability (Bakhta and Ghalem, 2014).

1.2 Motivation

The ability to checkpoint a running application and restart it later provides many benefits, such as fault recovery, advanced resource sharing, dynamic load balancing and improved service availability. A fault-tolerant service is essential to satisfy quality of service (QoS) requirements in grid computing. However, excessive checkpointing degrades performance. There is therefore a need to improve performance by reducing the number of times checkpointing is invoked. Research on grid computing is applicable to industries that have successfully adopted grid computing technology, such as:


1.      The financial services industry uses it for derivatives analysis, statistical analysis and portfolio risk analysis.

2.      The insurance industry uses it for certain tasks.

3.      Life sciences use grid technology to carry out cancer research and protein sequencing and folding (Hossein, 2014).

1.3 Research Problem

Fault tolerance is the ability of a system to deliver the required services in the presence of faults or errors. The aim is to prevent faults from causing failures and to keep providing services as required. In a fault-tolerant system, faults are first detected and then recovered from without the participation of any external agent. The main issue in fault tolerance is how, where, and with which technique faults are tolerated in distributed systems (Bashir and Kumar, 2013). Checkpointing is one of the most popular techniques for providing fault tolerance on unreliable systems. A checkpoint is a snapshot of the entire system state, recorded so that the application can be restarted after a failure. A checkpoint can be stored on temporary or stable storage. However, the efficiency of the mechanism depends strongly on the length of the checkpointing interval: frequent checkpointing increases the overhead. It is therefore necessary to reduce the number of checkpoints to an optimal number in order to minimise the overhead.
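The trade-off between checkpointing frequency and overhead can be illustrated with a simple cost model. The sketch below is an assumption made for illustration only, not the model used in this research: it supposes a job of `work` seconds, a fixed cost `ckpt_cost` per checkpoint, and a known number of `failures`, each of which loses on average half of one checkpoint interval. All function and parameter names are hypothetical.

```python
def expected_overhead(work, n, ckpt_cost, failures):
    """Expected extra time (seconds) for a job of `work` seconds with n checkpoints.

    Assumed model: every checkpoint costs `ckpt_cost` seconds, and every
    failure loses, on average, half of one checkpoint interval (work / n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    interval = work / n
    return n * ckpt_cost + failures * interval / 2


def best_checkpoint_count(work, ckpt_cost, failures, max_n=1000):
    """Return the n in 1..max_n that minimises the assumed overhead model."""
    return min(
        range(1, max_n + 1),
        key=lambda n: expected_overhead(work, n, ckpt_cost, failures),
    )
```

Under these assumptions, a one-hour job (`work=3600`) with `ckpt_cost=10` and `failures=2` incurs the same overhead of 3610 seconds with a single checkpoint as with 360 checkpoints, while `best_checkpoint_count(3600, 10, 2)` returns 19. Too few checkpoints waste recomputation and too many waste checkpointing time.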

1.4 Research Aim and Objectives

The aim of this research is to develop an enhanced checkpointing-based fault-tolerance system by reducing its runtime overhead using programmer-level checkpointing controls.

The specific objectives of the research are:


1.      Develop a mechanism that provides users the flexibility and control to insert checkpointing code as desired.

2.      Provide space efficiency by saving only the data necessary to recover an application.

3.      Implement and simulate the enhanced checkpointing algorithm using the GridSim Toolkit, version 5.2.

4.      Evaluate the performance of the enhanced checkpointing algorithm against that of Idris (2015).
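Objectives 1 and 2 can be illustrated with a minimal sketch of programmer-level checkpointing. It is not the system developed in this research: the function names, the file-based storage and the example job are all assumptions chosen for illustration. The programmer decides where the checkpoint call goes and saves only the loop index and running total, which is all the job needs to resume.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write `state` to `path` atomically, so that a crash during the
    write cannot corrupt the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def sum_of_squares(limit, path, checkpoint_every, fail_at=None):
    """Sum i*i for i in range(limit), resuming from `path` if it exists.

    Only the next index and the running total are saved.
    `fail_at` injects a simulated fault for demonstration purposes.
    """
    state = load_checkpoint(path) or {"next_i": 0, "total": 0}
    i, total = state["next_i"], state["total"]
    while i < limit:
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated fault")
        total += i * i
        i += 1
        if i % checkpoint_every == 0:
            # Programmer-placed checkpoint: minimal state only.
            save_checkpoint(path, {"next_i": i, "total": total})
    return total
```

If a run with `fail_at=7` and `checkpoint_every=3` is interrupted, a second call with the same `path` resumes from index 6 instead of 0 and still produces the full result.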

1.5 Research Methodology

The following procedures are to be adopted for this research work:

1.      Specify the number of checkpoints prior to the execution of the user's job. This number can be determined from the failure rate, response time and failure tendency of the failed resource, and will be decided dynamically and programmatically.

2.       Insert the checkpoints after the job has been scheduled. The specified number of checkpoints will be evenly distributed. The checkpoint interval will be computed from the number of checkpoints and the response time of the failed resource:

checkpoint interval = rtime / n, n > 0,

where n is the number of checkpoints, rtime is the response time, and the ratio is the checkpoint interval.

3.      Save the state of the process after every checkpoint to assure coherence, and replicate the checkpoint files on several nodes to ensure their availability. Purge the previous checkpoint data, leaving only the last checkpoint (restore point) for recovery in case of a fault.
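The three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the GridSim implementation: it assumes the checkpoint interval is the ratio rtime / n, models each node as an in-memory dictionary, and uses hypothetical names throughout (`checkpoint_times`, `ReplicatedCheckpointStore`).

```python
def checkpoint_interval(rtime, n):
    """Checkpoint interval, assumed here to be the ratio rtime / n (n > 0)."""
    if n <= 0:
        raise ValueError("n must be greater than 0")
    return rtime / n


def checkpoint_times(rtime, n):
    """Evenly distributed checkpoint times for n checkpoints."""
    interval = checkpoint_interval(rtime, n)
    return [interval * k for k in range(1, n + 1)]


class ReplicatedCheckpointStore:
    """Keeps the latest checkpoint on several nodes.

    Nodes are modelled as in-memory dicts (an assumption for illustration).
    """

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}

    def save(self, ckpt_id, state):
        for store in self.nodes.values():
            store[ckpt_id] = dict(state)  # replicate on every node
            # Purge older checkpoints, keeping only the restore point.
            for old_id in [k for k in store if k != ckpt_id]:
                del store[old_id]

    def restore(self, failed_nodes=()):
        """Return (ckpt_id, state) from the first surviving node, else None."""
        for name, store in self.nodes.items():
            if name not in failed_nodes and store:
                return next(iter(store.items()))
        return None
```

For example, `checkpoint_times(100, 4)` returns `[25.0, 50.0, 75.0, 100.0]`. After two calls to `save`, each node holds only the second checkpoint, and `restore(failed_nodes={"a"})` still succeeds because the replicas on the remaining nodes are intact.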

