Microsoft SQL Server 2008: Checkpoints
Excerpt by Don Kiely
As you learn about Integration Services, you'll be able to tackle larger and more complex ETL projects, with dozens or even hundreds of tasks moving data among many data stores and performing multiple transformations on it along the way. You may also have individual tasks that take hours to run because of the volume of data they have to move and process, or because of slow resources such as network speeds. A package might fail after many tasks completed successfully, or after the one hours-long task completes.
If you have to re-run the entire package after fixing the problem, you'll again have to patiently wait for hours while earlier tasks duplicate their work before you even get to the point where the package failed on the first run. That can be a painful experience, and the local database administrator will likely not be pleased that you're taking up so many resources for so long, repeatedly.
To get around these kinds of problems, Integration Services packages are restartable using a feature called checkpoints. When you implement checkpoints, the package creates a checkpoint file that tracks the execution of the package. As each task completes, Integration Services writes state information to the file, including the current values of variables that are in scope.
If the package completes without errors, Integration Services deletes the checkpoint file. If the package fails, the file contains complete information about which tasks completed and which failed, as well as a reference to where the error occurred. After you fix the error, you execute the package again, and it restarts at the point of failure (not at the beginning) with the same state it had when it failed. Checkpoints are an incredibly useful feature, especially in long-running packages.
Checkpoints are not enabled on a package by default. You have to set three package-level properties to configure checkpoints:
- CheckpointFileName: Specifies the name and location of the checkpoint file. You must set this property, but the name can be any valid Windows filename, with any extension.
- CheckpointUsage: Determines how the package uses the checkpoint file while the package executes. It has three settings:
  - Always: The package will always use the checkpoint file and will fail if the file does not exist.
  - IfExists: The package will use the checkpoint file if it exists to restart the package at the previous point of failure. Otherwise, execution begins at the first Control Flow task. This is the usual setting for using checkpoints.
  - Never: The package will not use the checkpoint file even if it exists. This means that the package will never restart, and will only execute from the beginning.
- SaveCheckpoints: Specifies whether the package writes checkpoints to the file. Set it to True to have the package record checkpoints as it runs.
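The three CheckpointUsage settings amount to a small decision made when the package starts. The Python sketch below models only that decision. It is not the Integration Services API: the function name `decide_start` and its return values are hypothetical, chosen for illustration.

```python
import os
from enum import Enum


class CheckpointUsage(Enum):
    """Mirrors the three CheckpointUsage settings described above."""
    NEVER = "Never"
    IF_EXISTS = "IfExists"
    ALWAYS = "Always"


def decide_start(usage: CheckpointUsage, checkpoint_path: str) -> str:
    """Return "restart" or "beginning" (hypothetical labels).

    Raises FileNotFoundError to model the package failing when the
    setting is Always and the checkpoint file is missing.
    """
    exists = os.path.exists(checkpoint_path)

    if usage is CheckpointUsage.NEVER:
        # The file is ignored even if it exists.
        return "beginning"

    if usage is CheckpointUsage.ALWAYS:
        if not exists:
            raise FileNotFoundError(
                f"checkpoint file required but not found: {checkpoint_path}"
            )
        return "restart"

    # IfExists: restart when a file is present, otherwise run from the top.
    return "restart" if exists else "beginning"
```

With IfExists, the same package definition serves both the first run (no file, so it starts at the beginning) and a rerun after a failure (file present, so it restarts), which is presumably why it is the usual setting.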
This combination of properties provides flexibility: you can configure checkpoints for the package once, then turn their use on and off before execution without losing the checkpoint configuration.
For checkpoints to work, a task failure has to cause the package to fail. Otherwise, the package will continue executing beyond the failure, recording more data in the checkpoint file for subsequent tasks. So you must also set the FailPackageOnFailure property to true for each task where you want to make it possible to restart the package using a checkpoint. If it is set to false for a task and the task fails, Integration Services doesn't write any data to the checkpoint file. Because the checkpoint data is incomplete, the next time you execute the package it will start from the beginning.
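The restart behavior can be illustrated with a toy model. The Python sketch below is not how Integration Services is implemented. It assumes every task behaves as if FailPackageOnFailure were true, stores the checkpoint as JSON purely for brevity, and uses a hypothetical function name, `run_package`.

```python
import json
import os


def run_package(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping checkpointed ones.

    Returns (names_executed_in_this_run, succeeded).
    """
    completed = []
    if os.path.exists(checkpoint_path):
        # A file left by a failed run tells us which tasks to skip.
        with open(checkpoint_path) as fh:
            completed = json.load(fh)["completed"]

    executed = []
    for name, action in tasks:
        if name in completed:
            continue  # finished in an earlier run
        try:
            action()
        except Exception:
            # The task failure fails the whole package; the checkpoint
            # file stays on disk so the next run can restart here.
            return executed, False
        executed.append(name)
        completed.append(name)
        # Record progress as each task completes.
        with open(checkpoint_path, "w") as fh:
            json.dump({"completed": completed}, fh)

    # Clean finish: remove the checkpoint file.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed, True
```

If a three-task package fails on its third task, a second call with the same checkpoint path runs only that third task and then deletes the file.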
TIP: Checkpoints only record data for Control Flow tasks. This includes a Data Flow task, but it does not save checkpoint data for individual steps in a Data Flow. Therefore a package can restart at a Data Flow task, but not within a Data Flow itself. In other words, you cannot use a checkpoint to re-run only part of a Data Flow; the entire Data Flow runs again.
At the start of the package, Integration Services checks for the existence of the checkpoint file. If the file exists, Integration Services scans the contents of the checkpoint file to determine the starting point in the package. Integration Services writes to the checkpoint file while the package executes. The contents of the checkpoint file are stored as XML and include the following information:
- Package ID: A GUID stamped onto the file at the beginning of the execution phase.
- Execution Results: A log of each task that executes successfully in order of execution. Based on these results, Integration Services knows where to begin executing the package the next time.
- Variable Values: Integration Services saves package variables' values in the checkpoint file. When execution begins again, it reads those values from the file and sets them on the package.
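To make the three kinds of information concrete, the Python sketch below writes and reads a small XML document that carries them. The element and attribute names (`Checkpoint`, `PackageID`, `ExecutionResults`, `Task`, `Variables`, `Variable`) are invented for illustration and are not the actual checkpoint file schema. Rejecting a file whose package ID does not match is likewise an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def build_checkpoint(package_id, completed_tasks, variables):
    """Serialize package ID, completed tasks, and variables to XML text."""
    root = ET.Element("Checkpoint", {"PackageID": package_id})

    results = ET.SubElement(root, "ExecutionResults")
    for name in completed_tasks:  # kept in order of execution
        ET.SubElement(results, "Task", {"Name": name, "Result": "Success"})

    vars_el = ET.SubElement(root, "Variables")
    for name, value in variables.items():
        ET.SubElement(vars_el, "Variable", {"Name": name}).text = str(value)

    return ET.tostring(root, encoding="unicode")


def read_checkpoint(xml_text, expected_package_id):
    """Return (completed_task_names, variables) from checkpoint XML text.

    Variable values come back as strings, because this sketch stores
    them as plain element text.
    """
    root = ET.fromstring(xml_text)
    if root.get("PackageID") != expected_package_id:
        raise ValueError("checkpoint file belongs to a different package")

    completed = [task.get("Name") for task in root.iter("Task")]
    variables = {v.get("Name"): v.text for v in root.iter("Variable")}
    return completed, variables
```

A restart would then skip every task named in the returned list and apply the returned variable values before resuming.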
This post is an excerpt from the online courseware for our Microsoft SQL Server 2008 Integration Services course written by expert Don Kiely.