By: Matt Wollner
Adding custom logging and auditing to your ETL process can be very beneficial – so much so that if your organization hasn’t added it to your ETL process yet, we highly recommend doing so.
Many developers rely on SSIS to log their SSIS packages via events. These logs can help troubleshoot when a package fails or can tell you the last time a package was run. But these logs don’t provide important insight into how or when specific rows of data got loaded, nor from where. This is where the value of custom logging and auditing really comes in to play.
The Benefits of Custom Logging and Auditing
Here are 3 benefits custom logging and auditing can bring to your ETL process.
1. Help you provide regulatory compliance, like Sarbanes-Oxley compliance
2. Provide a deep understanding of database activity and additional insight into data anomalies that may occur
3. Can help answer important questions like, “When was that row last updated?”
Custom logging and auditing can also help you identify specific data for targeted rollbacks. For example, what would you do if you were told that there was an error in your source data for Monday only? Would your current ETL processes enable you to remove all data loaded on Monday only?
We include a custom logging and auditing framework in every SSIS solution that we develop for companies. As a result, every single time a package is executed, we capture the execution of the package, every SSIS task that runs, and any errors that may have occurred. All of the logging is stored in a set of SQL tables located in the desired database. The main logging table, ‘Log.ExecutionLog’, stores a row for every package that is executed. We also enable the ‘OnVariableValueChange’, which captures selected variables every time they change.
For auditing, we recommend using the primary key of ‘Log.ExecutionLog’, ‘ExecutionLogID’, and add an ‘InsertAuditKey’ and ‘UpdateAuditKey’ to every affected table. This will allow you to pinpoint the SSIS package and execution time for every row in a table.
So what does all of this look like?
You should have a base table, ‘Log.ExecutionLog’, that will store 1 row for every time an SSIS package is run. The primary key in this table, ‘ExecutionLogID’, will be used as the Audit Key in every table we are auditing. This way every time a row is added or updated, it can be traced back to the SSIS package that affected the row.
You should then create a table for every SSIS event handler that you’ve set up in your SSIS package. We recommend that your template uses these event handlers:
- • OnError – every error is written to the ‘ExecutionErrorLog’. There may be more than 1 error per execution.
- • OnPreExecute / OnPostExecute – this is an event for every control flow task that is run. We store this in the ‘ExecutionTaskLog’.
- • OnVariableValueChanged – all variables whose Raise Change Event = True will get a row added to ‘ExecutionVariableLog’ every time their values change.
- • An Execute SQL Task is added at the beginning and end of each package: ‘OnBeginEvent’ and ‘OnEndEvent’.
SSIS Package Logging
We recommend beginning your logging by creating a template SSIS package that will be used as a starting point for all new packages. Each task you add should use an Execute SQL Task to call a stored procedure that’s specific to the given task. We recommend including this below in your template.
• ‘OnBeginEvent’ will add a row to the ‘ExecutionLog’ table and will add the package start time along with other package attributes. The stored procedure will return the ‘ExecutionLogID’ (which was just inserted) and stores it in a package variable using the Output parameter. The ‘ExecutionLogID’ value is passed to every other event that is used.
• ‘OnEndEvent’ will update the ‘ExecutionLog’ with an end time and a success status. If any tasks fail, the SQL ‘HangleOnEndEvent’ will not get called. The ‘OnErrorEvent’ event handler will update the status for us.
• ‘ParentExecutionLogID’ – if a package calls a child package, it will pass its ‘ExecutionLogID’ variable to the child package. The child package will capture the calling package’s ‘ExecutionLogID’ as the ‘ParentExecutionLogID’. Doing so enables the entire chain of packages to be traced back to the originating package.
• Event Handlers
• ‘OnErrorEvent’ – the ‘OnErrorEvent’ is triggered each time there is an error in the package. We recommend having a SQL task (in the event handler) that calls a stored procedure to add the rows to ‘ExecutionErrorLog’. We also recommend adding a Send Email Task to send a failure email.
• ‘OnPreExecute’ – The ‘OnPreExecute’ will get called every time a control flow task begins. It will add a row to the ‘ExecutionTaskLog’.
• Keep in mind that the OnPreExecute and OnPostExecute will fire for every control flow task in the package. If you have an ‘OnPostExecute’ in a parent package, it will fire for every control flow task in the child package as well as for every event handler in the child package. This can explode the amount of logging and slow down your package. You can set the ‘Propagate’ in the event handler to ‘False’ to avoid this. You can see my previous blog post, “Event Handlers in Child Packages (SSIS),” for more details.
• ‘OnPostExecute’ – each time a control flow task completes, ‘OnPostExecute’ is triggered. It will update the row added by the ‘OnPreExecute’ event.
• ‘OnVariableValueChanged’ – every variable in an SSIS package has a property called ‘Raise Change Event’. If the Raise Change Event flag is set to true, it will fire the ‘OnVariableValueChanged’. We use the ‘ExecutionVariableLog’ table to store a row every time these variable values’ change. Keep in mind that this only triggers when the value changes. The OnVariableValueChanged event is not triggered if the value is set to the default value.
Now that we have our logging set up, we can use the ‘ExecutionLogID’ variable in each record that we affect with the SSIS packages.
One of the most important goals of auditing is to be able to determine where and when every row in your tables was touched. We recommend adding the ‘ExecutionLogID’ to a row as well as the ‘InsertAuditKey’ or ‘UpdateAuditKey’ when inserting and updating new tables with a data flow.
• If you are modifying data in a stored procedure, you should pass the stored procedure through the ‘ExecutionLogID’.
• Another audit that we recommend adding to your packages is the number of rows that were pulled from the source and how many rows were inserted and updated in a given data flow. This can help you verify that the data flow is pulling and updating the data that you are expecting it to.
To add a row count audit, we add a ‘Row Count Task’ after each source and before each ‘Destination’. The variable that is assigned needs to have the ‘Raise Change Event Flag’ set to ‘True’, and the ‘OnVariableValueChanged’ event will log the variable value change.
Do you need assistance adding custom logging and auditing to your ETL process?
We can help you incorporate a custom logging and auditing framework into your SSIS solution to improve your ETL process. Give us a call at (678) 835-8539 or email us at [email protected] and we will get back to you shortly!
Keep your data analytics sharp by subscribing to our mailing list!
Thanks for reading! Keep your knowledge sharp by subscribing to our mailing list to receive fresh Key2 content around data analytics, business intelligence, data warehousing, and more!
Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.