By: Jeff Levy

 
 

Multithreading an SSIS Package – Introduction

Ever have a single task in SSIS that can be applied to dozens of tables in your database? Such a task could be as simple as extracting a table out to a flat file and moving it to a network drive. In order to accomplish this task, you could create dozens of packages – one for each table – with their own Control Flow / Data Flow tasks. This can be highly inefficient if the column structure changes for a select group of tables or if the scope of tables begins to expand rapidly.

I’m going to show you a solution that consists of two SSIS packages (a template and controller package) that can dynamically accomplish the extraction of dozens of tables out to a flat file using multithreading in C#. The template package, ‘ExtractTable.dtsx’, is responsible for performing the extraction of data (via bcp queryout) to a flat file. The controller package, ‘ThreadCreator.dtsx’, is responsible for replicating the template package several times and executing them in parallel during runtime. These replicated packages will only be replicated in memory and will not be physically created as a result of the process. The diagram below depicts the process at a high level.

As far as deployment goes, I will show you how to execute these packages from your local directory and also from the Integration Service Catalog. When executing from the Integration Services Catalog, I will follow a similar approach that my colleague and fellow Key2 Consulting employee Matt Wollner articulated in his blog post, How to Execute an SSIS Package within an SSIS Script Task.

 

Step 1: Setup the ThreadQueue table

The first step in the process is to create and populate the ThreadQueue configuration table. This table is essentially the list of all database tables that you wish to be extracted out to flat file. For this article, I will be sourcing the list of tables from the AdventureWorks Database.

Please see the column structure in the picture below. It is important to note that the ‘ExtractStatus’ and ‘ThreadInstanceNumber’ columns are to be initially left NULL. These columns will be populated and updated as the packages are running.


 

Step 2: Create the Template Package (‘ExtractTable.dtsx’)

2.1 – Create the package
After the ThreadQueue configuration table has been created and populated, create a new SSIS project in Visual Studio called ‘Multithreading’. Within the project, make sure to create a new package called ‘ExtractTable.dtsx’. This package will be used as the ‘Template’ package and will be replicated n times at execution time (‘n’ represents the number of threads you choose your solution to have). Each instance of the ‘ExtractTable.dtsx’ package will be responsible for extracting a different table from the ThreadQueue configuration table. This package consists of 4 Control Flow Tasks and is illustrated in the picture below.

2.2 – Create the Package Parameters
Create a new Package Parameter in the ‘ExtractTable.dtsx’ package called ‘ThreadInstanceNumber’. For now, give this parameter a default value of ‘1’. This value will be set by the ‘ThreadCreator’ Package at runtime. I will discuss how this package works later on in this article.

2.3 – Create the Package Variables
Below is a list of Package Variables that this package will need.

• The ‘BaseTableName’, ‘FullTableName’, and ‘TableID’ variables will be populated by Task 1. Task 1 grabs the values for these variables from the ThreadQueue configuration table. If the values are set to ‘-1’, then there are no more database tables left to be extracted in the ThreadQueue configuration table.

• The ‘EvalExpressionNull’ variable is used exclusively for the For Loop Container. This variable is set by an SSIS Expression. Please note the syntax in the picture above.

• The ‘bcpArguments’ and ‘FileName’ variables are used as input for Task 2: BCP Extract Process. This task is the heart of the package, as it actually extracts the database table out to a flat file.

2.4 – Create Task 1: Get First Extract Table
This task is responsible for getting the first available database table in your ThreadQueue configuration table. It will select the first item in the ThreadQueue configuration table with a NULL value in the ExtractStatus column. After a database table is chosen, this task will mark this entry as ‘Pending’ in the ExtractStatus column.

Here is the SQL code below. Make sure to use the locks/transaction blocks to prevent any race conditions from occurring.

The parameters are set up as shown below.

Parameter Mapping / Input:

Result Set / Output:

2.5 – Create the For Loop Container
This container exists in order to iterate through all items in the ThreadQueue configuration table. You will need to configure the Loop as indicated in the screenshot below.

2.6 – Create Task 2: BCP Extract Process
This task performs the primary function of the package, which is to extract data from the list of tables in the ThreadQueue configuration table out to flat file. This task is utilizing an Execute Process task from the SSIS toolbox and running a BCP queryout command under the covers. If you are unfamiliar with the BCP process, please check out the following link: https://docs.microsoft.com/en-us/sql/tools/bcp-utility.

You will need to perform two steps for this task.

1. Select the ‘Executable’ value – You will need to search for the bcp.exe file on your machine and map the location accordingly as per the picture below.

2. Select the ‘Arguments’ values – In order to set this value, you will need to utilize an SSIS Expression and map it to the ‘bcpArguments’ variable. Please refer to the screenshot below for guidance.

2.7 – Create Task 3: Update Record to Success
Upon successful extraction of the flat file to a specified directory, the ThreadQueue configuration table will be updated to ‘Success’. The ThreadQueue configuration table will also be updated with the current ThreadInstanceNumber that is processing the current table.

The code:

Note: The transaction blocks exist to ensure proper updates and to avoid thread collisions / race conditions.

Parameter Mapping / Input:

2.8 – Create Task 4: Get Next Extract Table
This task is essentially the same as Task 1, except that it occurs later in the process. The task’s purpose is to fetch the next table to be extracted out to flat file. The SQL code, parameter inputs, and result set configurations are the exact same as Task 1.
 

Step 3: Setup the ThreadCreator Package

The ThreadCreator package has 1 Script Task that is written completely in C#. Before we review the C# code, please add the following parameters to this package.

3.1 – Create Package Parameters

• ISC_ExecutionServerName – this is the server where the packages will run from the Integration Services Catalog

• ISC_PackageFolderName – this is the name of the folder in which your Integration Services Project will be deployed to

• ISC_PackageProjectName – this is the name of the project that your packages belong to in Visual Studio

• IsExecutedFromSSISCatalog – this will tell the controller package to run from the Integration Services Catalog (if true) or to execute from a local directory (if false)

• NumThreads – the number of template packages to be running in parallel

• PackageDirectory – the location of the template package on disk. This parameter will not be used if the execution is from the integration services catalog.

• PackageName – this is the name of the Template Package to be replicated and executed in parallel.

Be sure to add these parameters as Read Only in the SSIS Script Task editor.

3.2 – Add Assembly references
Open up the Script task editor and add the reference assemblies listed in the picture below. You should be able to find most of them in the ‘C:\Windows\assembly\GAC_MSIL’ directory of your local machine. Make sure to choose the proper version. In my case, they were located in the 13.0.0.0 folder of each assembly.

Add C# Code
Within the script task editor, add the following classes in the Solution Explorer. Note: you should not copy over the name of my namespace, SSIS should generate one unique for you.

• ScriptMain.cs

• DWPackage.cs

• LocalPackage.cs

• ICSPackage.cs

 

Step 4: Execute the Package

After all the C# code has been brought into the Script Task of the ‘ThreadCreator.dtsx’ package, it is time to execute! Refer to one or both of the sections below for the type of execution that you wish your solution to perform.

4.1 – Execute from a local directory
Open up the package parameters inside the ‘ThreadCreator.dtsx’ package and make sure the following parameters are set:

• ‘IsExecutedFromSSISCatalog’ is set to False. This will insure that the package will run from a local directory.

• NumThreads is set to the number of concurrent extract processes. This is the number of template packages running at the same time. In my case I chose 3, however you may choose as many threads as you think your system can handle. This may be a trial and error process.

• ‘PackageDirectory’ is set to the path of your ‘ExtractTable.dtsx’ package on your local directory.

• ‘PackageName’ is set to the name of the template package. In this case, it is ‘ExtractTable.dtsx’.

Note: The other ‘ISC_XXXX’ parameters are not used for execution from a local directory. You will use them if you choose to execute from the Integration Services Catalog.

After confirming those values, double check your ThreadQueue configuration table. It should have null values for the last two columns.

Now you can finally hit the magic execute button and watch the amazing power of multithreading!

Notice the updated values of ‘ExtractStatus’ and ‘ThreadInstanceNumber’ column of the ThreadQueue configuration table. The values of the ‘ExtractStatus’ column should indicate ‘Success’ and the value of the ‘ThreadInstanceNumber’ column should indicate the thread instance number responsible for extracting the corresponding database table.

4.2 – Execute from the Integration Services Catalog

The first step in executing an SSIS project on the Integration Services Catalog is to deploy your solution to a SQL Server instance. I chose to deploy my project by right clicking on the project name in Visual Studio and selecting ‘Deploy’.

After deploying your project, make sure to note the Package Folder Name and Package Project Name. You will need to provide these values as execution parameters at runtime.

Before execution, again double check your ThreadQueue configuration table. It should have null values for the last two columns.

To execute from the Integration Services Catalog, right click the ‘ThreadCreator.dtsx’ package under the Integration Services Catalog on your SQL Server instance and select ‘Execute’.

A window should popup with a list of parameters. Make sure the following parameters are all filled out per the picture below:

• ‘IsExecutedFromSSISCatalog’ is set to ‘True’. This will insure that the package will run from the Integration Services Catalog. You must select this value as ‘True’ or the process will not run from the Catalog.

• ‘NumThreads’ is set to the number of concurrent extract processes. This is essentially the number of template packages all running at the same time. In my case I chose 3, however you may choose as many threads as you think your system can handle. This may be a trial and error process.

• ‘ISC_PackageServerName’ is set to the server name of the SQL Server instance running the packages from the Integration Services Catalog.

• ‘ISC_PackageFolderName’ is set to the name of the folder in which you deployed your solution. You can obtain this from the screenshot provided earlier within step 4.2.

• ‘ISC_PackageProjectName’ is set to the name of the project that contains your SSIS packages. You can obtain this from the screenshot provided earlier within step 4.2.

• ‘PackageName’ is set to the name of the template package. In this case, it is ‘ExtractTable.dtsx’.

Note: The ‘PackageDirectory’ parameter is not used for execution from the Integration Services Catalog. It is used only when executing from a local directory.

After selecting the ‘OK’ button, the execution process will start. Open up the ‘All Executions’ report from the Integration Services Catalog and you will notice 4 concurrent executions of packages. 1 instance of the ‘ThreadCreator.dtsx’ package and 3 instances (or number of threads) of the ‘ExtractTable.dtsx’ package.

Each instance of the ‘ExtractTable.dtsx’ is extracting a different table (via bcp queryout) from the ThreadQueue configuration table.

After completion, check the ThreadQueue configuration table and notice which ThreadInstanceNumber processed each table.

And finally, checking our output directory…the files have been created! Success!


 

Thank you for reading. Be sure to subscribe!

Thank you for reading. As always, we encourage you to share any thoughts, questions, or ideas you may have regarding this blog post.

Also, if you haven’t subscribed to our email mailing list yet, be sure to do so for Key2 Consulting content and updates!


 

References

The development approach discussed in this blog follows a similar structure to that found in the MSSQL Tips.com article here. The content within this blog expands on this concept by introducing a new business task (Multithreaded Export), an enhanced Threading Process (ThreadQueue Table), and a different method for package execution (SQL Server Integration Services Catalog).

 
 


Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications. 
Share34
Share16
+1
Tweet
50 Shares