By: Ken Adams, Senior Business Intelligence Consultant at Key2 Consulting

 

Introduction

Disclaimer: Microsoft introduced the project deployment model for Integration Services (SSIS) more than six years ago with SQL Server 2012. At this point, you likely know a lot about project deployment, and this post assumes that you do.

Here at Key2, we have a large client for whom we manage ETL operations. Millions of rows of data are ingested, wrangled, and written to numerous destinations every day to support thousands of users. Our client’s ETL system is mature and very stable. It also happens to use the legacy SSIS package deployment model. Over the last couple of years, we’ve had many discussions about upgrading the packages – which number over 700 – to project deployment. We eventually decided to make the jump for numerous reasons, including:

  • The possibility that Microsoft could pull support for package deployment
  • Additional flexibility for execution of packages
  • New features being released for project and not package deployment

Microsoft provides a wizard for upgrading projects from package deployment to project deployment. The wizard actually does a nice job of detecting configurations and migrating them to package- and project-level parameters. The primary issue we encountered was converting the package connection managers to project connection managers. This can be clunky, but I’ll illustrate the steps for you. I’ll also share how we used PowerShell and SQL Server XQuery to cut the time to complete the upgrade from weeks to hours.
 

Requirements

For this post, I’m using the Wide World Importers data available on GitHub. Specifically, I’m using the full backups for the OLTP and OLAP databases, along with their ETL project.

If you need any help installing and using these data sets, please check out the following article.
 

The Upgrade Process

To start, I took the SSIS solution and performed the following steps:

  1. Split the single package into a package per table being loaded
  2. Added a master package for calling the individual packages
  3. Downgraded the solution from project deployment to package deployment

Here’s what I ended up with:


 
Next, let’s upgrade the solution to project deployment. Execute the “Convert to Project Deployment” wizard and follow all the steps. This proved to be straightforward, and I accepted the defaults on each screen of the wizard. As I noted above, a key part of the conversion was creating project connection managers in place of the package connection managers. Ideally, this would be done via the right-click menu.


 
To some extent, this works. However, it’s pretty painful. The right-click menu option updates the connection managers and promotes them to project connection managers, but only for the package that is currently open.


 
Notice in the screenshot below that when you open another package, its connection managers remain in their original state.


 
See how the project connection managers do not show up? This is maddening. The workaround, albeit clunky, is to rename the connection managers.


 
Adding “1” as a suffix – shown above – works like magic, and the project-level connection managers now appear. The trouble with this route is that the various tasks are still tied to the now-renamed package connections.


 
Therefore, this method would require opening each task with a connection, pointing that task to the project connection, and then removing the package connection now suffixed with “1”. For a handful of packages, this would be fine. As you can imagine, for 700+ packages this was a no-go. PowerShell and XQuery to the rescue!

First, let’s reset our project to its original state. Look familiar?


 
I set about the conversions again:

  1. Step through the Convert to Project Deployment wizard
  2. Open a package and add 2 new connection managers – WideWorldImportersDW.OLEDB and WideWorldImporters.OLEDB
  3. Promote the 2 new connection managers to project connection managers


 
Now our project looks like this:


 
You’ll note that we have 2 project-level connection managers. It’s important to point out that they are not connected to anything, while the individual packages are still tied to the package connections, as we saw in the previous example. The solution was to use a combination of PowerShell and XQuery to map these new project connection managers to the individual tasks. XQuery’s job was to query the XML from each package and ferret out the individual connection manager GUIDs. As you can see below, each package connection has a different GUID value.

City Dimension Package
 

 
Customer Dimension Package
 

 
However, the project-level connections will have the same GUID value across packages. So, if we can replace the package connection GUIDs with the new project connection GUIDs, the tasks within those packages should be updated. The replacement of the GUID values has to be done inside the XML. Theoretically, you could open each DTSX file in a text editor and perform the find-and-replace operation. But again, while that would be acceptable for a handful of packages, it wouldn’t work on a large scale. This is where PowerShell came in.

First, find the GUID value of the new project connection manager. In particular, we are looking for the WideWorldImportersDW.OLEDB connection, which is the destination connection used in the various Execute SQL tasks. This value will be used for @NewDTSID in the query to follow.
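One shortcut, if you don’t want to hunt through the designer: the project connection manager is saved as a small XML file in the project folder (WideWorldImportersDW.OLEDB.conmgr in this example), and its DTS:DTSID attribute holds the GUID we’re after. Here is a minimal PowerShell sketch, assuming the project lives in the WWIDailyETL folder:

# Read the project connection manager file and print its DTSID (the project connection GUID)
[xml]$conmgr = Get-Content -Raw ".\WWIDailyETL\WideWorldImportersDW.OLEDB.conmgr"
$conmgr.DocumentElement.GetAttribute("DTS:DTSID")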


 
Now open up a new query window on the local SQL Server instance and run the query below. You’ll need to modify anything commented with TOUCH POINT to fit your environment. Also, make sure that “xp_cmdshell” is enabled prior to executing this logic.

The resulting dataset from this query contains 2 columns: the 1st is the name of the package and the 2nd holds the “find and replace” values required by the PowerShell script. It’ll look something like this.
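If you would rather stay in PowerShell than use XQuery and xp_cmdshell, a rough equivalent that produces the same two columns might look like the sketch below. The folder path and the old package connection name are assumptions for this example, and the new GUID is a placeholder for the value found in the previous step:

# Placeholder values - swap in your project connection GUID and your existing package connection name
$newDtsid    = '{00000000-0000-0000-0000-000000000000}'
$oldConnName = 'WWI_DW_Destination'

Get-ChildItem -Path ".\WWIDailyETL\*.dtsx" | ForEach-Object {
    [xml]$pkg = Get-Content -Raw -Path $_.FullName

    # Package-level connection managers live under DTS:ConnectionManagers in the package XML
    $ns = New-Object System.Xml.XmlNamespaceManager($pkg.NameTable)
    $ns.AddNamespace("DTS", "www.microsoft.com/SqlServer/Dts")
    $cm = $pkg.SelectSingleNode("//DTS:ConnectionManagers/DTS:ConnectionManager[@DTS:ObjectName='$oldConnName']", $ns)

    if ($cm) {
        # One row per package: the package name plus the OldValue|NewValue pair for the look-up file
        [pscustomobject]@{
            PackageName  = $_.Name
            ConnectionID = '{0}|{1}' -f $cm.GetAttribute('DTS:DTSID'), $newDtsid
        }
    }
}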

Next, we’ll execute a PowerShell script that uses a look-up list to perform the “find and replace” in the package XML.
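The full Replace-InFiles-WithLookup.ps1 is available on our GitHub page, so I won’t reproduce it line for line. A simplified sketch of such a script is shown below; the LookUp.csv column names (OldValue and NewValue) are assumptions here and should match the header in your actual file:

param(
    [string]$List,   # path to the pipe-delimited look-up file (LookUp.csv)
    [string]$Files   # wildcard path to the .dtsx packages to update
)

# Load the OldValue|NewValue pairs from the look-up file
$lookup = Import-Csv -Path $List -Delimiter '|'

# Walk each package and apply every replacement
Get-ChildItem -Path $Files | ForEach-Object {
    $content = Get-Content -Raw -Path $_.FullName
    foreach ($pair in $lookup) {
        # Plain (non-regex) replace so GUIDs and braces are treated literally
        $content = $content.Replace($pair.OldValue, $pair.NewValue)
    }
    Set-Content -Path $_.FullName -Value $content -Encoding UTF8
}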

This script expects a file named LookUp.csv containing a pipe-delimited list: the 1st position is the old value and the 2nd position is the replacement value. The parts of the file are shown below.

• Header

• Named connection replacements – these represent the data flow task connections. Replace the OldValue with the package connection value and the NewValue with the project connection value (GUID or name).

• Individual package GUID replacements – these represent the Execute SQL task connections and can be copied and pasted from the “ConnectionID” column returned by the prior XQuery statement.

You can find the full file here on our GitHub page.
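To make the structure concrete, a few illustrative rows might look like this (the header names, the old connection name, and the GUIDs are made up for this example):

OldValue|NewValue
WWI_DW_Destination|WideWorldImportersDW.OLEDB
{1A2B3C4D-1111-2222-3333-444455556666}|{9F8E7D6C-AAAA-BBBB-CCCC-DDDDEEEEFFFF}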
 
Now that the look-up file is ready, the PowerShell can be executed. Open the PowerShell ISE and run the following commands. (Note: These should reflect your local folders and must reference the location of the LookUp.csv file and the folder containing the SSIS packages.)

PowerShell Script:
cd "C:\Users\Key2\Documents\Blog\SSIS Project Deployment Conversion"

.\Replace-InFiles-WithLookup.ps1 -List ".\LookUp.csv" -Files ".\WWIDailyETL\*.dtsx"

(The SSIS packages referenced by -Files are stored here at GitHub.)

PowerShell is very quick at performing file operations. For a handful of packages, this will take no more than a second or two. For a large number of packages, say 450, it might take up to an hour. That might seem like a long time, but consider this: the script is interrogating thousands of lines of XML, looking for the specific values found in the look-up file, and overwriting each old value it finds with the corresponding new value.

In the end, if your look-up file is well constructed, then you can execute the script with confidence. Go grab some lunch and return to find that your hundreds of packages are now pointed to the new project connection managers.

See for yourself.

Execute SQL Task

 
Data Flow Task

 
These are 2 example packages. Notice that the tasks within those packages are now pointing to project-level connection managers. Doing this work by hand would have taken 2-3 people a week or more to complete. And even then, you open your client up to human error in the form of fat-fingering or just plain oversight on the developers’ part. This solution, while not fully automated, makes good use of readily available technologies (SQL Server and PowerShell) and scales quite well. I can attest that we converted 700+ packages without error and in a very reasonable period of time. We’re talking about hours instead of weeks.

This look-up list concept can also be extended to handle other conversions. In our case, each package contained query statements from our SSIS framework that we no longer wanted to execute. The following addition to the LookUp.csv file handled this scenario.
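Conceptually, the row looks like any other replacement pair: the old value is the statement itself, and the new value is the same statement wrapped in comment markers (the procedure name below is made up purely for illustration):

EXEC dbo.FrameworkAudit ?, ?|/* EXEC dbo.FrameworkAudit ?, ? */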


 
We simply commented the statement out in the Execute SQL tasks. The alternative was to have the team open each package and either remove or comment out the statement by hand. Again, you’re opening your client up to human error. And you might just drive your development team insane.
 

Conclusion

As I stated, the SSIS project deployment conversion wizard does a nice job of finding the existing configurations in your package deployment solutions and promoting them to package or project parameters. The real shortcoming, in my opinion, lies in elevating package connection managers to project connection managers and then propagating those project connections out to all of the packages. This is especially evident when you have large SSIS projects.
 
Have you hit something like this in your organization? If so, what has been your experience and how have you handled it? Please let us know if you want to discuss this. We’d be happy to assist your company in your SSIS conversion endeavors.
 

You can access all of this blog post’s code via our GitHub page.

 

Thanks for reading! Keep your knowledge sharp by subscribing to our mailing list to receive fresh Key2 content around data analytics, business intelligence, data warehousing, and more!

 


 
 

Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.