By: Ken Adams
Introduction
Disclaimer: Microsoft introduced the project deployment model for Integration Services (SSIS) over 6 years ago with SQL Server 2012. At this point in time, it is likely that you know a lot about project deployment. This post will assume that you do.
Here at Key2, we have a large client whom we manage the ETL operations for. Millions of rows of data are ingested, wrangled, and written to numerous destinations (on a daily basis) to support thousands of users. Our client’s ETL system is mature and very stable. Their system also happens to use the legacy model of SSIS package deployment. Over the last couple of years, we’ve had many discussions about upgrading the packages – which number over 700 – to project deployment. We eventually decided to make the jump for numerous reasons, including:
- The possibility that Microsoft could pull support for package deployment
- Additional flexibility for execution of packages
- New features being released for project and not package deployment
Microsoft has a wizard for upgrading projects from package deployment to project deployment. The wizard actually does a nice job of detecting configurations and migrating those to package and project level parameters. The primary issue we encountered was the desire to convert the package connection managers to project connection managers. This can be clunky, but I’ll illustrate the steps for you. I’ll also share how we used Powershell and SQL Server XQuery to cut the time to complete the upgrade from weeks to hours.
Requirements
For this post, I’m using the World Wide Importers data available on GitHub. Specifically, I’m using the full backup for the OLTP and OLAP databases, along with their ETL project.
If you need any help installing and using these data sets, please check out the following article.
The Upgrade Process
To start, I took the SSIS solution and performed the following steps:
- Split the single package into a package per table being loaded
- Added a master package for calling the individual packages
- Downgraded the solution from project deployment to package deployment
Here’s what I ended up with:
Next let’s upgrade the solution to project deployment. Execute the “Convert to Project Deployment” wizard and follow all the steps. This proved to be straightforward and the defaults were accepted for each screen in the wizard. As I noted above, a key component of the conversion was creating project connection managers rather than the package connection managers. Ideally, this would be done via the right-click menu.
To some extent, this works. However, it’s pretty painful. The right-click menu option updates the connection managers and promotes them to project connection managers, but it only does so for the opened package.
*Notice in the below screenshot how when you open another package, the other connection managers remained in the same state.
See how the project connection managers do not show up! This is maddening. The workaround, albeit clunky, is to rename the connection managers.
Adding “1” as a suffix – shown above – is like magic, and now the project level connection managers appear. The trouble with this route is that the various tasks are still tied to the now renamed package connections.
Therefore, this method would require opening up each task with a connection, pointing that task to the project connection, and then removing the package connection now suffixed with the “1”. For a handful of packages, this would be fine. As you can imagine for 700+ packages, this was a no-go. PowerShell and XQuery to the rescue!
First, let’s reset our project to its original state. Look familiar?
I set about the conversions again:
- Step through the Convert to Project Deployment wizard
- Open a package and add 2 new connection managers – WideWorldImportersDW.OLEDB and WideWorldImporters.OLEDB
- Promote the 2 new connection managers to project connection managers
Now our project looks like this:
You’ll note that we have 2 project level connection managers. It’s important to point out that they are not connected to anything, while the individual packages are still connected to the package connections as we saw in the previous example. The solution was using a combination of PowerShell and XQuery to map these new project connection managers to the individual tasks. XQuery’s job was to query the XML from each package and ferret out the individual connection manager GUIDs. As you see below, each package connection has a different GUID value.
City Dimension Package
Customer Dimension Package
However, the project level connections will have the same GUID value across packages. So, if we can replace the package connection GUIDs with the new project connection GUIDs, the tasks within those packages should be updated. The replacement of the GUID values has to be done inside of the XML. Theoretically, you could open up each DTSX file in a text editor to perform the find and replace operation. But again, while that would be acceptable for a handful of packages, it wouldn’t work on a large scale. This is where PowerShell came in.
First, find the GUID value of the new project connection manager. In particular, we are looking for the WideWorldImportersDW.OLEDB connection, which is the destination connection used in the various ExecuteSQL tasks. This value will be used for the @NewDTSID value in the query to follow.
Now open up a new query window on the local SQL Server instance and run the query below. You’ll need to modify anything commented with TOUCH POINT to fit your environment. Also, make sure that “xp_cmdshell” is enabled prior to executing this logic.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 |
--TOUCH POINT - The variable @NewDTSID is the new project connection GUID DECLARE @Path VARCHAR(2000) ,@NewDTSID nvarchar(38) = '{42F53DC6-3078-48C6-9B38-8E2E88CA06E3}'; --TOUCH POINT - This is the local path where the packages reside SET @Path = 'C:\Users\Key2\Documents\Blog\SSIS Project Deployment Conversion\WWIDailyETL\*.dtsx'; DECLARE @MyFiles TABLE ( MyID INT IDENTITY(1, 1) PRIMARY KEY ,FullPath VARCHAR(2000) ); DECLARE @CommandLine VARCHAR(4000); SELECT @CommandLine = LEFT('dir "' + @Path + '" /A-D /B /S ', 4000); INSERT INTO @MyFiles (FullPath) EXECUTE xp_cmdshell @CommandLine; DELETE FROM @MyFiles WHERE FullPath IS NULL OR FullPath = 'File Not Found' OR FullPath = 'The system cannot find the path specified.' OR FullPath = 'The system cannot find the file specified.'; --A physical table is created to store the package XML - this worked better on the large scale project DROP TABLE IF EXISTS dbo.PackageXML; CREATE TABLE dbo.PackageXML ( PackagePath VARCHAR(900) NOT NULL PRIMARY KEY ,PackageXML XML NOT NULL ); DECLARE @FullPath VARCHAR(2000); --Declare a cursor and write our each packages XML to the PackageXML table DECLARE file_cursor CURSOR FOR SELECT FullPath FROM @MyFiles; OPEN file_cursor FETCH NEXT FROM file_cursor INTO @FullPath; WHILE @@FETCH_STATUS = 0 BEGIN DECLARE @sql NVARCHAR(max); SET @sql = ' INSERT PackageXML (PackagePath,PackageXML) select ''@FullPath'' as PackagePath , cast(BulkColumn as varchar(max)) as PackageXML from openrowset(bulk ''@FullPath'', single_blob) as pkgColumn'; SELECT @sql = REPLACE(@sql, '@FullPath', @FullPath); EXEC sp_executesql @sql; FETCH NEXT FROM file_cursor INTO @FullPath; END CLOSE file_cursor; DEALLOCATE file_cursor; --Navigate the nodes to find the correct package connection GUIDs ;WITH XMLNAMESPACES ('www.microsoft.com/SqlServer/Dts'AS DTS) SELECT SUBSTRING(t.PackagePath,LEN(t.PackagePath) - CHARINDEX('\',REVERSE(t.PackagePath),0)+2,LEN(t.PackagePath)) AS PackageName, CONCAT('SQLTask:Connection="', CM1.cm1.value('@DTS:DTSID', 'nvarchar(38)'), '"', '|SQLTask:Connection="' + @NewDTSID + '"') AS ConnectionID FROM PackageXML t CROSS APPLY t.PackageXML.nodes('/DTS:Executable/DTS:Property') AS P(p) CROSS APPLY PackageXML.nodes('/DTS:Executable/DTS:ConnectionManagers/DTS:ConnectionManager') CM1(cm1) CROSS APPLY CM1.cm1.nodes('DTS:ObjectData/DTS:ConnectionManager') AS CM2(cm2) WHERE CM1.cm1.value('@DTS:refId', 'nvarchar(50)') = 'Package.ConnectionManagers[WWI_DW_Destination_DB]' --TOUCH POINT - filter on desired connection manager ORDER BY 1 |
The resulting dataset from this query contains 2 columns. The 1st column is the name of the package and the 2nd is the “Find and Replace” values required by the PowerShell. It’ll look something like this.
Next, we’ll execute a PowerShell script that utilizes a lookup list to perform the “find and replace” in the package XML. The PowerShell script is below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
Param ( [String]$List = "LookUp.csv", [String]$Files = ".\Files\*.*" ) $ReplacementList = Import-Csv $List -Delimiter '|'; Get-ChildItem $Files -Recurse | ForEach-Object { $Content = Get-Content -Path $_.FullName; foreach ($ReplacementItem in $ReplacementList) { $Content = $Content.Replace($ReplacementItem.OldValue, $ReplacementItem.NewValue) } Set-Content -Path $_.FullName -Value $Content } |
This script is expecting a file named LookUp.csv that contains a pipe-delimited list. The 1st position is the old value and the 2nd position is the replacement value. The parts of the file are shown below.
• Header
• Named connection replacements – these represent the data flow tasks connections. Replace the OldValue with the package connection value and the NewValue with the project connection value (GUID or name).
• Individual package GUID replacements which represent the Execute SQL tasks – these can be copied and pasted from the “ConnectionID” column resulting from the prior XQuery statement.
You can find the full file here on our GitHub page.
Now that the look-up file is ready, the PowerShell can be executed. Open up the PowerShell ISE and execute the following commands. (Note: These should reflect your local folders and must reference the location of the LookUp.csv file and folder containing the SSIS packages.)
PowerShell Script:
cd “C:\Users\Key2\Documents\Blog\SSIS Project Deployment Conversion”
.\Replace-InFiles-WithLookup.ps1 -List “.\LookupFile.csv” -Files “.\WWIDailyETL\*.dtsx” — Files stored here at GitHub.
PowerShell is very quick at performing file operations. For a handful of packages, this will take no more than a second or two. For a large number of packages, say 450, it might take up to an hour. This might seem like a long time, but consider this. This script is interrogating thousands of lines of XML and looking for the specific values found in the look-up file. If it finds the “OldValue”, it then overwrites it with the “ReplacementValue”.
In the end, if your look-up file is well constructed, then you can execute the script with confidence. Go grab some lunch and return to find that your hundreds of packages are now pointed to the new project connection managers.
See for yourself.
Execute SQL Task
Data Flow Task
These are 2 example packages. Notice the tasks within those packages are now pointing to project level connection managers. Doing this work by hand would have taken 2-3 people a week or more to complete. And even then, you open up your client to human error in the form of fat fingering or just plain oversight on the developers’ part. This solution, while not fully automated, makes good use of readily available technologies (SQL Server and PowerShell) and scales quite well. I can attest that we converted 700+ packages without error and in a very reasonable period. We’re talking about hours instead of weeks.
This look-up list concept can also be extended to handle other conversions. In our case, we had query statements that were part of our SSIS framework and were contained inside of each package that we didn’t want to execute anymore. The following addition to the LookUp.csv file handled this scenario.
We simply commented it out in the Execute SQL tasks. The alternative was to have the team open up each package and either remove or comment out this statement. Again, you’re opening your client up to human error. And you might just drive your development team insane.
Conclusion
As I stated, the SSIS project deployment conversion wizard does a nice job of determining existing configurations in your package deployment solutions and promoting those to package or project parameters. The real shortcoming, in my opinion, is the elevation of package connection managers to project connection managers, and then propagating those project connections out to all of the packages. This is especially evident when you have large SSIS projects.
Have you hit something like this in your organization? If so, what has been your experience and how have you handled it? Please let us know if you want to discuss this. We’d be happy to assist your company in your SSIS conversion endeavors.
You can access all of this blog post’s code via our GitHub Page
You can find all of this blog post’s code at our GitHub folder here.
Keep your data analytics sharp by subscribing to our mailing list!
Thanks for reading! Keep your knowledge sharp by subscribing to our mailing list to receive fresh Key2 content around data analytics, business intelligence, data warehousing, and more!
Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.
Thank you so much for posting this. We have about 100 ssis packages which we are converting to project deployment model and were stuck when we tried to convert the package connections to project connections. I tried the code you have given above on a small subset and it worked beautifully. Thank you very much for your help.
Hi Kamal,
You are very welcome. We’re glad our code worked well for you and was just the solution you needed!
– Key2 Consulting Team
Thank you for sharing your experience and learnings. This helped me quickly navigate a similar difficulty I was having. Very much appreciated!
David,
Thanks for the kind words. It’s really great to be able to help folks and I’m glad you found the post. Thanks again.
-Ken