By: Phil Woubshet
This exercise will demonstrate how to copy multiple tables from an on-premise SQL Server database to Azure SQL Data Warehouse. We will use a metadata table that contains the schema/table names to drive the copy process from source to sink.
There are numerous reasons to utilize cloud computing to supplement or replace a traditional on-premise system. Our current use cases are:
- Copying numerous tables between environments (Prod to Test, Prod to Sandbox, Test to Dev, etc.)
- Migrating from an on-premise environment to the cloud to take advantage of flexible resources
- Sharing data with different groups that do not have access to an on-premise resource
Prerequisites
- On-premise SQL Server
- Optional: Azure blob storage account that will be used for staging data during the copy in order to utilize PolyBase
- Self-hosted Integration Runtime within Data Factory (see the Microsoft documentation to create one if you don't already have one)
- Azure Synapse Analytics (see the Microsoft documentation to create one if you don't already have one)
Create and Populate On-Premise SQL Server Metadata Table
Create a metadata table in your on-premise database that will drive the copy operation. For the purposes of this demo, we are using the tables [Person].[Address] and [Person].[AddressType] that came from an [AdventureWorks] database.
DROP TABLE IF EXISTS DataFactoryMeta

CREATE TABLE DataFactoryMeta
(
    TableSchema VARCHAR(255)
    ,TableName VARCHAR(255)
    ,Active BIT
)

INSERT INTO DataFactoryMeta
VALUES ('[Person]','[Address]',1)
      ,('[Person]','[AddressType]',1)
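Because the copy is driven entirely by this metadata table, you can control which tables participate in a run by editing rows rather than editing the pipeline. For example (the [Person].[StateProvince] table is just an illustrative AdventureWorks table, not part of this demo):

-- Skip a table on the next run
UPDATE DataFactoryMeta SET Active = 0 WHERE TableName = '[AddressType]';

-- Register an additional table to copy
INSERT INTO DataFactoryMeta VALUES ('[Person]','[StateProvince]',1);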
Data Factory Configuration
Create a Data Factory
- Launch Microsoft Edge or Google Chrome web browser. Currently, Data Factory UI is supported only in Microsoft Edge and Google Chrome web browsers.
- Go to the Azure portal.
- From the Azure portal menu, select Create a resource.
- Select Integration, and then select Data Factory.
- On the New data factory page, enter a name for your data factory. The name of the Azure data factory must be globally unique; if you see an error that the name is not available, change the name (for example, by appending your initials) and try again.
- For Subscription, select your Azure subscription in which you want to create the data factory.
- For Resource Group, use one of the following steps:
a. Select Use existing, and select an existing resource group from the list.
b. Select Create new, and enter the name of a resource group.
- For Version, select V2.
- For Location, select the location for the data factory.
- Select Create.
- After the creation is complete, you see the Data Factory page. Select the Author & Monitor tile to start the Azure Data Factory user interface (UI) application on a separate tab.
- On the Let’s get started page, switch to the Author tab in the left panel.
Create Linked Service to On-Premise SQL Server Source
- Select Manage, and then select the New button on the toolbar.
- On the New linked service page, select SQL Server, and then select Continue.
- Fill out the required fields and be sure to choose your self-hosted integration runtime under Connect via integration runtime. Test Connection and select Create.
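For reference, the required fields typically look something like the following; the server, database, and credential values shown are placeholders for your environment:

Name:                            OnPremSqlServerLinkedService
Connect via integration runtime: your self-hosted integration runtime
Server name:                     MYSERVER\SQL2019
Database name:                   AdventureWorks
Authentication type:             SQL Authentication (or Windows Authentication)
User name / Password:            a login with read access to the source tables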
Create Linked Service to Azure Synapse Analytics (Formerly SQL DW Sink)
- Select Manage, and then select the New button on the toolbar.
- On the New linked service page, select Azure Synapse Analytics, and then select Continue.
- Fill out the required fields, Test Connection and select Create.
Create Source Dataset
This dataset will connect to the source metadata table that contains the table names to copy.
- Select the + (plus) button, and then select Dataset.
- On the New dataset page, select SQL Server, and then select Continue.
- Under Linked Service, choose the on-prem linked service previously created, select the metadata table under Table name, and then select OK.
Create Sink Dataset
This dataset will connect to the sink Azure Synapse Analytics where the tables will be copied to.
- Select the + (plus) button, and then select Dataset.
- On the New Dataset page, select Azure Synapse Analytics, and then select Continue.
- Under Linked Service, choose the Azure Synapse Analytics linked service previously created, leave Table name unset for now (it will be parameterized in the next steps), and then select OK.
- Switch to the Parameters tab, click + New, and enter DWTableName for the parameter name.
- Switch to the Connection tab.
a. For Table, check the Edit option, click into the table name input box, then click the Add dynamic content link below.
b. In the Add Dynamic Content page, click DWTableName under Parameters, which will automatically populate the top expression text box with @dataset().DWTableName, then click Finish. The tableName property of the dataset is set to the value that's passed as an argument for the DWTableName parameter. The ForEach activity iterates through a list of tables and passes them one by one to the Copy activity.
c. Update the Connection to use the newly created parameter.
Create Pipeline to Iterate and Copy Tables
Create the IterateAndCopySQLTables pipeline, which will take a list of tables as a parameter. For each table in the list, it copies data from the on-prem table in SQL Server to Azure SQL Data Warehouse using staged copy and PolyBase.
- In the left pane, click + (plus), and click Pipeline.
- In the General tab, specify IterateAndCopySQLTables for name.
- Switch to the Parameters tab, and do the following actions:
a. Click + New.
b. Enter tableList for the parameter name.
c. Select Array for Type.
- In the Activities toolbox, expand Iteration & Conditionals, and drag-drop the ForEach activity onto the pipeline design surface. (You can also search for activities in the Activities toolbox.)
a. In the General tab at the bottom, enter IterateSQLTables for Name.
b. Switch to the Settings tab, click the input box for Items, then click the Add dynamic content link below.
c. In the Add Dynamic Content page, collapse the System Variables and Functions sections, click tableList under Parameters, which will automatically populate the top expression text box with @pipeline().parameters.tableList, then click Finish.
d. Switch to the Activities tab, and click Add activity to add a child activity to the ForEach activity.
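For context, the tableList array the ForEach iterates over is simply the value array returned by the Lookup activity configured later in this post, so each item() exposes the two columns returned by the metadata query. With the two demo rows it would look roughly like this (a sketch of the shape only, not the exact runtime payload):

[
  { "TABLE_SCHEMA": "[Person]", "TABLE_NAME": "[Address]" },
  { "TABLE_SCHEMA": "[Person]", "TABLE_NAME": "[AddressType]" }
]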
- In the Activities toolbox, expand Move & Transform, and drag-drop the Copy Data activity onto the pipeline designer surface. Notice the breadcrumb menu at the top: IterateAndCopySQLTables is the pipeline name, IterateSQLTables is the ForEach activity name, and the designer is now in the activity scope. To switch back to the pipeline editor from the ForEach editor, you can click the link in the breadcrumb menu.
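The original steps do not walk through the Copy Data activity's Source and Sink tabs, so here is a minimal sketch of one way to configure them, following the bulk-copy pattern in the Microsoft tutorial listed in the references. The expressions reuse the TABLE_SCHEMA and TABLE_NAME columns returned by the metadata query (the stored values already include the square brackets), and the staging option assumes the optional blob storage account from the prerequisites; exact field labels may vary by Data Factory version.

Source tab
  Source dataset: the on-prem SQL Server dataset created earlier
  Use query:      Query
  Query:          SELECT * FROM @{item().TABLE_SCHEMA}.@{item().TABLE_NAME}

Sink tab
  Sink dataset:   the Azure Synapse Analytics dataset created earlier
  DWTableName:    @{item().TABLE_SCHEMA}.@{item().TABLE_NAME}

Settings tab
  Enable staging: checked, pointing at a linked service for the blob storage account
                  (staging is what allows the load to use PolyBase)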
Create Pipeline to Get Table List and Trigger Copy Data Pipeline
Create the GetTableListAndTriggerCopyData pipeline, which will perform two actions:
- Look up the on-prem SQL Server metadata table to get the list of tables to be copied.
- Trigger the pipeline IterateAndCopySQLTables to do the actual data copy.
- In the left pane, click + (plus), and click Pipeline.
- In the General tab, change the name of the pipeline to GetTableListAndTriggerCopyData.
- In the Activities toolbox, expand General, drag-drop the Lookup activity to the pipeline designer surface, and do the following steps:
a. Enter LookupTableList for Name.
b. Enter Retrieve the table list from the on-prem SQL Server database for Description.
- Switch to the Settings tab, and do the following steps:
a. Select the on-prem SQL Server source dataset created earlier for Source Dataset.
b. Select Query for Use Query.
c. Enter the following SQL query for Query:

SELECT TableSchema AS TABLE_SCHEMA, TableName AS TABLE_NAME FROM DataFactoryMeta WHERE Active = 1

d. Clear the checkbox for the First row only field.
- Drag-drop Execute Pipeline activity from the Activities toolbox to the pipeline designer surface, and set the name to TriggerCopy.
- Switch to the Settings tab, and do the following steps:
a. Select IterateAndCopySQLTables for Invoked pipeline.
b. Expand the Advanced section.
c. Click + New in the Parameters section.
d. Enter tableList for the parameter name.
e. Click the VALUE input box, select the Add dynamic content link below, enter @activity('LookupTableList').output.value as the value, then select Finish. (Be sure to use straight quotes around LookupTableList; curly quotes will cause an "Unrecognized expression" validation error.) You're setting the result list from the Lookup activity as an input to the second pipeline. The result list contains the list of tables whose data needs to be copied to the destination.
- To validate the pipeline, click Validate on the toolbar. Confirm that there are no validation errors. To close the Pipeline Validation Report, click >>.
- To publish entities (datasets, pipelines, etc.) to the Data Factory service, click Publish All on top of the window. Wait until the publishing succeeds.
Trigger Pipeline
Now that the pipelines are configured, it's time to run them.
- Select GetTableListAndTriggerCopyData from the available Pipelines.
- Select Add Trigger, then select Trigger Now and select Finish.
- To monitor progress, select the red Monitor icon on the leftmost pane.
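After the run succeeds, a quick sanity check (assuming the destination tables keep the same schema and table names as the source) is to query the copied tables directly in Azure Synapse Analytics:

-- Run against the Azure Synapse Analytics (formerly SQL DW) database
SELECT COUNT(*) AS AddressRows FROM [Person].[Address];
SELECT COUNT(*) AS AddressTypeRows FROM [Person].[AddressType];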
Azure Synapse Analytics (Formerly SQL DW) Unsupported Data Types
The following data types are unsupported in Azure Synapse Analytics:
geography, geometry, hierarchyid, image, text, ntext, sql_variant, xml
These can be identified in your source database via the following query:
SELECT t.[name] AS TableName
      ,c.[name] AS ColumnName
      ,c.[system_type_id]
      ,c.[user_type_id]
      ,y.[is_user_defined]
      ,y.[name] AS DataType
FROM sys.tables t
JOIN sys.columns c ON t.[object_id] = c.[object_id]
JOIN sys.types y ON c.[user_type_id] = y.[user_type_id]
WHERE y.[name] IN ('geography','geometry','hierarchyid','image','text','ntext','sql_variant','xml')
   OR y.[is_user_defined] = 1; -- user-defined types are also unsupported; the built-in types above have is_user_defined = 0
Typical workarounds, per the Azure Synapse Analytics data types documentation linked in the references below, are to store geography, geometry, and image data as varbinary; text and ntext as varchar/nvarchar; hierarchyid as nvarchar(4000); xml as character data; and to split sql_variant columns into separate strongly typed columns.
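As a concrete example, the [Person].[Address] table used in this demo contains a geography column (SpatialLocation in the standard AdventureWorks schema); one option is to cast it to varbinary in the Copy activity's source query instead of using SELECT *:

SELECT AddressID
      ,AddressLine1
      ,City
      ,PostalCode
      ,CAST(SpatialLocation AS varbinary(max)) AS SpatialLocation
FROM [Person].[Address];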
References
https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-data-types
Questions?
Thanks for reading. We hope you found this blog post to be useful. Do let us know if you have any questions or topic ideas related to BI, analytics, the cloud, machine learning, SQL Server, (Star Wars), or anything else of the like that you’d like us to write about. Simply leave us a comment below, and we’ll see what we can do!
Keep Your Business Intelligence Knowledge Sharp by Subscribing to our Email List
Get fresh Key2 content around Business Intelligence, Data Warehousing, Analytics, and more delivered right to your inbox!
Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.
Comments
Richard:
I had high hopes for this being a solution for my use case. I have faithfully attempted to create the entities in the above tutorial. There are a number of places where the text appears to be either out of order, or missing key elements.
To summarize:
Create Source Dataset
You create a dataset to read in the tableList, but then at 4.a.-b. you talk about a sink dataset, and then immediately thereafter appears "Create Sink Dataset". It seems as if 4. should have been deleted…
Create Sink Dataset
It appears you have the wrong image under #3.
Create Pipeline to Iterate and Copy Tables
No steps listed after #5, skipping entirely the configuration of Source, Sink, Mapping, or Settings options.
Create Pipeline to Get Table List and Trigger Copy Data pipeline
Step 6.e. does not validate — “Invalid Position 11 Unrecognized expression: ‘LookupTableList’” error message.
Step 8. — fails to validate per above.
I respectfully offer that you should either correct and update this page, or take it down entirely. In its current form, I consider it a hazard, as I have spent significant time trying to understand and follow it before realizing it is not actionable.
Hi Richard,
Thank you very much for your comment and feedback. We apologize that this blog post had errors, and are sorry for any inconvenience this may have caused you. We reviewed your detailed feedback and have made several changes and updates to this post.
Sincerely,
The Key2 Consulting Team
The code works; however, the quotation marks have to be redone.