By: Phil Woubshet
 
 
This exercise will demonstrate how to copy multiple tables from an on-premise SQL Server database to Azure SQL Data Warehouse (now Azure Synapse Analytics). We will use a metadata table that contains the schema and table names to drive the copy process from source to sink.

There are numerous reasons to utilize cloud computing to supplement or replace a traditional on-premise system. Our current use cases are:

  1. Copying numerous tables between environments (Prod to Test, Prod to Sandbox, Test to Dev, etc.)
  2. Migrating from an on-premise environment to the cloud to take advantage of flexible resources
  3. Sharing data with different groups that do not have access to an on-premise resource

 

Prerequisites

  1. On-premise SQL Server
  2. Optional: Azure Blob storage account that will be used for staging data during the copy in order to utilize PolyBase
  3. Self-hosted Integration Runtime within Data Factory (instructions to create it if you don’t already have one)
  4. Azure Synapse Analytics (instructions to create it if you don’t already have one)

 

Create and Populate On-Premise SQL Server Metadata Table

Create a metadata table in your on-premise database that will drive the copy operation. For the purposes of this demo, we are using the tables [Person].[Address] and [Person].[AddressType] from the [AdventureWorks] sample database.
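
As a minimal sketch, the metadata table might look like the following. The dbo.TableList name and the SchemaName/TableName columns are placeholders chosen for this demo; adjust them to whatever fits your environment. The later pipeline expressions in this post assume these column names.

-- Metadata table that drives the copy (names are illustrative)
CREATE TABLE dbo.TableList
(
    SchemaName SYSNAME NOT NULL,
    TableName  SYSNAME NOT NULL
);

-- Register the two AdventureWorks tables used in this demo
INSERT INTO dbo.TableList (SchemaName, TableName)
VALUES ('Person', 'Address'),
       ('Person', 'AddressType');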

 

Data Factory Configuration

Create a Data Factory

 

  1. Launch Microsoft Edge or Google Chrome. Currently, the Data Factory UI is supported only in the Microsoft Edge and Google Chrome web browsers.
  2. Go to the Azure portal.
  3. From the Azure portal menu, select Create a resource.
  4. Select Integration, and then select Data Factory.
  5. On the New data factory page, enter a name for your data factory. The name must be globally unique; if you see a naming error, change the name of the data factory (for example, ADFTutorialDataFactory) and try creating it again. For naming rules for Data Factory artifacts, see the Data Factory naming rules article.
  6. For Subscription, select the Azure subscription in which you want to create the data factory.
  7. For Resource Group, use one of the following steps:
     a. Select Use existing, and select an existing resource group from the list.
     b. Select Create new, and enter the name of a new resource group.
  8. For Version, select V2.
  9. For Location, select the location for the data factory.
  10. Select Create.
  11. After the creation is complete, you will see the Data Factory page. Select the Author & Monitor tile to start the Azure Data Factory user interface (UI) application in a separate tab.
  12. On the Let’s get started page, switch to the Author tab in the left panel.

Create Linked Service to On-Premise SQL Server Source

  1. Select Manage, then Linked services, and then select the + New button on the toolbar.
  2. On the New linked service page, select SQL Server, and then select Continue.
  3. Fill out the required fields, making sure to choose your self-hosted integration runtime under Connect via integration runtime. Select Test connection, and then select Create.

Create Linked Service to Azure Synapse Analytics (Formerly SQL DW) Sink

  1. Select Manage, then Linked services, and then select the + New button on the toolbar.
  2. On the New linked service page, select Azure Synapse Analytics, and then select Continue.
  3. Fill out the required fields, select Test connection, and then select Create.

Create Source Dataset

This dataset will connect to the source metadata table that contains the names of the tables to copy.

  1. Select the + (plus) button, and then select Dataset.
  2. On the New dataset page, select SQL Server, and then select Continue.
  3. Under Linked service, choose the on-prem linked service created previously, select the metadata table under Table name, and then select OK.

Create Sink Dataset

This dataset will connect to the sink Azure Synapse Analytics instance that the tables will be copied to.

  1. Select the + (plus) button, and then select Dataset.
  2. On the New dataset page, select Azure Synapse Analytics, and then select Continue.
  3. Under Linked service, choose the Azure Synapse Analytics linked service created previously, select a table under Table name (this value will be replaced by a parameter in the following steps), and then select OK.
  4. Switch to the Parameters tab, click + New, and enter DWTableName for the parameter name.
  5. Switch to the Connection tab.
  6. For Table, check the Edit option, click into the table name input box, and then click the Add dynamic content link below it.
  7. In the Add Dynamic Content page, click DWTableName under Parameters, which populates the expression text box at the top with @dataset().DWTableName, and then click Finish. The tableName property of the dataset is now set to whatever value is passed in for the DWTableName parameter. The ForEach activity (created later) iterates through the list of tables and passes each one to the Copy activity.
  8. Confirm that the Connection now uses the newly created parameter.

Create Pipeline to Iterate and Copy Tables

Create the IterateAndCopySQLTables pipeline, which takes a list of tables as a parameter. For each table in the list, it copies data from the on-prem SQL Server table to Azure SQL Data Warehouse using staged copy and PolyBase.

  1. In the left pane, click + (plus), and then click Pipeline.
  2. In the General tab, specify IterateAndCopySQLTables for the name.
  3. Switch to the Parameters tab, and do the following:
     a. Click + New.
     b. Enter tableList for the parameter name.
     c. Select Array for Type.
  4. In the Activities toolbox, expand Iteration & Conditionals, and drag-drop the ForEach activity onto the pipeline design surface. (You can also search for activities in the Activities toolbox.) Then do the following:
     a. In the General tab at the bottom, enter IterateSQLTables for Name.
     b. Switch to the Settings tab, click the input box for Items, and then click the Add dynamic content link below it.
     c. In the Add Dynamic Content page, collapse the System Variables and Functions sections, and click tableList under Parameters, which populates the expression text box at the top with @pipeline().parameters.tableList. Then click Finish.
     d. Switch to the Activities tab, and click Add activity to add a child activity to the ForEach activity.
  5. In the Activities toolbox, expand Move & Transform, and drag-drop the Copy Data activity into the ForEach activity's design surface. Notice the breadcrumb menu at the top: IterateAndCopySQLTables is the pipeline name and IterateSQLTables is the ForEach activity name, which means the designer is in the activity scope. To switch back to the pipeline editor from the ForEach editor, click the pipeline name in the breadcrumb menu. Configure the Copy activity's source and sink as sketched below.
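
The Copy activity's source and sink are both driven by the current ForEach item. A sketch of the source query, assuming the hypothetical SchemaName/TableName columns from the metadata table sketched earlier (entered under the Source tab with Query selected):

-- Source query for the Copy activity; @{...} is Data Factory string interpolation
-- and item() is the current row returned by the Lookup activity
SELECT * FROM [@{item().SchemaName}].[@{item().TableName}]

On the Sink tab, the sink dataset's DWTableName parameter can be set to the same kind of expression, [@{item().SchemaName}].[@{item().TableName}], so each table lands in a matching table on the Synapse side, and Enable staging can be pointed at the blob storage account from the prerequisites so the copy runs through PolyBase.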

             

Create Pipeline to Get Table List and Trigger Copy Data Pipeline

Create the GetTableListAndTriggerCopyData pipeline, which performs two actions:

• Look up the on-prem SQL Server metadata table to get the list of tables to be copied.
• Trigger the IterateAndCopySQLTables pipeline to do the actual data copy.

  1. In the left pane, click + (plus), and then click Pipeline.
  2. In the General tab, change the name of the pipeline to GetTableListAndTriggerCopyData.
  3. In the Activities toolbox, expand General, drag-drop the Lookup activity onto the pipeline designer surface, and do the following:
     a. Enter LookupTableList for Name.
     b. Enter Retrieve the table list from the on-premise metadata table for Description.
  4. Switch to the Settings tab, and do the following:
     a. Select the source dataset created earlier (the one that points to the metadata table) for Source Dataset.
     b. Select Query for Use Query.
     c. For Query, enter a SQL query that returns the schema and table names from the metadata table (an example is shown after this list).
     d. Clear the checkbox for the First row only field.
  5. Drag-drop the Execute Pipeline activity from the Activities toolbox onto the pipeline designer surface, and set the name to TriggerCopy.
  6. Switch to the Settings tab, and do the following:
     a. Select IterateAndCopySQLTables for Invoked pipeline.
     b. Expand the Advanced section.
     c. Click + New in the Parameters section.
     d. Enter tableList for the parameter name.
     e. Click the VALUE input box, select the Add dynamic content link below it, enter @activity('LookupTableList').output.value as the value, and then select Finish. This sets the result list from the Lookup activity as the input to the second pipeline; the result list contains the tables whose data needs to be copied to the destination.
  7. To validate the pipeline, click Validate on the toolbar. Confirm that there are no validation errors. To close the Pipeline Validation Report, click >>.
  8. To publish entities (datasets, pipelines, etc.) to the Data Factory service, click Publish All at the top of the window. Wait until the publishing succeeds.
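
For the Lookup query in step 4c, any query that returns one row per table to copy will work. A sketch against the hypothetical metadata table from earlier:

-- Return the list of tables to copy; the column names must match what the
-- Copy activity expressions reference (SchemaName/TableName in this demo)
SELECT SchemaName, TableName
FROM   dbo.TableList;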

Trigger Pipeline

Now that the pipelines are configured, it’s time to run them.

  1. Select GetTableListAndTriggerCopyData from the available pipelines.
  2. Select Add Trigger, then select Trigger Now, and then select Finish.
  3. To monitor progress, select the Monitor icon on the leftmost pane.
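
Once a run completes, a quick row-count comparison between source and sink is an easy sanity check. A minimal sketch, using one of the demo tables (run the first statement against the on-premise source and the second against Azure Synapse Analytics):

-- On the on-premise SQL Server source
SELECT COUNT_BIG(*) AS SourceRows FROM Person.Address;

-- On Azure Synapse Analytics
SELECT COUNT_BIG(*) AS SinkRows FROM Person.Address;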

                 

Azure Synapse Analytics (Formerly SQL DW) Unsupported Data Types

Per the Microsoft documentation referenced at the end of this post, the data types that are not supported in Azure Synapse Analytics include: geometry, geography, hierarchyid, image, text, ntext, sql_variant, timestamp, xml, and user-defined types.

These can be identified in your source database with a query along the lines of the sketch below.
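
The example below queries the standard sys.schemas/sys.tables/sys.columns/sys.types catalog views; extend the type list to match whatever you need to flag.

-- Find columns in the source database whose data types are not supported
-- in Azure Synapse Analytics
SELECT  s.[name] AS SchemaName,
        t.[name] AS TableName,
        c.[name] AS ColumnName,
        y.[name] AS DataType
FROM    sys.tables  t
JOIN    sys.schemas s ON t.[schema_id]    = s.[schema_id]
JOIN    sys.columns c ON t.[object_id]    = c.[object_id]
JOIN    sys.types   y ON c.[user_type_id] = y.[user_type_id]
WHERE   y.[name] IN ('geography', 'geometry', 'hierarchyid', 'image',
                     'text', 'ntext', 'sql_variant', 'xml')
   OR   y.[is_user_defined] = 1;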
                 

                 
The workarounds for these data types, per the same documentation, are summarized below:

• geometry, geography, image: varbinary
• hierarchyid: nvarchar(4000)
• text: varchar; ntext: nvarchar
• sql_variant: split the column into several strongly typed columns
• timestamp: rework the code to use datetime2 and the CURRENT_TIMESTAMP function
• xml: varchar
• user-defined types: convert back to the native data type where possible
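
Where changing the source schema is not practical, the conversion can also happen on the fly in the Copy activity's source query. For example, [Person].[Address] in AdventureWorks includes a geography column (SpatialLocation) that can be converted as it is read:

-- Convert the geography column to varbinary during the copy
SELECT  AddressID,
        AddressLine1,
        City,
        PostalCode,
        SpatialLocation.STAsBinary() AS SpatialLocation,  -- geography -> varbinary
        ModifiedDate
FROM    Person.Address;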
                 

                 

                 

References

https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-data-types
                 

Questions?

Thanks for reading. We hope you found this blog post to be useful. Do let us know if you have any questions or topic ideas related to BI, analytics, the cloud, machine learning, SQL Server, (Star Wars), or anything else of the like that you’d like us to write about. Simply leave us a comment below, and we’ll see what we can do!

Keep Your Business Intelligence Knowledge Sharp by Subscribing to our Email List

Get fresh Key2 content around Business Intelligence, Data Warehousing, Analytics, and more delivered right to your inbox!

Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.