By: Phil Woubshet
In part 1 of this series, we implemented a solution within the Azure Portal to copy multiple tables from an on-premises SQL Server database to Azure Synapse Analytics (formerly Azure SQL Data Warehouse).
Today’s exercise is to implement the same solution programmatically with PowerShell, skipping the Azure Portal interface entirely.
A PowerShell solution can augment an existing Azure Portal build that has repeatable elements, and it also makes it easy to roll out similar configurations to other resources, servers, and data factories.
Here’s an outline of what we will accomplish:
- Connect Azure Account
- Create Resource Group
- Create Data Factory
- Create Linked Service
- Create Datasets
- Create Pipeline
- Execute Pipeline
PowerShell Module installation
This is a one-time task per machine where you want to run the PowerShell script:
Install-Module AzureAutomationAuthoringToolkit -Scope CurrentUser
Install-Module -Name Az -AllowClobber -Scope CurrentUser
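If you want to double-check the installation before moving on, listing the installed modules is a quick sanity test (not part of the original script); the Az.DataFactory sub-module is the one that provides the Data Factory cmdlets used below:

# Optional sanity check: confirm the Az module and its Data Factory sub-module are available
Get-InstalledModule -Name Az
Get-Module -ListAvailable -Name Az.DataFactory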
Create Configuration file:
# Switches that dictate whether to create an item or use an existing one.
$createResource = $false
$createDataFactory = $false
$createADLS = $false
$createStorageLinkedService = $false
$createSQLStorageLinkedService = $false
$createDatasets = $false
$createPipeline = $true

# Azure and SQL Server On-Prem variables (populate with your settings)
$SubscriptionId = ""
$resourceGroupName = ""
$region = "East US"
$dataFactoryName = ""    # Must be globally unique
$storageAccountName = ""
$storageAccountKey = ""
$sqlServerName = ""
$sqlDatabaseName = ""
$sqlUserName = ""
$sqlPassword = ""
$integrationRuntimeName = ""
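Note that this configuration keeps the SQL credentials in plain text. If that is a concern, one simple alternative (not in the original post) is to prompt for them when the script runs:

# Optional: prompt for the SQL credentials at run time instead of hard-coding them
$sqlUserName = Read-Host "SQL Server user name"
$sqlPassword = Read-Host "SQL Server password"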
Main Script Components
A. Connect Azure Account
function Login($SubscriptionId)
{
    $context = Get-AzContext

    if (!$context -or ($context.Subscription.Id -ne $SubscriptionId))
    {
        Connect-AzAccount -Subscription $SubscriptionId
    }
    else
    {
        Write-Host "SubscriptionId '$SubscriptionId' already connected"
    }
}

# Connect your Azure account
Login($SubscriptionId)
B. Create Resource Group
if ($createResource) {
    New-AzResourceGroup -Name $resourceGroupName -Location $region
    #Remove-AzResourceGroup -Name $resourceGroupName
}
C. Create Data Factory
if ($createDataFactory) {
    Set-AzDataFactoryV2 -ResourceGroupName $resourceGroupName -Name $dataFactoryName -Location $region
    #Remove-AzDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName
}
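The SQL Server linked service in the next step connects through a self-hosted integration runtime ($integrationRuntimeName), which this post assumes is already registered. If you want to create it from PowerShell as well, a minimal sketch looks like the following; you would still need to install the integration runtime on an on-premises machine and register it with one of the returned authentication keys:

# Sketch: register a self-hosted integration runtime in the data factory (assumes none exists yet)
Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $integrationRuntimeName -Type SelfHosted -Description "Self-hosted IR for the on-premises SQL Server"

# Retrieve the authentication keys used to register the on-premises integration runtime node
Get-AzDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $integrationRuntimeName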
D. Create Linked Service to On-Premises SQL Server Database
if ($createSQLStorageLinkedService) {
    # JSON definition of the linked service.
    $sqlServerLinkedServiceDefinition = @"
{
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=$sqlServerName;Database=$sqlDatabaseName;User ID=$sqlUserName;Password=$sqlPassword;Timeout=60"
            }
        },
        "connectVia": {
            "type": "IntegrationRuntimeReference",
            "referenceName": "$integrationRuntimeName"
        }
    },
    "name": "SqlServerLinkedService"
}
"@

    # Store the JSON definition in a file that will be used by the Set-AzDataFactoryV2LinkedService cmdlet.
    $sqlServerLinkedServiceDefinition | Out-File D:\AzureDF\SqlServerLinkedService.json

    # Create a SQL Server linked service
    Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "DemoSqlServerLinkedService" -File "D:\AzureDF\SqlServerLinkedService.json"
}
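The configuration file also has switches for the storage linked services, but the post only shows the SQL Server one; the "AzureBlobStorage1" staging linked service and the "AzureSqlDWkey2DWPW" Azure Synapse Analytics linked service referenced later are assumed to already exist in the data factory. As a rough sketch (the name and connection string format are assumptions), the blob storage linked service could be created the same way:

if ($createStorageLinkedService) {
    # Hypothetical JSON definition for the "AzureBlobStorage1" staging linked service
    $blobStorageLinkedServiceDefinition = @"
{
    "name": "AzureBlobStorage1",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=$storageAccountName;AccountKey=$storageAccountKey;EndpointSuffix=core.windows.net"
        }
    }
}
"@

    $blobStorageLinkedServiceDefinition | Out-File D:\AzureDF\AzureBlobStorageLinkedService.json

    Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "AzureBlobStorage1" -File "D:\AzureDF\AzureBlobStorageLinkedService.json"
}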
E. Create Datasets
if ($createDatasets) {
    # Create dataset for the OnPrem metadata table
    $sqlDataFactoryMetaDefinition = @"
{
    "name": "OnPremSqlTestDB",
    "properties": {
        "linkedServiceName": {
            "referenceName": "DemoSqlServerLinkedService",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "SqlServerTable",
        "schema": [
            { "name": "TableSchema", "type": "varchar" },
            { "name": "TableName", "type": "varchar" },
            { "name": "CreateScript", "type": "varchar" },
            { "name": "SourceDB", "type": "varchar" },
            { "name": "TargetDB", "type": "varchar" },
            { "name": "Active", "type": "bit" }
        ],
        "typeProperties": {
            "schema": "dbo",
            "table": "DataFactoryMeta"
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
"@

    $sqlDataFactoryMetaDefinition | Out-File D:\AzureDF\sqlDataFactoryMetaDefinition.json

    # Create the dataset in the data factory
    Set-AzDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "SqlMetaDataset" -File "D:\AzureDF\sqlDataFactoryMetaDefinition.json"

    # Create dataset for the Azure Synapse Analytics sink (formerly known as Azure SQL DW)
    $sinkDatasetDefinition = @"
{
    "name": "AzureSqlDWTableDummy",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureSqlDWkey2DWPW",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "DWTableName": { "type": "string" },
            "DWSchemaName": { "type": "string" },
            "DWCreateTable": { "type": "string" }
        },
        "annotations": [],
        "type": "AzureSqlDWTable",
        "schema": [],
        "typeProperties": {
            "schema": {
                "value": "@dataset().DWSchemaName",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().DWTableName",
                "type": "Expression"
            }
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
"@

    $sinkDatasetDefinition | Out-File D:\AzureDF\sinkDatasetDefinition.json

    # Create the dataset in the data factory
    Set-AzDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "SinkSynapseDataset" -File "D:\AzureDF\sinkDatasetDefinition.json"
}
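Both pipelines below are driven by a dbo.DataFactoryMeta control table in the source database, which the post never shows. Based on the dataset schema above, its DDL would look roughly like this (column lengths are assumptions; adjust to your environment). You can run the T-SQL in SSMS, or from PowerShell with Invoke-Sqlcmd if the SqlServer module is installed:

# Hypothetical DDL for the control table that drives the copy
$createMetaTableSql = @"
CREATE TABLE dbo.DataFactoryMeta
(
    TableSchema  varchar(128) NOT NULL,
    TableName    varchar(128) NOT NULL,
    CreateScript varchar(max) NOT NULL,
    SourceDB     varchar(128) NULL,
    TargetDB     varchar(128) NULL,
    Active       bit          NOT NULL
);
"@

# Requires the SqlServer module (Install-Module SqlServer)
Invoke-Sqlcmd -ServerInstance $sqlServerName -Database $sqlDatabaseName -Username $sqlUserName -Password $sqlPassword -Query $createMetaTableSql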
F. Create Pipelines
if ($createPipeline) {
    # Create pipelines
    # JSON definition of the inner pipeline
    $pipelineDefinition = @"
{
    "name": "DemoIterateAndCopySQLTables",
    "properties": {
        "description": "Staging",
        "activities": [
            {
                "name": "IterateSQLTables",
                "type": "ForEach",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.tableList",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "CopySQLTables",
                            "type": "Copy",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "7.00:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "SqlServerSource",
                                    "sqlReaderQuery": {
                                        "value": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
                                        "type": "Expression"
                                    }
                                },
                                "sink": {
                                    "type": "SqlDWSink",
                                    "preCopyScript": {
                                        "value": "IF EXISTS(SELECT 1 FROM sys.tables t INNER JOIN sys.schemas s ON t.schema_id = s.schema_id WHERE t.name = '@{item().TABLE_NAME}' AND s.name = '@{item().TABLE_SCHEMA}')\nTRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]\nELSE BEGIN @{item().CreateScript} END",
                                        "type": "Expression"
                                    },
                                    "allowPolyBase": true,
                                    "polyBaseSettings": {
                                        "rejectValue": 0,
                                        "rejectType": "value",
                                        "useTypeDefault": false
                                    },
                                    "disableMetricsCollection": false
                                },
                                "enableStaging": true,
                                "stagingSettings": {
                                    "linkedServiceName": {
                                        "referenceName": "AzureBlobStorage1",
                                        "type": "LinkedServiceReference"
                                    },
                                    "path": "adftest",
                                    "enableCompression": true
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "SqlMetaDataset",
                                    "type": "DatasetReference"
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "SinkSynapseDataset",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "DWTableName": {
                                            "value": "@{item().TABLE_NAME}",
                                            "type": "Expression"
                                        },
                                        "DWSchemaName": {
                                            "value": "@{item().TABLE_SCHEMA}",
                                            "type": "Expression"
                                        },
                                        "DWCreateTable": {
                                            "value": "@{item().CreateScript}",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                }
            }
        ],
        "parameters": {
            "tableList": {
                "type": "array"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}
"@

    $pipelineDefinition | Out-File D:\AzureDF\SqlServerToBlobPipeline.json

    # Create the inner pipeline in the data factory
    Set-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "DemoIterateAndCopySQLTables" -File "D:\AzureDF\SqlServerToBlobPipeline.json"

    # Define outer pipeline
    $pipelineDefinitionOuter = @"
{
    "name": "DemoTriggerCopyData",
    "properties": {
        "activities": [
            {
                "name": "LookupTableList",
                "description": "Retrieve the table list from the on-premises SQL Server database",
                "type": "Lookup",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "SqlServerSource",
                        "sqlReaderQuery": "SELECT TableSchema AS TABLE_SCHEMA, TableName AS TABLE_NAME, CreateScript FROM DataFactoryMeta WHERE Active = 1"
                    },
                    "dataset": {
                        "referenceName": "OnPremSqlTestDB",
                        "type": "DatasetReference"
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "TriggerCopy",
                "type": "ExecutePipeline",
                "dependsOn": [
                    {
                        "activity": "LookupTableList",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "DemoIterateAndCopySQLTables",
                        "type": "PipelineReference"
                    },
                    "waitOnCompletion": true,
                    "parameters": {
                        "tableList": {
                            "value": "@activity('LookupTableList').output.value",
                            "type": "Expression"
                        }
                    }
                }
            }
        ],
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}
"@

    $pipelineDefinitionOuter | Out-File D:\AzureDF\TriggerCopyDataPipeline.json

    # Create the outer pipeline in the data factory
    Set-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "DemoTriggerCopyData" -File "D:\AzureDF\TriggerCopyDataPipeline.json"
}
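At this point both pipelines should exist in the data factory. A quick way to confirm from PowerShell (not in the original script) is to list them:

# List the pipelines in the data factory to confirm both were created
Get-AzDataFactoryV2Pipeline -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName | Select-Object Name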
G. Execute the Pipeline
$pipelineName = "DemoTriggerCopyData"

# Start the pipeline run
$runId = Invoke-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineName $pipelineName

# Check the pipeline run status until it finishes the copy operation
while ($True) {
    $result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)

    if (($result | Where-Object { ($_.Status -eq "InProgress") -or ($_.Status -eq "Queued") } | Measure-Object).Count -ne 0) {
        Write-Host "Pipeline run status: In Progress" -ForegroundColor Yellow
        Start-Sleep -Seconds 10
    }
    else {
        Write-Host "Pipeline $pipelineName run finished. Result:" -ForegroundColor Yellow
        $result
        break
    }
}
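The loop above polls the individual activity runs. If you would rather watch the run as a whole, Get-AzDataFactoryV2PipelineRun returns the overall status for the run ID; see the monitoring link under References for more options:

# Check the overall status of the pipeline run rather than its activity runs
$run = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -PipelineRunId $runId
$run.Status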
References
https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-powershell
https://docs.microsoft.com/en-us/azure/data-factory/monitor-programmatically
Questions?
Thanks for reading. We hope you found this blog post to be useful. Do let us know if you have any questions or topic ideas related to BI, analytics, the cloud, machine learning, SQL Server, Star Wars, or anything else of the like that you’d like us to write about. Simply leave us a comment below, and we’ll see what we can do!