By: Ken Adams
Senior BI Consultant at Key2 Consulting
As mentioned in a previous blog post comparing the pros and cons of Microsoft Azure SQL Database versus SQL Data Warehouse, Microsoft announced at SQL PASS 2016 that they are offering a free, 1-month trial of SQL Data Warehouse. In this post, I take Microsoft up on that offer and show you how to get started using Azure SQL Data Warehouse (SQL DW)!
So let's get started. First, head over to the free trial link mentioned above:
You will notice you’re applying for a free trial and that there are a couple of things required. The important details are:
– An Azure account is necessary – you can sign up for free if you do not already have one.
- Please note this does require providing a valid email address, phone number, and credit card or debit card number. As stated on the Azure sign-up site, you will not be charged for anything unless you decide to extend the trial. The card number is for identification purposes only. I guess we have to trust them here. 😉
– A referral code is required. One was provided during Joseph Sirosh's keynote at SQL PASS; Joseph Sirosh heads the Data Platform group, which includes Azure. If you weren't at PASS, the referral code can be found with a simple Google search, but I will make it easy on you – the referral code is SQLPASSROCKS.
Upon submitting a request, you must wait for an email with the next steps. By the way, the free trial includes compute resources of 200 Data Warehouse Units (DWU) and 2TB of storage. This seems to be a good starting point for a test drive of SQL DW. The 200 DWU represents 2 compute nodes and 30 distributions per node. You can find a good breakdown of the DWU and what that equates to in terms of computing power here. The mapping of DWU to compute nodes and distributions per node is shown in the chart below:
DWU | # of compute nodes | # of distributions per node
100 | 1 | 60
200 | 2 | 30
300 | 3 | 20
400 | 4 | 15
500 | 5 | 12
600 | 6 | 10
1000 | 10 | 6
1200 | 12 | 5
1500 | 15 | 4
2000 | 20 | 3
3000 | 30 | 2
6000 | 60 | 1
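As a side note, once your SQL DW database exists, the DWU setting can be checked or changed from T-SQL as well as from the portal. Here is a minimal sketch, assuming a database named sqldwtrialdb (the name I use later in this post) – and keep in mind that scaling above 200 DWU during the trial would presumably start the meter running:

--check the current service objective (the DWU level) for the database
SELECT DATABASEPROPERTYEX('sqldwtrialdb', 'ServiceObjective') AS current_service_objective;

--scale to 200 DWU, the free trial level; connect to the master database to run this
ALTER DATABASE sqldwtrialdb MODIFY (SERVICE_OBJECTIVE = 'DW200');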
Be careful that the reply from Microsoft does not find its way into your spam folder, which happened to me. The email is seen below:
Once I submitted my request for a free trial, it took about 24 hours to receive a response. The response was a little more personal than I anticipated – it appeared to come from an actual person rather than the automated reply I was expecting.
Here is the body of that message:
Thank you for submitting your request for Azure SQL Data Warehouse free trial.
As next steps to activate the free trial, please create a logical SQL server by logging into the Azure Portal. Please note that the logical SQL Server must be created in one of the following regions to participate in the Azure SQL Data Warehouse free trial program.
Available regions for Azure SQL Data Warehouse free trial: West US, West US2, West Central US, North Europe, West Europe, East Asia, South East Asia.
Please reply to this email with the following details within the next 2 business days.
1. Logical SQL Server name
2. The region in which you created the logical SQL Server. This should be one of the above-mentioned regions for you to be eligible for the free trial
3. The Azure subscription ID under which you created the logical SQL Server
We will get back to you with the next steps after reviewing the above-mentioned details that you provide. Please note that your one-month free trial does not start until we send you a confirmation email that provide you the offer validity period. We advise you not to create the SQL data warehouse instance in this logical SQL Server until we send you the confirmation email about the offer validity period.
You will notice that the response is time-sensitive, so plan accordingly. Once I received the reply, I went ahead and logged into the Azure portal.
Clicking on the Azure Portal link takes you to a login screen where you log in with your Azure credentials. Remember, an account was a prerequisite for signing up for the free trial. Once you log in, you will see your Azure dashboard, which will likely be blank unless you already use other services.
The email received states that you need to set up a new instance in one of the required regions and provide that instance information back to the Azure team. I did that by clicking on the + New link at the top left of the Azure page and typing “logical” in the search box:
That brings up a screen with the filtered selections; SQL server (logical server) should be the first option in the list.
Click on SQL server (logical server) and then Create:
Note that in the description, the logical server functions as a host for the various Azure database offerings. After clicking ‘Create’ you are ready to set up the server. You are required to fill in the following:
I had previously established a resource group, so I used that. You may need to create a new resource group. I chose to have my instance pinned to my dashboard to make it easier to access. Upon selecting Create you are taken back to the dashboard, where you can see the instance being generated:
In all of 30 seconds, the instance was created and the details were displayed. The SQL host information is what you need to provide to the Azure team. I have highlighted the requested information below:
Send an email with the highlighted pieces of information and wait for the green light from the Azure team. From the time I sent the email back, it took about 5 days to get approval into the Azure DW free trial program. I want to point out a couple of things here:
– First off, the Azure team representative was helpful and professional. I was persistent in trying to get a response and resolution to the problem mentioned in the next bullet.
– Please note that the logical SQL Server name is case-sensitive. When providing the details requested please make sure the name you give them matches the name in Azure down to the case. This caused some issues for me as my email client “corrected” the case of my instance name – it changed “sqldwtrialhost” to “Sqldwtrialhost”. This caused a bit of pain for the Azure team to locate my logical SQL Server and probably delayed my approval by a day or so.
The acceptance email into the trial program arrives with a “Getting Started” guide and details about what you get in the trial. A few highlights:
– 1 month of SQL DW service
– SQL DW service at 200 DWU
– Charges will start after 1 month unless you delete your instance
The guide provided by the Azure team is helpful in getting started. It details setting up SQL DW and many of the touch points along the way. I read that and then started down the road. First, log into the Azure portal and select the + New link\Database\SQL Data Warehouse.
The next screen will be the details of your SQL DW instance. See my screen below:
A couple of important items to note here:
– Select Free Trial for the Subscription since we’re signed up as part of the free trial group.
– The resource group should be the same group created while setting up your SQL Server.
– I left the source as “Blank database.”
– Server should be the server name set up and ultimately approved by the Azure team.
– Performance, as stated in the acceptance email, should be set at 200 DWU. You will have to type in the value or move the slider down, as the default is 400 DWU. My assumption is that if you set it too high you could get charged.
– I also selected Pin to dashboard to make it more accessible from the Azure portal.
Now click Create and wait while the SQL DW instance is provisioned:
The process to create the SQL DW instance took 2 minutes and I was ready to move forward. The dashboard pin is updated and shows that the SQL DW instance is Online.
After the SQL DW was deployed, I was taken directly to a configuration page. From here, we need to work on the firewall settings. Start by clicking on the server name as seen below:
Clicking the server name link brings up another configuration page. From here, you can access the firewall settings from a couple of places – by clicking Firewall in the menu on the left or by clicking the Show firewall settings link:
This brings up yet another configuration screen:
Now you need to add a rule, which should contain the IP address of the machine from which you plan to access Azure SQL DW. I have a Windows 10/SQL Server 2016 virtual machine that I used. I added a rule name and a start and end IP address, and saved my new rule. If you log in from different networks, you will also want to add those exceptions, as the IP address will be different.
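Server-level firewall rules can also be managed with T-SQL once you have at least one connection allowed through. A quick sketch, run in the master database, with a hypothetical rule name and a placeholder IP range:

--create or update a server-level firewall rule (rule name and IPs below are placeholders)
EXECUTE sp_set_firewall_rule
    @name = N'HomeOffice',
    @start_ip_address = '203.0.113.10',
    @end_ip_address = '203.0.113.10';

--list the server-level rules currently in place
SELECT * FROM sys.firewall_rules;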
At this point, we are ready to try to connect to SQL DW. I’m using SQL Server Management Studio (SSMS) 2016 from my Windows 10 VM. Initially you can log in with the administrator login, which was set up when the SQL Server was created. However, you will want to set up a new user based on the guide sent along by the Azure team. My initial login was successful.
As seen in the SSMS Object Explorer:
According to the guide, the administrator login that comes with the server is assigned to a smaller resource class in SQL DW. A new user will be set up and given access to a larger resource class. The following code can be run to perform this operation:
--create a login for the server (run this in the master database)
CREATE LOGIN [key2_user] WITH PASSWORD = '5TGBnhy6';

--now create a user for that login on your database (run this in the SQL DW database)
CREATE USER ukey2 FOR LOGIN [key2_user];
GRANT CONTROL ON DATABASE::[sqldwtrialdb] TO ukey2;

--from the set-up guide, this adds the new user to the larger resource class
EXEC sp_addrolemember 'xlargerc', 'ukey2';
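To confirm the resource class assignment took effect, you can check the database role membership with the catalog views. A quick sketch using the names from the script above:

--verify that ukey2 is a member of the xlargerc resource class
SELECT r.name AS role_name, m.name AS member_name
FROM sys.database_role_members rm
INNER JOIN sys.database_principals r ON rm.role_principal_id = r.principal_id
INNER JOIN sys.database_principals m ON rm.member_principal_id = m.principal_id
WHERE r.name = 'xlargerc';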
Now that we have gotten past the administrative portion of setting up a user, I would like to start importing some data. Data can be imported into SQL DW from Azure Blob Storage, SQL Server, and HDInsight, among others. You can find a summary of these options here. For the initial population of the SQL DW database, I chose to import from an Azure Blob. Azure Blob Storage is analogous to Amazon's S3 storage and is defined by Microsoft as a service for storing large amounts of unstructured data that can be accessed from anywhere in the world via HTTP or HTTPS. Fortunately, there is publicly available data in Azure Blob Storage that can be accessed, which made it easy to begin setting up a connection and importing data.
The first step is to establish a connection to the source data. I chose the Contoso Retail DW, which is a sample retail dataset. The code to set up the connection is below:
--create a schema for the Azure Storage Blob source (asb)
CREATE SCHEMA asb;
GO

--define the external data source
--NOTE: a LOCATION is required here; the URI below is the publicly available
--Contoso Retail DW container referenced in Microsoft's load tutorial
CREATE EXTERNAL DATA SOURCE extAzurePublicBlob_Contoso
WITH (
    TYPE = Hadoop,
    LOCATION = 'wasbs://contosoretaildw-tables@contosoretaildw.blob.core.windows.net/'
);

--define the format for the Contoso data
CREATE EXTERNAL FILE FORMAT [PipeDelimitedFileFormat]
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '|',
        STRING_DELIMITER = '',
        DATE_FORMAT = 'yyyy-MM-dd HH:mm:ss.fff',
        USE_TYPE_DEFAULT = FALSE
    )
);

--asb.DimStore
CREATE EXTERNAL TABLE [asb].DimStore (
    [StoreKey] [int] NOT NULL,
    [GeographyKey] [int] NOT NULL,
    [StoreManager] [int] NULL,
    [StoreType] [nvarchar](15) NULL,
    [StoreName] [nvarchar](100) NOT NULL,
    [StoreDescription] [nvarchar](300) NOT NULL,
    [Status] [nvarchar](20) NOT NULL,
    [OpenDate] [datetime] NOT NULL,
    [CloseDate] [datetime] NULL,
    [EntityKey] [int] NULL,
    [ZipCode] [nvarchar](20) NULL,
    [ZipCodeExtension] [nvarchar](10) NULL,
    [StorePhone] [nvarchar](15) NULL,
    [StoreFax] [nvarchar](14) NULL,
    [AddressLine1] [nvarchar](100) NULL,
    [AddressLine2] [nvarchar](100) NULL,
    [CloseReason] [nvarchar](20) NULL,
    [EmployeeCount] [int] NULL,
    [SellingAreaSize] [float] NULL,
    [LastRemodelDate] [datetime] NULL,
    [GeoLocation] NVARCHAR(50) NULL,
    [Geometry] NVARCHAR(50) NULL,
    [ETLLoadID] [int] NULL,
    [LoadDate] [datetime] NULL,
    [UpdateDate] [datetime] NULL
)
WITH (
    LOCATION = '/DimStore/',
    DATA_SOURCE = extAzurePublicBlob_Contoso,
    FILE_FORMAT = PipeDelimitedFileFormat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);

--now query that table
SELECT * FROM [asb].[DimStore];
Referencing external objects in this manner was new for SQL Server 2016. It allows you to use T-SQL statements to connect to and query data stored in Azure Blob Storage or Hadoop. You can find more information on this concept here. Being able to query unstructured objects is very reminiscent of HiveQL querying a Hive table except that you get to use T-SQL!
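If you ever need to see which external objects are defined in a database, the catalog views that came along with this feature expose them. A quick sketch:

--list the external tables, data sources, and file formats defined in the database
SELECT name FROM sys.external_tables;
SELECT name, type_desc, location FROM sys.external_data_sources;
SELECT name, format_type FROM sys.external_file_formats;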
In addition to the store dimension, I set up references to the Contoso product dimension and inventory fact table.
--asb.DimProduct
CREATE EXTERNAL TABLE [asb].DimProduct (
    [ProductKey] [int] NOT NULL,
    [ProductLabel] [nvarchar](255) NULL,
    [ProductName] [nvarchar](500) NULL,
    [ProductDescription] [nvarchar](400) NULL,
    [ProductSubcategoryKey] [int] NULL,
    [Manufacturer] [nvarchar](50) NULL,
    [BrandName] [nvarchar](50) NULL,
    [ClassID] [nvarchar](10) NULL,
    [ClassName] [nvarchar](20) NULL,
    [StyleID] [nvarchar](10) NULL,
    [StyleName] [nvarchar](20) NULL,
    [ColorID] [nvarchar](10) NULL,
    [ColorName] [nvarchar](20) NOT NULL,
    [Size] [nvarchar](50) NULL,
    [SizeRange] [nvarchar](50) NULL,
    [SizeUnitMeasureID] [nvarchar](20) NULL,
    [Weight] [float] NULL,
    [WeightUnitMeasureID] [nvarchar](20) NULL,
    [UnitOfMeasureID] [nvarchar](10) NULL,
    [UnitOfMeasureName] [nvarchar](40) NULL,
    [StockTypeID] [nvarchar](10) NULL,
    [StockTypeName] [nvarchar](40) NULL,
    [UnitCost] [money] NULL,
    [UnitPrice] [money] NULL,
    [AvailableForSaleDate] [datetime] NULL,
    [StopSaleDate] [datetime] NULL,
    [Status] [nvarchar](7) NULL,
    [ImageURL] [nvarchar](150) NULL,
    [ProductURL] [nvarchar](150) NULL,
    [ETLLoadID] [int] NULL,
    [LoadDate] [datetime] NULL,
    [UpdateDate] [datetime] NULL
)
WITH (
    LOCATION = '/DimProduct/',
    DATA_SOURCE = extAzurePublicBlob_Contoso,
    FILE_FORMAT = PipeDelimitedFileFormat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);

--now query that table
SELECT * FROM [asb].[DimProduct];

--asb.FactInventory
CREATE EXTERNAL TABLE [asb].FactInventory (
    [InventoryKey] [int] NOT NULL,
    [DateKey] [datetime] NOT NULL,
    [StoreKey] [int] NOT NULL,
    [ProductKey] [int] NOT NULL,
    [CurrencyKey] [int] NOT NULL,
    [OnHandQuantity] [int] NOT NULL,
    [OnOrderQuantity] [int] NOT NULL,
    [SafetyStockQuantity] [int] NULL,
    [UnitCost] [money] NOT NULL,
    [DaysInStock] [int] NULL,
    [MinDayInStock] [int] NULL,
    [MaxDayInStock] [int] NULL,
    [Aging] [int] NULL,
    [ETLLoadID] [int] NULL,
    [LoadDate] [datetime] NULL,
    [UpdateDate] [datetime] NULL
)
WITH (
    LOCATION = '/FactInventory/',
    DATA_SOURCE = extAzurePublicBlob_Contoso,
    FILE_FORMAT = PipeDelimitedFileFormat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);

--now query that table
SELECT * FROM [asb].FactInventory;
The external objects can now be moved into tables inside SQL DW using the CREATE TABLE AS SELECT (CTAS) construct. CTAS is available only in SQL DW and Parallel Data Warehouse (PDW). The largest of the three tables is Fact.Inventory, which loaded in under 30 seconds. The code snippet is below:
--now let's move the data from the external source into SQL DW
--(this assumes the Dim and Fact schemas already exist; create them first with CREATE SCHEMA if not)
CREATE TABLE [Dim].[Store]
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT * FROM [asb].[DimStore]
OPTION (LABEL = 'CTAS : Load [Dim].[Store]');
GO

CREATE TABLE [Dim].[Product]
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT * FROM [asb].[DimProduct]
OPTION (LABEL = 'CTAS : Load [Dim].[Product]');
GO

CREATE TABLE [Fact].[Inventory]
WITH (DISTRIBUTION = HASH([ProductKey]))
AS
SELECT * FROM [asb].[FactInventory]
OPTION (LABEL = 'CTAS : Load [cso].[FactInventory]');
GO
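Because [Fact].[Inventory] is hash-distributed on ProductKey, it is worth a quick check that the rows spread reasonably evenly across the distributions. A minimal sketch using the table created above:

--report rows and space used per distribution; large imbalances indicate data skew
DBCC PDW_SHOWSPACEUSED('Fact.Inventory');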
Now you have three tables loaded into the SQL DW. You can see them in the SSMS Object Explorer:
You can also see the three external objects defined. After or during the load into SQL DW you can see the statistics and progress via a system view query:
SELECT r.command
    ,s.request_id
    ,r.[status]
    ,COUNT(DISTINCT input_name) AS nbr_files
    ,SUM(s.bytes_processed) / 1024 / 1024 AS mb_processed  --bytes converted to MB
FROM sys.dm_pdw_exec_requests r
INNER JOIN sys.dm_pdw_dms_external_work s
    ON r.request_id = s.request_id
WHERE r.[label] IN ('CTAS : Load [Dim].[Store]',
                    'CTAS : Load [Dim].[Product]',
                    'CTAS : Load [cso].[FactInventory]')
GROUP BY r.command
    ,s.request_id
    ,r.[status]
ORDER BY nbr_files DESC
    ,mb_processed DESC;
You get some cool information back such as the amount of data processed, the number of files ingested, and the query executed.
I dropped the tables and re-ran the CTAS script several times, hence the duplicate entries. After loading the data into SQL DW, it is suggested that you rebuild the indexes and create statistics. Tables are stored by default as a clustered columnstore index (CCI). To ensure the data is fully compressed, you need to rebuild the CCI after the data load completes.
ALTER INDEX ALL ON [Dim].[Product] REBUILD;
ALTER INDEX ALL ON [Dim].[Store] REBUILD;
ALTER INDEX ALL ON [Fact].[Inventory] REBUILD;
In addition, statistics are not created automatically in SQL DW. You will need to do this once the data is loaded. Fortunately, there is a stored procedure available to do this for all columns in the database. Create the stored procedure in your database and execute it:
EXEC [dbo].[prc_sqldw_create_stats] 1, 20 |
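If you would rather target specific columns than create statistics across the entire database, single-column statistics can also be created by hand. A minimal sketch on the join keys used earlier (the statistics names are just placeholders):

--single-column statistics on commonly joined columns
CREATE STATISTICS stat_Inventory_ProductKey ON [Fact].[Inventory] ([ProductKey]);
CREATE STATISTICS stat_Inventory_StoreKey ON [Fact].[Inventory] ([StoreKey]);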
Your database should now be ready to query and optimized to do so. In a subsequent post, I will dig into some queries, play around with the resources by scaling down the DWUs and even look at an alternative for importing data into SQL DW. Stay tuned…
To learn more about the various methodologies for getting started with SQL Data Warehouse, we highly recommend reading James Serra’s (Data Warehouse Evangelist at Microsoft) recent post on SQL Data Warehouse Reference Architectures.