Anvesh Vallabhaneni, Business Intelligence at Key2 Consulting
By: Anvesh Vallabhaneni

 
My goal with this blog post is to get you familiarized with a few machine learning capabilities that we can quickly implement using RevoScaleR and SQL Server 2017. I expect you (as the reader) to have a lot of familiarity with R and SQL.

In my previous blog post, “Data Encryption – How Your Organization Can Prevent a Data Breach”, we discussed the importance of securing data as well as some simple steps one can take to prevent data breaches. Data in transit is susceptible to risk, but using R within SQL Server enables you to have one data repository that helps you manage and audit transactions easily using database security. Another advantage of using R within SQL server is that it mitigates memory and CPU constraints by working seamlessly with the database engine.

To put it simply, using R within SQL Server can empower us to achieve many great things. For this blog post, I will show you how to conduct exploratory analysis, visualize data using several libraries available in R, and build predictive models to forecast data.

Let’s get started.

If you have ever taken a programing class, you know that the class often starts with creating a program that writes “Hello World” to the console. Let’s do the same here.
 

 
This is a simple script that takes the result of your “Select” query, stores it in R, and then outputs it. However, pay attention to the variables OutputDataSet and InputDataSet in the script parameter. InputDataSet is by default the variable to which the input data from @input_data_1 gets stored, and OutputDataSet is by default the Output of the RevoScaleR script. If desired, these can be renamed using “@input_data_1_name = N’‘, @output_data_1_name = N’‘” as additional parameters. Be sure to keep in mind that R is case sensitive!

Now that we know how to connect between R and SQL server, let’s visualize the data.
 

(You can download the above code here.)
 
The above code plots “Miles per Gallon” against “Weight of Car” and plots linear and smooth regression lines. The output is of the varbinary format, and to look at it, we can simply create a report with the above stored procedure used as our dataset and set the image properties to source from the dataset. See below.
 

 
Alternatively, we could also write the output to a .jpeg file and view it from File Explorer. Below is the scatterplot output that is displayed by the report.
 

 

Building a Predictive Model

Now for the finale. Let’s build a predictive model.

To do this, we first need to create a model on our training data and save it in the database. We will reuse it as needed to predict real-time or test data. Once the model is built, it can be copied to – and used on – different servers. From the scatterplot above, we can assume there is some level of linear correlation between “Miles per Gallon” and “Weight of Car.” For simplicity’s sake, let‘s build a linear model using the rxLinMod function from RevoScaleR.
 

(You can download the above code here.)
 

 
Take a look at the results above. The predictions were pretty accurate given that it was a simple linear model, right? There are several methods and features that would further improve our model’s accuracy, but we’ll save those for later.
 

Summary

In this blog post, we discussed:

• the basic steps for using R within SQL Server
• visualizations generated by R in reports
• building predictive models to predict the target variable

To learn more about the applications of R in SQL, we highly recommend checking out these resources from Microsoft’s training website: SQLServerRTutorials and RevoScaleR Functions.
 

Keep your data analytics sharp by subscribing to our mailing list!

Thanks for reading! Keep your knowledge sharp by subscribing to our mailing list to receive fresh Key2 content around data analytics, business intelligence, data warehousing, and more!

 
 
 
 
 


Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.