By: Brad Harris
So you’re excited – a brand new cloud data project has just come your way. The project involves designing a solution that provides a hybrid approach for moving data to the cloud. Your client wants to take a step forward in utilizing cloud resources but also has a major investment in on-premise data and infrastructure that they are not quite ready to give up yet.
Maybe for some of us consultants this would be a new experience. Maybe you have spent most of your data-shuffling-days living inside an on-premise SQL Server instance and using SSIS to move your data to and fro. But maybe you have a little bit of cloud experience with some of the other big providers like Amazon or Google, and you’re looking to expand your horizons a little bit.
In this article, I am going to talk about the Microsoft Azure Integration Runtime within Azure Data Factory. I will explain how you can use the tool to leverage your on-premise data and move it to the cloud, where it can be used for a multitude of different purposes (like cold data storage in a data lake and complicated statistical analysis with Synapse Analytics).
Azure Integration Runtime in Azure Data Factory
The integration runtime in Azure Data Factory (ADF) is the behind-the-scenes brain of ADF. It provides the compute resources that ADF uses to connect to, copy, and move data across public and private data stores, whether they sit on-premise or within a virtual network.
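There are currently three different flavors of the integration runtime: the Azure Integration Runtime, the Self-Hosted Integration Runtime, and the Azure SSIS Integration Runtime.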
Their use should be considered based on the data integration capabilities and networking needs of your project. While I will try to cover the basics of all three types, the most widely used runtimes are the Azure and the Self-Hosted integration runtimes.
For a more detailed explanation of Azure Integration Runtime in Azure Data Factory, we recommend visiting Microsoft’s documentation.
1. Azure Integration Runtime
The Azure Integration Runtime is the default runtime that you will get when spinning up an instance of ADF. This runtime will allow you to easily connect to your Azure resources for data integration.
From an ADF standpoint, one of the first things you do when designing a pipeline is to define the services that you need to connect to. Each of these connections is defined as a linked service. Examples of linked services are Azure SQL Database, Azure Synapse, and Azure Data Lake.
One of the things you must state when defining a linked service is the integration runtime that it will use. If your linked service connects to an Azure SQL Database inside of your Azure subscription, the default Azure Integration Runtime is the most appropriate choice, as this runtime can connect to any of your existing Azure resources.
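To make the link between a linked service and its runtime concrete, here is a minimal sketch in Python that builds a linked service definition as a dictionary. It assumes the JSON layout ADF shows in its code view, where the runtime is named under `connectVia`, and it assumes the default runtime is called `AutoResolveIntegrationRuntime`. The function name, the linked service name, and the placeholder connection string are hypothetical and only there for illustration.

```python
import json

# Assumed name of the default Azure Integration Runtime in a new data factory.
DEFAULT_AZURE_IR = "AutoResolveIntegrationRuntime"


def build_linked_service(name, service_type, type_properties,
                         runtime_name=DEFAULT_AZURE_IR):
    """Return a linked service definition as a plain dictionary.

    This is a hypothetical helper, not part of any Azure SDK.
    """
    return {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": type_properties,
            # connectVia is where the integration runtime is stated.
            "connectVia": {
                "referenceName": runtime_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


if __name__ == "__main__":
    azure_sql = build_linked_service(
        name="ls_azure_sql_example",  # hypothetical name
        service_type="AzureSqlDatabase",
        type_properties={"connectionString": "<your-connection-string>"},
    )
    print(json.dumps(azure_sql, indent=2))
```

Because the runtime is only a named reference, pointing the same linked service at a different runtime is a one-argument change.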
As an example, a pipeline that moves data from Azure Data Lake Storage (ADLS) to Azure Synapse might involve several different linked services: Azure Key Vault, Azure Synapse, and Azure Data Lake. The Azure Integration Runtime can connect to all of them and service the data integration needs of your pipeline, as long as these services are within your Azure subscription.
The Azure Integration Runtime is primarily meant for cloud resources and is designed for data stores with publicly accessible endpoints, but it does have the capability to reach back to on-premise resources.
When you enable a Managed Virtual Network, the integration runtime can reach a private linked service within a private network environment. This basically means that you can reach on-premise resources with it, but you have to jump through some hoops on the networking side to get everything set up.
From a scalability standpoint there is nothing that needs to be done, as this is fully managed within your Azure subscription. No need to worry about software installs or server patching.
2. Azure Self-Hosted Integration Runtime (SHIR)
This runtime is where the rubber meets the road if you are really looking to bring your on-premise data into the cloud and back again. The Azure Self-Hosted Integration Runtime (SHIR) sidesteps all the network configuration that you would have to do and maintain if you were to use the Azure Integration Runtime.
The SHIR is software that you install on standalone nodes within your private network. When adding an integration runtime in ADF, you have the option to choose self-hosted. After you walk through the configuration steps for self-hosted, ADF gives you a link to download the runtime onto the chosen nodes.
Once the runtime is installed on the nodes, you configure it with the authentication key that allows it to authenticate with ADF. I say nodes because you have the option of installing the runtime on multiple servers to allow active-active scalability. Unlike the Azure Integration Runtime, the SHIR does leave you responsible for software installs and server patching.
After the nodes are configured, the runtime starts communicating with your ADF instance and is ready to be selected when you add linked services in ADF. SHIR is what you want to use if you have a data integration project that requires movement of data from a private network environment which doesn’t have a direct line-of-sight from the public cloud environment.
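As a rough illustration of how a self-hosted runtime shows up in ADF definitions, the sketch below builds two dictionaries in Python: one for the runtime itself and one for an on-premise SQL Server linked service routed through it. The `SelfHosted` and `SqlServer` type values and the `connectVia` layout reflect my understanding of ADF's JSON and should be checked against Microsoft's documentation. The helper names, resource names, and connection string are hypothetical.

```python
import json


def self_hosted_runtime(name, description=""):
    """Integration runtime definition; the SelfHosted type marks it as a SHIR."""
    return {
        "name": name,
        "properties": {"type": "SelfHosted", "description": description},
    }


def on_prem_sql_linked_service(name, connection_string, runtime):
    """SQL Server linked service that connects through the given runtime."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {"connectionString": connection_string},
            # Route traffic through the self-hosted nodes, not the default runtime.
            "connectVia": {
                "referenceName": runtime["name"],
                "type": "IntegrationRuntimeReference",
            },
        },
    }


if __name__ == "__main__":
    shir = self_hosted_runtime("shir-onprem-example", "Nodes in the private network")
    linked = on_prem_sql_linked_service(
        "ls_onprem_sql_example", "<your-connection-string>", shir
    )
    print(json.dumps([shir, linked], indent=2))
```

Note that nothing in these definitions lists the individual nodes: they register themselves with ADF using the key entered during installation.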
3. Azure SSIS Integration Runtime
Azure SSIS Integration Runtime is not as widely used as the previous two but still highly useful. This runtime allows you to lift and shift your existing SSIS workload and can natively execute your SSIS packages.
The Azure SSIS Integration Runtime can be provisioned in either a public network or a private network. Just as with the Azure IR, on-premise access is supported by joining your Azure SSIS Integration Runtime to a virtual network that is connected to your on-premise network. All you are essentially doing is connecting the Azure SSIS Integration Runtime to a SQL Database or SQL Managed Instance that houses your catalog of SSIS projects and packages.
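The trade-offs between the three runtimes can be summarized as a small decision helper. The Python sketch below is a deliberate simplification of the guidance in this article, not an official selection rule. The function name and its three yes/no inputs are hypothetical, and real projects will weigh more factors than these.

```python
from enum import Enum


class Runtime(Enum):
    AZURE = "Azure Integration Runtime"
    SELF_HOSTED = "Self-Hosted Integration Runtime"
    AZURE_SSIS = "Azure SSIS Integration Runtime"


def suggest_runtime(runs_ssis_packages, data_in_private_network,
                    managed_vnet_configured=False):
    """Suggest a starting point based on three simplified yes/no questions."""
    # Existing SSIS packages are the one workload only the SSIS runtime executes.
    if runs_ssis_packages:
        return Runtime.AZURE_SSIS
    # Private data with no managed virtual network set up: install SHIR nodes.
    if data_in_private_network and not managed_vnet_configured:
        return Runtime.SELF_HOSTED
    # Cloud resources, or private resources reachable via a managed virtual network.
    return Runtime.AZURE


if __name__ == "__main__":
    choice = suggest_runtime(runs_ssis_packages=False, data_in_private_network=True)
    print(choice.value)
```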
So as you can see, when it comes to ADF integration runtimes, picking the right one depends on your specific data integration needs. While the runtimes overlap in some respects, each one has its own unique advantages and disadvantages. If you need help with using ADF Integration Runtimes, we can help! Contact us today.
Thanks for reading! We hope you found this blog post useful. Feel free to let us know if you have any questions about this article by simply leaving a comment below. We will reply as quickly as we can.
Key2 Consulting is a boutique data analytics consultancy that helps business leaders make better business decisions. We are a Microsoft Gold-Certified Partner and are located in Atlanta, Georgia. Learn more here.