By: Jason Bacani

Business Intelligence Consultant @ Key2 Consulting

Trying to master Microsoft Azure can be a daunting task. I’ve spent the majority of my IT career in the SQL Server world, and even Azure’s database offerings alone can still be overwhelming. From security and AI, to networking and storage, to the Internet of Things, Azure’s offerings sometimes seem endless!

Microsoft Azure

[https://azure.microsoft.com/en-us/services/?v=17.04b]

Today, I find myself challenged to learn more about an area of Azure called Azure Data Lake (also referred to as ADL):

Microsoft Azure Data Lake

[https://azure.microsoft.com/en-us/services/?v=17.04b]

Azure Data Lake is made up of two services: Data Lake Analytics and Data Lake Store. It’s found under the Data and Analytics category of Microsoft Azure’s product offerings. But before we get into Azure Data Lake, let’s take a quick step back.

What is a data lake?

Think of a data lake as a large storage repository that holds data in its native format. Unlike a hierarchical data warehouse, which stores data in files and folders, a data lake uses a flat architecture. Each data element in the lake is assigned a unique identifier and tagged with a set of extended metadata tags. To answer a business question, you query the data lake for only the relevant data and then analyze that smaller data set further. [http://searchaws.techtarget.com/definition/data-lake]

Data lakes are often associated with Hadoop-oriented object storage and with big data. Large amounts of data, no matter their format, can be queried quickly. For businesses with large volumes of unstructured data, this means fast answers to business questions.
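
To make the flat layout more concrete, here is a toy, in-memory sketch in Python. It illustrates the idea only and is not how Azure Data Lake is implemented. The class and method names (`ToyDataLake`, `put`, `query`) are hypothetical and unrelated to any Azure API. Each item keeps its raw bytes, gets a generated unique ID and carries metadata tags. A query returns only the tagged subset you need for further analysis.

```python
# Toy model of a flat data lake (hypothetical names, illustration only):
# raw bytes in native format, a unique ID per item, and metadata tags.
import uuid
from dataclasses import dataclass, field


@dataclass
class LakeItem:
    item_id: str
    payload: bytes                                     # raw data, native format
    tags: dict[str, str] = field(default_factory=dict)  # extended metadata


class ToyDataLake:
    def __init__(self) -> None:
        # Flat namespace: one mapping from ID to item, no folder hierarchy.
        self._items: dict[str, LakeItem] = {}

    def put(self, payload: bytes, **tags: str) -> str:
        """Store raw bytes with metadata tags and return the new unique ID."""
        item_id = str(uuid.uuid4())
        self._items[item_id] = LakeItem(item_id, payload, dict(tags))
        return item_id

    def query(self, **criteria: str) -> list[LakeItem]:
        """Return only the items whose tags match every key/value in criteria."""
        return [
            item for item in self._items.values()
            if all(item.tags.get(k) == v for k, v in criteria.items())
        ]


lake = ToyDataLake()
lake.put(b"id,amount\n1,1500\n", source="sales", fmt="csv", year="2017")
lake.put(b'{"clicks": 42}', source="web", fmt="json", year="2017")
lake.put(b"free-form support ticket text", source="support", fmt="text", year="2016")

# Pull only the relevant subset; that smaller set is what you analyze further.
sales_2017 = lake.query(source="sales", year="2017")
print(len(sales_2017))  # prints 1
```

Notice that nothing about the data's format is enforced when it is stored. The tags alone decide what a query returns.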

Now let’s circle back…

What is Microsoft Azure Data Lake?

In short, Microsoft Azure Data Lake is a highly scalable data storage and analytics service hosted in Azure, Microsoft’s public cloud. ADL is intended primarily for big data storage and analysis. As with other data lakes, developers, data scientists, business professionals, and other users can quickly gain insight from large, complex data sets. And because Azure Data Lake is a cloud computing service, customers get a faster, more efficient alternative to deploying and managing big data infrastructure in their own on-premises data centers. [http://searchaws.techtarget.com/definition/data-lake]

Microsoft divides Azure Data Lake into two offerings, Data Lake Analytics and Data Lake Store:

Azure Data Lake Store

[https://azure.microsoft.com/en-us/solutions/data-lake/]

When you compare what Analytics does with what Store does, their purposes and differences quickly become clear.

For your analytics needs, Azure Data Lake Analytics centers on U-SQL, a language that combines traditional SQL with the expressive power of user code written in C#. Developers with a SQL background, a .NET background, or both can quickly read, adapt, and adopt U-SQL code. Below is a quick sample of what U-SQL looks like. The script defines a small dataset and then writes that dataset to a file in Data Lake Store:

U-SQL, Azure Data Lake

[https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-get-started-portal]
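
If you're more at home in Python than in U-SQL, the two steps the sample performs can be sketched as a rough Python analogy: define a small dataset, then write it to a file. This is not U-SQL and it does not touch Azure. The column names, values and local `data.csv` file are hypothetical stand-ins for a file in Data Lake Store.

```python
# Python analogy only (not U-SQL, no Azure calls): build a tiny rowset in
# memory, then write it out as CSV. Names and values are hypothetical.
import csv
import os
import tempfile

# Step 1: define a small rowset (customer, amount), similar in spirit to
# building a rowset inline in a U-SQL SELECT.
rows = [("CustomerA", 1500.0), ("CustomerB", 2700.0)]

# Step 2: write the rowset to a CSV file, similar in spirit to U-SQL's
# OUTPUT statement with the built-in Outputters.Csv() outputter.
out_dir = tempfile.mkdtemp()                  # local stand-in for the store
out_path = os.path.join(out_dir, "data.csv")
with open(out_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)             # no header row is written

with open(out_path, newline="") as f:
    print(f.read())
```

In U-SQL, the equivalent script is handed to Data Lake Analytics as a job. Here, everything runs locally.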


For your storage needs (particularly large-scale storage), Azure Data Lake Store is a WebHDFS-compatible store. WebHDFS is the REST interface to the Hadoop Distributed File System (HDFS). Data Lake Store is massively scalable and built for very high throughput and low latency.
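
Because the store speaks WebHDFS, listing or reading data comes down to plain HTTPS calls. The sketch below only builds the requests and never sends them. It assumes the following:

* Data Lake Store's URL pattern is `https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=<OPERATION>`.
* Calls are authenticated with an Azure Active Directory bearer token.
* The account name `mydatalake` and the token string are hypothetical placeholders.

`LISTSTATUS` and `OPEN` are standard WebHDFS operations.

```python
# Sketch: build (but do not send) WebHDFS-style requests for Data Lake Store.
# ACCOUNT and the token are hypothetical placeholders.
from urllib.parse import quote, urlencode
from urllib.request import Request

ACCOUNT = "mydatalake"  # hypothetical account name


def webhdfs_url(path: str, op: str, **params: str) -> str:
    """Return a URL such as .../webhdfs/v1/<path>?op=LISTSTATUS."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. '/data/sales.csv'")
    query = urlencode({"op": op, **params})
    return f"https://{ACCOUNT}.azuredatalakestore.net/webhdfs/v1{quote(path)}?{query}"


def build_request(path: str, op: str, token: str, method: str = "GET") -> Request:
    """Prepare an authenticated request; assumes an Azure AD bearer token."""
    return Request(
        webhdfs_url(path, op),
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


list_req = build_request("/data", "LISTSTATUS", token="<access-token>")
read_req = build_request("/data/sales.csv", "OPEN", token="<access-token>")
print(list_req.full_url)
# https://mydatalake.azuredatalakestore.net/webhdfs/v1/data?op=LISTSTATUS
```

Because the interface is WebHDFS, tools that already speak it can, in principle, work against the store without Azure-specific code.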


When it comes to scaling, ADL Store helps future-proof your data storage. Because it can hold large amounts of structured, semi-structured, or unstructured data, ADL Store offers the flexibility to keep up with evolving business and data needs.
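
One common way to describe this flexibility is "schema-on-read": data is stored as-is, and a structure is applied only when it is read. Here is a minimal local sketch of the idea. The file paths and contents are hypothetical, the parser is chosen by file extension purely for illustration, and no Azure API is used.

```python
# Schema-on-read sketch (local illustration, hypothetical paths and data):
# store raw bytes unchanged, pick a parser only at read time.
import csv
import io
import json

# Nothing is validated on write; each file keeps its native format.
raw_store: dict[str, bytes] = {
    "/sales/2017.csv": b"region,amount\nEast,1500\nWest,2700\n",          # structured
    "/web/clicks.json": b'[{"page": "/home", "clicks": 42}]',              # semi-structured
    "/support/ticket-001.txt": b"Customer reports slow dashboard loads.",  # unstructured
}


def read_with_schema(path: str):
    """Apply a structure at read time, chosen here by the file extension."""
    text = raw_store[path].decode("utf-8")
    if path.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))  # rows as dicts
    if path.endswith(".json"):
        return json.loads(text)                         # nested objects
    return text.split()  # unstructured: fall back to a simple word list


sales = read_with_schema("/sales/2017.csv")
total = sum(int(row["amount"]) for row in sales)
print(total)  # prints 4200
```

Adding a new kind of data means adding a file and, when it is needed, a reader. No existing storage schema has to change.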


Azure Data Lake is based on the Apache Hadoop YARN (Yet Another Resource Negotiator) cluster management platform and is designed to scale dynamically within Microsoft’s Azure public cloud. This helps the service meet the needs of big data projects. Such projects tend to be compute-intensive, keep growing in data size, and have data structure requirements that keep evolving.

Here’s a screenshot of Microsoft’s website that highlights the benefits of Microsoft Azure Data Lake.

Screenshot - Highlights of Microsoft Azure Data Lake

[https://azure.microsoft.com/en-us/solutions/data-lake/]

I hope this article has whetted your appetite to learn more about Microsoft Azure Data Lake and given you a helpful introduction to what the service does. Additional links are provided below for further learning. And as always, be sure to subscribe to our blog as we share more insight into other aspects of Microsoft Azure!

References:

Introducing Microsoft Azure: https://docs.microsoft.com/en-us/azure/fundamentals-introduction-to-azure

What Is Azure? (infographic): https://azure.microsoft.com/en-us/resources/infographics/azure/

Big Data Made Easy With Azure Data Lake (video): https://azure.microsoft.com/en-us/resources/videos/data-lake-store-primer/