GitHub Source Control Integration with Azure Synapse Workspace

December 29, 2023

By: Syed Islam

 
GitHub source control integration with Azure Synapse workspace allows data professionals to manage scripts, notebooks, and pipelines in a version-controlled environment. This integration provides a centralized repository where teams can collaborate, track changes, and maintain a history of modifications made to their data-related assets.
 

 

Branching in GitHub: Organizing Your Workflow

Branching is a fundamental concept in Git, the version control system behind GitHub. It allows developers to work on isolated tasks without affecting the main codebase. In the context of Azure Synapse workspace, branches can represent different features, experiments, or bug fixes within your data projects.
 

Pull Requests (PR): Facilitating Collaboration

Pull requests (PRs) are GitHub’s way of initiating a discussion about changes. In the context of Azure Synapse workspace, PRs are essential for collaboration, allowing team members to review code, provide feedback, and ensure the quality of the changes before they are merged into the main codebase.
 

Merging: Integrating Changes Safely

Merging is the process of integrating changes from one branch into another. In the context of Azure Synapse workspace, merging ensures that the changes made in feature branches are incorporated into the main codebase without disrupting the existing workflows.
 

Prerequisites

Before diving into the integration process, ensure you have the following prerequisites in place:

  1. An Azure Synapse Analytics workspace
  2. A GitHub account
  3. A dedicated GitHub repository for source control

 

Tutorial – How to Configure and Integrate GitHub Repository With Azure Synapse Workspace

Now let’s look at how to integrate and maintain source control in GitHub for Azure Synapse Studio.
 
Task Summary

  1. Generate a personal access token from the GitHub portal.
  2. Configure and set up a GitHub repository from Azure Synapse workspace.
  3. Create a feature branch (dev branch) from the main branch.
  4. Commit new or modified pipelines, notebooks, and SQL scripts.
  5. Create a pull request in GitHub.
  6. Merge the pull request into the main branch.
  7. Publish from the main branch.

 

Step 1: Generate a Personal Access Token From the GitHub Portal

1.1. Log in to GitHub.

1.2. From the profile menu, go to Settings.
 

 
1.3. Select Developer Settings from the lower left panel.
 

 
1.4. Select Personal access tokens from the left panel.
 

 
1.5. Click the Generate new token button.

1.6. Enter a name in the Note text box for reference.

1.7. In the Expiration box, select 90 days.

1.8. In the Select scopes box, check repo and admin:org.

1.9. Click the Generate token button.
 

 
Make sure to copy your personal access token now as you won’t be able to see it again!
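
One way to confirm that the new token carries the scopes selected above is to call the GitHub REST API and inspect the `X-OAuth-Scopes` response header, which GitHub returns for classic personal access tokens. The sketch below is a minimal illustration under stated assumptions: the helper names (`build_user_request`, `missing_scopes`) and the `SYNAPSE_GITHUB_PAT` environment variable are hypothetical and not part of the original walkthrough.

```python
import os
import urllib.request

# Scopes selected when generating the token in this tutorial.
REQUIRED_SCOPES = {"repo", "admin:org"}


def build_user_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the authenticated user."""
    return urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="GET",
    )


def missing_scopes(scopes_header: str, required=REQUIRED_SCOPES) -> set:
    """Return the required scopes absent from an X-OAuth-Scopes header value."""
    granted = {s.strip() for s in scopes_header.split(",") if s.strip()}
    return set(required) - granted


if __name__ == "__main__":
    # Hypothetical variable name; the request is only sent if it is set.
    token = os.environ.get("SYNAPSE_GITHUB_PAT")
    if token:
        with urllib.request.urlopen(build_user_request(token)) as resp:
            header = resp.headers.get("X-OAuth-Scopes", "")
        print("Missing scopes:", missing_scopes(header) or "none")
```

Reading the token from an environment variable keeps it out of scripts and notebooks that may later be committed to the repository.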
 

 

Step 2: Configure & Set up GitHub Repository From Azure Synapse Workspace

2.1. Log in to the Azure portal.

2.2. Go to your Synapse workspace.

2.3. Select Pipeline.

2.4. From the top left pane, select Set up code repository.
 

 
2.5. In the Configure a repository section, select GitHub from the Repository type selection.

2.6. Enter the GitHub repository owner name and click Continue. The repository owner name can be found in the GitHub portal.
 

 
2.9. Enter your Personal Access Token value in the dialog box.

2.10. Click Continue.
 

 
2.11. Select Repository name from the list.

2.12. Select ‘main’ as Collaboration branch. Keep the default value ‘workspace_publish’ in Publish branch.

2.13. Click on Apply.
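
The choices made in this dialog amount to a small set of settings. As a rough sketch only, they can be written down as a Python dictionary. The field names below are an assumption modeled on the `workspaceRepositoryConfiguration` block of the Synapse workspace resource definition and should be checked against the current Azure documentation before use; the owner and repository names are placeholders, and `github_repo_settings` is a hypothetical helper.

```python
# Publish branch kept at its default in this tutorial.
PUBLISH_BRANCH = "workspace_publish"


def github_repo_settings(owner: str, repo: str,
                         collaboration_branch: str = "main",
                         root_folder: str = "/") -> dict:
    """Collect the repository settings from Step 2 in one place (illustrative only)."""
    if not owner or not repo:
        raise ValueError("owner and repo are required")
    return {
        "type": "WorkspaceGitHubConfiguration",  # assumed type name
        "accountName": owner,                    # repository owner
        "repositoryName": repo,
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,               # assumed default
    }


if __name__ == "__main__":
    # Placeholder values for illustration.
    print(github_repo_settings("my-org", "synapse-repo"))
```

Keeping these values in one structure makes it easier to review them alongside the rest of the workspace definition.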
 

 

Step 3: Create a Feature Branch (Dev branch) in GitHub.com From Synapse Workspace

3.1. Log in to your Azure Synapse workspace.

3.2. From the top left panel, select New branch.
 

 
3.3. In the Create a new branch dialog window, enter a branch name (e.g., Feature_working_branch).

3.4. Click Create.
 

 
3.5. Confirm and validate the new branch.
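
Creating a branch adds a new Git reference that points at the tip of the source branch. For teams that prefer to script this, the sketch below builds the corresponding GitHub REST API request (`POST /repos/{owner}/{repo}/git/refs`) without sending it. The owner, repository, and commit SHA are placeholder values, and `build_create_branch_request` is a hypothetical helper name.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_create_branch_request(owner, repo, new_branch, base_sha, token):
    """Build (but do not send) a request that creates refs/heads/<new_branch>."""
    payload = {"ref": f"refs/heads/{new_branch}", "sha": base_sha}
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/git/refs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; base_sha would be the commit at the tip of main.
    req = build_create_branch_request(
        "my-org", "synapse-repo", "Feature_working_branch", "a" * 40, "<token>"
    )
    print(req.get_method(), req.full_url)
```

The base commit SHA can be read from `GET /repos/{owner}/{repo}/git/ref/heads/main`, and the request can be sent with `urllib.request.urlopen(req)`.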
 

 

Step 4: Perform Commit With New or Modified Pipelines, Notebooks, SQL Scripts

4.1. Add or modify pipelines and notebooks.

4.2. Click Commit.


Step 5: Perform Pull Request in Synapse Workspace

5.1. From the Synapse workspace source control panel, select Create pull request.

5.2. This takes you to GitHub.com. Click Create pull request.

5.3. Enter a title and a comment describing the pull request. Verify the merge direction: the arrow points from right to left, from the compare branch (your feature branch) into the base branch (main). Click Create pull request.


Step 6: Perform Merge With Main Branch

6.1. Click Merge pull request.

6.2. Click Confirm merge.

6.3. Confirm that the merged files appear in the main branch.


Step 7: Perform Publish From Main Branch

7.1. From the main branch, click Publish.
 
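
The pull request and merge steps above can also be expressed as two GitHub REST API calls: `POST /repos/{owner}/{repo}/pulls` to open the pull request and `PUT /repos/{owner}/{repo}/pulls/{pull_number}/merge` to merge it. The following sketch only builds the requests, which makes the direction of the merge explicit (`head` is the feature branch, `base` is main). The owner, repository, title, and pull request number are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request

API = "https://api.github.com"


def _json_request(method, url, payload, token):
    """Build (but do not send) an authenticated JSON request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method=method,
    )


def build_pull_request(owner, repo, head, base, title, body, token):
    """Request that proposes merging `head` (feature branch) into `base`."""
    payload = {"title": title, "head": head, "base": base, "body": body}
    return _json_request("POST", f"{API}/repos/{owner}/{repo}/pulls", payload, token)


def build_merge_request(owner, repo, pull_number, token, merge_method="merge"):
    """Request that merges an open pull request."""
    if merge_method not in {"merge", "squash", "rebase"}:
        raise ValueError(f"unsupported merge method: {merge_method}")
    url = f"{API}/repos/{owner}/{repo}/pulls/{pull_number}/merge"
    return _json_request("PUT", url, {"merge_method": merge_method}, token)


if __name__ == "__main__":
    # Placeholder values for illustration.
    pr = build_pull_request("my-org", "synapse-repo", "Feature_working_branch",
                            "main", "Add pipeline", "Adds a new pipeline.", "<token>")
    print(pr.get_method(), pr.full_url)
```

Building the requests separately from sending them keeps the example safe to run and easy to inspect before any change reaches the repository.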

 
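
Publishing writes the generated workspace templates to the publish branch (`workspace_publish` in this setup). As an assumption based on how Synapse typically lays out that branch, a folder named after the workspace holds `TemplateForWorkspace.json` and `TemplateParametersForWorkspace.json`. The hedged sketch below builds GitHub contents API URLs (`GET /repos/{owner}/{repo}/contents/{path}?ref={branch}`) that could be used to check for those files after a publish. The workspace, owner, and repository names are placeholders, and `publish_template_urls` is a hypothetical helper.

```python
from urllib.parse import quote

API = "https://api.github.com"

# Assumed file names of the generated templates in the publish branch.
TEMPLATE_FILES = (
    "TemplateForWorkspace.json",
    "TemplateParametersForWorkspace.json",
)


def publish_template_urls(owner, repo, workspace_name,
                          publish_branch="workspace_publish"):
    """Return contents-API URLs for the templates expected after a publish."""
    urls = []
    for name in TEMPLATE_FILES:
        path = quote(f"{workspace_name}/{name}")   # keeps "/" separators
        ref = quote(publish_branch, safe="")
        urls.append(f"{API}/repos/{owner}/{repo}/contents/{path}?ref={ref}")
    return urls


if __name__ == "__main__":
    # Placeholder values for illustration.
    for url in publish_template_urls("my-org", "synapse-repo", "my-workspace"):
        print(url)
```

A successful response for both URLs would suggest that the publish completed and the templates are available for deployment to other environments.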

Conclusion

Integrating Azure Synapse Analytics with a GitHub repository streamlines development and deployment and improves collaboration and version control. By following the steps outlined above, teams can keep their pipelines, notebooks, and SQL scripts under review and in a tracked history, which makes changes easier to audit and roll back.
 

