Setting up Azure DevOps for your Data Factory – Part 2/3 – Develop something

This is the second post in a series on how to set up Azure DevOps for your Data Factory, and it shows how you should do development. For this example we will do the simple task of copying some data from a public data source into Azure Data Lake Storage Gen2. If you have not done so already, read my post on setting up the environment first.

Create branch
The first step is to open up your development Data Factory and create a new branch. My recommendation is to create one new branch for each set of changes that should be deployed together.

Creating a new branch menu in Data Factory

Create linked service
The next step is to create a linked service. A few things to note here: it has a name that is not environment specific (I’m not calling it ADLS_datahelgeadls2dev), and it uses Managed Identity to authenticate. The storage account is set up with a role assignment for the dev Data Factory, so that only the development Data Factory has access to it. This makes it impossible to run something in test that accesses development data.

Deployment friendly linked service
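
As a rough illustration of what such a linked service looks like in the Git repository, here is a minimal sketch that builds the JSON in Python. The helper `build_adls_linked_service` is hypothetical, and the linked service name `ADLS_datahelgeadls2` and storage account name `datahelgeadls2dev` are assumptions inferred from the naming above, not copied from my factory. The sketch also assumes that an `AzureBlobFS` linked service with only a `url` and no key or service principal authenticates with the factory's managed identity.

```python
import json


def build_adls_linked_service(name: str, account: str) -> dict:
    """Build a linked service definition for ADLS Gen2 (hypothetical helper).

    Only the URL is environment specific, so it is the only value that
    has to change when the factory is deployed to another environment.
    """
    return {
        "name": name,  # environment-neutral name, the same in dev, test and prod
        "properties": {
            "type": "AzureBlobFS",  # linked service type used for ADLS Gen2
            "typeProperties": {
                # No account key or service principal here: the assumption is
                # that Data Factory then uses its managed identity.
                "url": f"https://{account}.dfs.core.windows.net"
            },
        },
    }


# Assumed names, for illustration only.
linked_service = build_adls_linked_service("ADLS_datahelgeadls2", "datahelgeadls2dev")
print(json.dumps(linked_service, indent=2))
```

The point of the design is that the name stays fixed while the `url` is swapped per environment at deployment time.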

Next we add two Key Vaults: our shared Key Vault and the development Key Vault. We won’t use these in this case, since we authenticate with managed identity. Finally, we add a linked service for our source, the Brønnøysund Register Centre, which stores data on all legal entities in Norway. It is available without authentication, but if a secret had been needed it would have come from the shared Key Vault, because we always use the same sources regardless of environment.

Create datasets and pipeline
For this demo I create two datasets, one for the source and one for the target, and a simple pipeline that copies the data. Datasets have names that point to the data lake, like ADLS_datahelgeadls2_Brreg_MainUnits, but do not include environment information.
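
If you want to check this naming convention automatically, a small script can flag names that carry an environment label. This is only a sketch: the function `has_environment_token` and the list of labels (`dev`, `test`, `prod`) are my own assumptions for illustration, and the check is a crude heuristic that will also flag innocent names such as one ending in "latest".

```python
# Assumed environment labels; adjust to whatever your environments are called.
ENVIRONMENT_TOKENS = ("dev", "test", "prod")


def has_environment_token(name: str) -> bool:
    """Return True if any underscore-separated part of the name ends
    with one of the assumed environment labels (case-insensitive)."""
    parts = name.lower().split("_")
    return any(part.endswith(ENVIRONMENT_TOKENS) for part in parts)


# The name I avoided for the linked service is flagged...
print(has_environment_token("ADLS_datahelgeadls2dev"))
# ...while the dataset name used in this demo is not.
print(has_environment_token("ADLS_datahelgeadls2_Brreg_MainUnits"))
```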

Create pull request and publish
When you are done with development, the next steps prepare for deployment. The first step is to create a pull request from the same menu where you created the branch. You must add a title, should add a description, and can add reviewers who will look at what you did.

Add some details to your pull request so that others can understand what you did.

Next, the pull request will (hopefully) be approved and the source branch merged into the master branch. You can choose whether or not to delete the source branch.

Complete pull request

Now there is just one more step before we can start building our release pipeline, and that is to publish our data factory. If you go into Branches in Azure DevOps you will see two branches: master and adf_publish. If I open adf_publish after my pull request, I will find two ARM templates in the PartialArmTemplate folder. These hold my two linked services (as they are published independently from the branches), like this example:

Partial ARM template in adf_publish

When you publish, this will appear:

After this is done, the contents of adf_publish are:

The adf_publish branch after publishing

Now we have developed something and published it to the adf_publish branch, and we are ready for the final step: building our release pipeline and moving things from development to test!
