Setting up Azure DevOps for your Data Factory – Part 3/3 – Create release pipeline

Now that we have set up our environments and developed something we want to release to test, this final post describes how to do that, along with a few useful tips.

Release process
The release process has the following steps (a rough sketch of steps 1 and 3 follows the list):
1. Stop any active triggers. We do not want any pipelines to start while we are changing things (and you should wait for running pipelines to finish before publishing).
2. Release from development to the target environment.
3. Clean up the target environment by removing objects that are not present in dev, and start the triggers again.
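To make steps 1 and 3 concrete, here is a minimal sketch of what pausing and resuming triggers looks like with the Az cmdlets; the resource group and factory names are placeholders for your own, and step 2 is handled by the deployment task described further down.

# Step 1: stop all triggers that are currently started in the target factory
Get-AzDataFactoryV2Trigger -ResourceGroupName "datahelge-test-rg" -DataFactoryName "datahelge-test-df" |
    Where-Object { $_.RuntimeState -eq "Started" } |
    ForEach-Object { Stop-AzDataFactoryV2Trigger -ResourceGroupName "datahelge-test-rg" -DataFactoryName "datahelge-test-df" -Name $_.Name -Force }

# Step 3 (after deployment and cleanup): start the triggers again
Get-AzDataFactoryV2Trigger -ResourceGroupName "datahelge-test-rg" -DataFactoryName "datahelge-test-df" |
    ForEach-Object { Start-AzDataFactoryV2Trigger -ResourceGroupName "datahelge-test-rg" -DataFactoryName "datahelge-test-df" -Name $_.Name -Force }

In the pipeline itself, Microsoft's sample script (see the Pause triggers section) does this for us.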

Release pipelines
The release process will be handled with an Azure DevOps release pipeline. Azure DevOps can also create build pipelines, but that is not necessary for Data Factory. To create a pipeline, go into Azure DevOps, select Pipelines > Releases and create a new pipeline. I choose to start from an empty job and create a stage called “Release to test”.

Creating a new release pipeline

Artifacts
You need to add an artifact. This is where your code comes from; it should be your Azure DevOps repository with the adf_publish branch as the default branch, like this:

Artifact for release – adf_publish branch

Variables
The next recommended step is to add a variable for the target environment (here called targetEnvironment) on the Variables tab, like this:

Add a variable for the target environment

Pause triggers
Before deploying the artifact (Data Factory) to test we need to stop the triggers. To do this (and to do some cleanup later) we use a PowerShell script that you can download from Microsoft on this page, under the sample script. IMPORTANT: currently you must make a small edit and comment out two parts: the rows under the #linkedservices comment, and the rows from Write-Host “delete integration services.” onward. That part of the code will not work (a bug in Az or Data Factory?).

The script needs a few settings, as shown below. It is also important to set the task version to 4.*, as this is required for the Az modules.

PowerShell task to pause triggers
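For reference, the script arguments for the pause step look roughly like this, assuming the parameter names from Microsoft's sample script, a hypothetical artifact alias "_datahelge" and a dev factory folder "datahelge-dev-df" in the adf_publish branch (adjust the path and names to your own setup):

-armTemplate "$(System.DefaultWorkingDirectory)/_datahelge/datahelge-dev-df/ARMTemplateForFactory.json" -ResourceGroupName "datahelge-$(targetEnvironment)-rg" -DataFactoryName "datahelge-$(targetEnvironment)-df" -predeployment $true -deleteDeployment $false

The -predeployment $true switch is what tells the script to stop the triggers rather than do the post-deployment cleanup.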

Azure deployment
Next you add a new task for Azure Resource Group Deployment. Here you add the target resource group (for me this is datahelge-$(targetEnvironment)-rg). Then you add the template and template parameters files from your linked artifact. After that you press the three dots button next to the “Override template parameters” box. This is where the magic happens: here you change the name of the factory and, most importantly, the connection strings. In this case I change “dev” to “$(targetEnvironment)” for the data lake storage and the environment-specific Key Vault. The shared Key Vault and the source are left unchanged.

Override template parameters to change environment
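To give an idea, the override could look something like the line below. The factoryName parameter comes from the generated ARM template, but the two linked service parameter names and URLs are only illustrative; the exact names depend on your own factory and linked services.

-factoryName "datahelge-$(targetEnvironment)-df" -DataLakeStorage_properties_typeProperties_url "https://datahelge$(targetEnvironment)dls.dfs.core.windows.net" -EnvironmentKeyVault_properties_typeProperties_baseUrl "https://datahelge-$(targetEnvironment)-kv.vault.azure.net/"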

Next is the deployment mode. This is set to Incremental. THIS MUST ALWAYS BE INCREMENTAL. Why? Because a “Complete” deployment will delete EVERYTHING in your resource group that is not part of the ARM template. Imagine you have a Data Lake, Data Factory, Key Vault, etc. set up: everything will be gone. I learned this from experience. Luckily I had a PowerShell script to set the environment up again for me (as you should too), but that is why I added the caps above, so you don't have to go through that.
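In PowerShell terms, the deployment task does roughly the equivalent of the following; note the -Mode parameter (resource group and file names are placeholders):

# Incremental mode only adds or updates the resources defined in the template
New-AzResourceGroupDeployment -ResourceGroupName "datahelge-test-rg" `
    -TemplateFile "ARMTemplateForFactory.json" `
    -TemplateParameterFile "ARMTemplateParametersForFactory.json" `
    -Mode Incremental
# -Mode Complete would delete every resource in the group that the template does not describe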

Resume triggers
The final thing you do is clone the PowerShell task and change the script arguments a bit. In case you haven't studied the PowerShell script: in this mode it compares the target data factory against the ARM template, removes all objects that are not part of the template, and starts the triggers again. So you know that quick fix you did in production, adding a new pipeline with a few datasets and activities? That will be gone if you haven't done it in development as well.

Script arguments for resume triggers
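The cloned task uses the same script, only with the last two switches flipped; again this assumes the sample script's parameter names and the same hypothetical artifact path as above:

-armTemplate "$(System.DefaultWorkingDirectory)/_datahelge/datahelge-dev-df/ARMTemplateForFactory.json" -ResourceGroupName "datahelge-$(targetEnvironment)-rg" -DataFactoryName "datahelge-$(targetEnvironment)-df" -predeployment $false -deleteDeployment $true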

Test!
That’s it. Now you should be able to create a release, and afterwards you can verify the result by looking at the connection strings for your data factory. As you can see below, this works: the test data factory uses the same linked service name as in dev, but points to a different data lake storage account.

URL in dev (right) and test (left) – same name for linked service, but different URL
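If you prefer to check from the command line instead of the portal, something like this shows what the deployed linked service now points to (the factory and linked service names are just examples):

# Inspect the deployed linked service in the test factory; the URL appears in the type properties
(Get-AzDataFactoryV2LinkedService -ResourceGroupName "datahelge-test-rg" `
    -DataFactoryName "datahelge-test-df" -Name "DataLakeStorage").Properties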

One last thing for you Databrickers…
Are you using Databricks? Then you typically want different setups for dev and test, with different workspaces. Here is what you do:
– Add the token for Databricks access to the environment Key Vault (a small snippet for this follows the list).
– Open this page and search for Databricks, then add the script you find there to your master branch. This gives you one more template parameter: the Databricks cluster id (the id, not the name).
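For the first bullet, storing the token can be done in the portal or with a couple of Az cmdlets like these (the vault name, secret name and token value are just examples):

# Store the Databricks personal access token as a secret in the environment Key Vault
$token = ConvertTo-SecureString "dapiXXXXXXXXXXXXXXXX" -AsPlainText -Force
Set-AzKeyVaultSecret -VaultName "datahelge-test-kv" -Name "databricks-token" -SecretValue $token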

This concludes my three-post guide to Azure DevOps for Data Factory. I haven’t covered absolutely everything here, but hopefully it helps you get on your way.
