The release process has these steps:
1. Stop any active triggers. We do not want any pipelines to start while we are changing things (and you should wait until running pipelines have finished before publishing).
2. Release from development to the target environment.
3. Clean up the target environment by removing objects that are not present in dev, then start the triggers again.
The release process will be handled with an Azure DevOps release pipeline. Azure DevOps can also create build pipelines, but this is not necessary for Data Factory. To create a pipeline, go into Azure DevOps, select Pipelines, then Releases, and create a new pipeline. I chose to create one from an empty job, with a stage called “Release to test”.
You need to add an artifact. This is where your code lives, and it should be your Azure DevOps repository and the adf_publish branch, like this:
The next recommended step is to add a variable for the target environment by using the variables tab like this:
Before deploying the artifact (Data Factory) to test we need to stop the triggers. To do this (and to do some cleanup later) we use a PowerShell script that you can download from Microsoft on this page under sample script. IMPORTANT: currently you must make a small edit to the script and comment out the rows under the #linkedservices comment and the rows starting from Write-Host “delete integration services.” That part of the code will not work (an error in Az or Data Factory?).
The script needs a few settings as shown below. It is also important to set the task version to 4.* as this is required for Az modules.
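To make the pre-deployment step a bit more concrete, here is a minimal Python sketch of the selection logic: only triggers that are currently running need to be stopped. This is not the Microsoft script, just an illustration. The `Trigger` class, the trigger names and the state strings are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """Hypothetical stand-in for a Data Factory trigger."""
    name: str
    runtime_state: str  # assumed values: "Started" or "Stopped"


def triggers_to_stop(triggers):
    """Return the names of the triggers that are currently running."""
    return [t.name for t in triggers if t.runtime_state == "Started"]


# Made-up triggers in the target factory.
target_triggers = [
    Trigger("DailyLoad", "Started"),
    Trigger("HourlySync", "Stopped"),
    Trigger("WeeklyReport", "Started"),
]

print(triggers_to_stop(target_triggers))
```

The same list is what you would want to start again after the deployment, minus any triggers that no longer exist in dev.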
Next you add a new task for Azure Resource Group Deployment. Here you add the target resource group (for me this is datahelge-$(targetEnvironment)-rg). Then you add the template and template parameters files from your linked artifact. After that, press the three dots button next to the “Override template parameters” box. This is where the magic happens: here you change the name of the factory, but most importantly also the connection strings. In this case I change “dev” to “$(targetEnvironment)” for the data lake storage and the environment-specific Key Vault. The shared Key Vault and the source are left unchanged.
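The effect of the override box can be sketched in a few lines of Python: environment-specific parameters get “dev” swapped for the target environment, while shared ones are left alone. All parameter names and URLs below are hypothetical, and the plain string replace assumes “dev” only appears as the environment marker in those values.

```python
def override_parameters(dev_params, target_env, env_specific):
    """Return a copy of dev_params where 'dev' is replaced by target_env,
    but only for the keys listed in env_specific."""
    overridden = dict(dev_params)  # leave the dev parameters untouched
    for key in env_specific:
        # Naive replace: assumes "dev" occurs only as the environment marker.
        overridden[key] = dev_params[key].replace("dev", target_env)
    return overridden


# Hypothetical template parameters as they look in dev.
dev_params = {
    "factoryName": "datahelge-dev-adf",
    "dataLakeUrl": "https://datahelgedevdls.dfs.core.windows.net",
    "keyVaultUrl": "https://datahelge-dev-kv.vault.azure.net/",
    "sharedKeyVaultUrl": "https://datahelge-shared-kv.vault.azure.net/",
}

test_params = override_parameters(
    dev_params,
    target_env="test",
    env_specific=["factoryName", "dataLakeUrl", "keyVaultUrl"],
)

for name, value in test_params.items():
    print(f"{name}: {value}")
```

In the real pipeline the substitution is done by the $(targetEnvironment) variable in the override box; the sketch only shows which values change and which stay the same.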
Next is the deployment mode. This is set to Incremental. THIS MUST ALWAYS BE INCREMENTAL. Why? Because a “Complete” deployment will delete EVERYTHING in your resource group that is not in the template. Imagine you have a Data Lake, Data Factory, Key Vault, etc. set up. Everything will be gone. I learned this from experience. Luckily I had a PowerShell script to do the environment setup for me (as you should), but I added some caps so you don’t have to go through the same thing.
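If the difference between the two modes feels abstract, this small Python simulation may help. It treats the resource group and the template as plain sets of resource names (the names are made up), which is a simplification of what ARM actually does, but it captures why Complete mode is dangerous here.

```python
def deploy(resource_group, template, mode="Incremental"):
    """Simulate which resources exist in the resource group after a deployment.

    resource_group and template are sets of resource names.
    """
    if mode == "Incremental":
        # Resources in the template are added or updated; the rest is left alone.
        return set(resource_group) | set(template)
    if mode == "Complete":
        # Anything in the resource group that is not in the template is deleted.
        return set(template)
    raise ValueError(f"Unknown deployment mode: {mode}")


# Hypothetical resource group and a Data Factory ARM template.
existing = {"datalake", "keyvault", "datafactory"}
adf_template = {"datafactory"}

print(sorted(deploy(existing, adf_template, "Incremental")))
print(sorted(deploy(existing, adf_template, "Complete")))
```

With Incremental the data lake and Key Vault survive. With Complete only the factory is left.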
The final thing you do is clone the PowerShell task and change the script arguments a bit. In case you haven’t studied the PowerShell script: it uses the ARM template and removes all resources that are not part of the ARM template from your target environment. So you know that quick fix you did in production to add a new pipeline with a few datasets and activities? That will be gone if you haven’t done it in development as well.
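The cleanup boils down to a set difference: whatever exists in the target factory but not in the ARM template gets removed. Here is a minimal Python sketch of that idea. It is not the Microsoft script, and the object names are invented for the example.

```python
def objects_to_remove(deployed, in_template):
    """Return the names that exist in the target factory but not in the template."""
    return sorted(set(deployed) - set(in_template))


# Hypothetical pipelines in production, including a quick fix made directly there.
deployed_pipelines = ["CopySales", "LoadCustomers", "QuickFixPipeline"]
template_pipelines = ["CopySales", "LoadCustomers"]

print(objects_to_remove(deployed_pipelines, template_pipelines))
```

Here the quick-fix pipeline is the only one flagged for removal, because it never made it into dev and therefore never into the template.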
That’s it. Now you should be able to create a release, and afterwards you can verify the results by looking at the connection strings for your data factory. As you can see below this works! The test data factory uses the same linked service name as in dev, but points to a different data lake storage account.
One last thing for you Databrickers…
Are you using Databricks? Then you typically would want different setups for dev and test, with different workspaces. Here is what you do:
– Add the token for Databricks access into the environment Key Vault.
– Open this page and search for Databricks. Add the script below into your master branch. This will give you one more template parameter which is the Databricks cluster id (id, not name).
This concludes my three-post guide to Azure DevOps for Data Factory. I haven’t covered absolutely everything here, but hopefully it can help you get on your way.