Calling Azure Functions from Data Factory with authentication

Azure Functions is a great way to do the things Data Factory can’t. In this example I want to use it to get an OAuth token from Strava, and I want all my secrets to be stored in Azure Key Vault. Data Factory can’t look up values in Key Vault and build a header for me (as far as I know), but Azure Functions can do this (and that will be my next blog post).

So, this should be simple. Data Factory has an Azure Function activity, documented here. By default, all Azure Functions are secured with a master key, and I have put this key into Key Vault to configure my Function linked service like this (here is a description of linking Data Factory to Key Vault):

Azure Function linked service – secured with master key
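For reference, the JSON behind a linked service like this looks roughly as follows. This is a minimal sketch that builds it as a Python dictionary; the names `StravaFunction`, `MyKeyVault` and `function-master-key` and the app URL are placeholders I made up for illustration, not values from the original setup.

```python
import json


def azure_function_linked_service(name, function_app_url, key_vault_ls, secret_name):
    """Build the JSON for an Azure Function linked service whose key lives in Key Vault."""
    return {
        "name": name,
        "properties": {
            "type": "AzureFunction",
            "typeProperties": {
                "functionAppUrl": function_app_url,
                # The key is not stored inline; it is resolved from Key Vault at runtime.
                "functionKey": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_ls,  # name of the Key Vault linked service
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
        },
    }


# Placeholder values, assumed for illustration only.
function_ls = azure_function_linked_service(
    name="StravaFunction",
    function_app_url="https://my-function-app.azurewebsites.net",
    key_vault_ls="MyKeyVault",
    secret_name="function-master-key",
)
print(json.dumps(function_ls, indent=2))
```

The printed JSON can be compared with what the Data Factory UI shows in the code view of the linked service.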

But is this secure enough? If someone gets hold of your master key, they can call the function without any further authentication. One way to avoid this is to restrict access to the function to certain IPs, but that would require you to set up an integration runtime on a machine or VM with a fixed IP and call the function through that. The option I went for was to secure the app by requiring Azure AD authentication.

But then I had the next problem. The Azure Function linked service doesn’t seem to support calling functions with authentication! So I had to explore other options. What I ended up with was the REST linked service, because it can actually use Azure AD authentication, with a service principal whose secret is stored in Key Vault.

REST linked service – with AAD Service Principal authentication
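A REST linked service with service principal authentication can be sketched like this, again as a Python dictionary that mirrors the JSON. The URL, tenant, IDs and secret name are placeholders assumed for illustration; `aadResourceId` should be the application ID URI (or client ID) of the Azure AD app that protects the function.

```python
import json


def rest_linked_service_aad(name, url, sp_id, tenant, resource_id, key_vault_ls, secret_name):
    """Build the JSON for a REST linked service using AAD service principal auth."""
    return {
        "name": name,
        "properties": {
            "type": "RestService",
            "typeProperties": {
                "url": url,
                "enableServerCertificateValidation": True,
                "authenticationType": "AadServicePrincipal",
                "servicePrincipalId": sp_id,
                # The service principal secret is pulled from Key Vault.
                "servicePrincipalKey": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_ls,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
                "tenant": tenant,
                "aadResourceId": resource_id,
            },
        },
    }


# All values below are hypothetical placeholders.
rest_ls = rest_linked_service_aad(
    name="StravaFunctionRest",
    url="https://my-function-app.azurewebsites.net/api/",
    sp_id="00000000-0000-0000-0000-000000000000",
    tenant="mytenant.onmicrosoft.com",
    resource_id="api://my-function-app",
    key_vault_ls="MyKeyVault",
    secret_name="adf-service-principal-secret",
)
print(json.dumps(rest_ls, indent=2))
```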

Next: you cannot use REST in a lookup to just get the token value. But you can use REST as the source of a copy. So I created a copy activity to store the value in a file in my data lake. Then I look up this file, with the secure output option checked on the lookup (found in the General tab). This makes the result look like this in the monitor:

Nothing to see here, this output has been secured
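In the pipeline JSON, the secure output checkbox ends up in the activity's `policy` block. A minimal sketch of such a lookup, assuming the token file is JSON (hence `JsonSource`) and using the made-up names `LookupToken` and `TokenFile`:

```python
import json


def secured_lookup(name, dataset_name):
    """Build a Lookup activity whose output is hidden from monitoring."""
    return {
        "name": name,
        "type": "Lookup",
        # secureOutput masks the activity output in the monitor and logs.
        "policy": {"secureOutput": True, "secureInput": False},
        "typeProperties": {
            "source": {"type": "JsonSource"},  # assumption: the token file is JSON
            "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
            "firstRowOnly": True,
        },
    }


lookup_activity = secured_lookup("LookupToken", "TokenFile")
print(json.dumps(lookup_activity, indent=2))
```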

Similarly, I checked secure input on my next copy activity, to keep the token from showing in the monitor or logs. Then I used this as the source definition:

Even though the output is secure you can reference the values in dataset variables
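Passing the looked-up value into a dataset parameter can be sketched as below. The dataset name `StravaRest`, the parameter name `token` and the field `access_token` are assumptions on my part; the actual field depends on what the function writes to the file.

```python
import json


def token_expression(lookup_name, field):
    """Data Factory expression that reads one field from a first-row lookup result."""
    return f"@activity('{lookup_name}').output.firstRow.{field}"


def copy_input_with_token(dataset_name, lookup_name, field="access_token"):
    """Dataset reference that hands the token to a (hypothetical) 'token' parameter."""
    return {
        "referenceName": dataset_name,
        "type": "DatasetReference",
        "parameters": {
            "token": {
                "value": token_expression(lookup_name, field),
                "type": "Expression",
            }
        },
    }


copy_activity = {
    "name": "CopyStravaData",
    "type": "Copy",
    # secureInput keeps the resolved token out of the monitor and logs.
    "policy": {"secureInput": True, "secureOutput": False},
    "inputs": [copy_input_with_token("StravaRest", "LookupToken")],
}
print(json.dumps(copy_activity, indent=2))
```

Inside the dataset, the parameter can then be used with `@dataset().token`, for example when building an authorization header.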

This made it possible for me to use the token that I got from my secured Azure Function without any secret values showing in the monitor or in code. Everything is secured with Azure Key Vault. But what about the file stored in the data lake with the token in clear text? It is deleted with a delete activity, so it only exists for a few seconds.
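The order of the activities matters here. One possible wiring is sketched below with skeleton activities only (type properties of the copy and lookup activities are left out). The activity names are made up, and running the delete on `Completed` rather than `Succeeded` is my own suggestion so the file is removed even if the lookup fails; it is not necessarily how the original pipeline is built.

```python
def depends_on(activity, condition="Succeeded"):
    """Build a dependsOn entry for a pipeline activity."""
    return [{"activity": activity, "dependencyConditions": [condition]}]


pipeline_activities = [
    # 1. Call the function through REST and write the token to a file.
    {"name": "CopyTokenToLake", "type": "Copy", "dependsOn": []},
    # 2. Read the token back, with secure output enabled.
    {"name": "LookupToken", "type": "Lookup", "dependsOn": depends_on("CopyTokenToLake")},
    # 3. Remove the clear-text file whether or not the lookup worked.
    {
        "name": "DeleteTokenFile",
        "type": "Delete",
        "dependsOn": depends_on("LookupToken", "Completed"),
        "typeProperties": {
            "dataset": {"referenceName": "TokenFile", "type": "DatasetReference"},
            "enableLogging": False,
        },
    },
    # 4. Use the token for the real API calls.
    {"name": "CopyStravaData", "type": "Copy", "dependsOn": depends_on("LookupToken")},
]

for activity in pipeline_activities:
    upstream = [d["activity"] for d in activity["dependsOn"]]
    print(activity["name"], "<-", upstream)
```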

My reason for doing it this way, and returning the token that is all I need to read Strava data, is that I can now use Data Factory for all the different types of API calls without having to implement everything in my Azure Function. It might be possible to do this in an easier way, but at least I have an Azure Function where calls are authenticated, and the token is only stored for a few seconds.

3 comments on “Calling Azure Functions from Data Factory with authentication”

  1. Wtf Helge, you are storing a token in plain text in storage instead of storing it in Key Vault? That's way less secure.

    Thanks for sharing nevertheless, it's a clever trick

    1. I agree with you, I would have loved to be able to store this in Key Vault. But I don’t think there is a way to do this. In this case I need to refresh the token, and I use a function for that. But then I need to use this token again to connect to REST datasets. The REST connector does not support adding a token from Key Vault; that would have been the best option.
