Azure Data Factory lets you orchestrate the flow of data through activities in one or more pipelines. Some activities are long-running or asynchronous by nature, and require you to either poll for their completion or listen for it. The good news is that Azure Data Factory supports several approaches for asynchronous activities, including polling, event-based processing, and Webhooks. Let’s take a closer look at Webhooks.
If you have a service that performs some action in batch, such as media rendering or encoding, infrastructure provisioning, or any job that runs asynchronously, you need to know when it has completed. Ideally, you are not periodically polling to determine the status of a long-running or asynchronous job. Instead, you initiate the job and then subscribe to it, listening for confirmation of success or failure. Enter the Webhook.
Webhooks are not a formal web standard; there is no W3C specification that must be followed. A Webhook is simply a way of specifying an HTTP callback: a URL that a service calls when something happens. Integrations with GitHub are a popular example. For a more abstract example, consider the following:
I call my mechanic to let them know I am bringing my car in for service. I then drop my car off, leave my phone number with the mechanic, and go about my business. The mechanic does their work and calls me when my car is ready. When I get the call, I go to pick up my car.
Here’s a simple visualization of how this works. Think about how inefficient it would be if I could not do this and instead had to call the mechanic periodically to check on the status of my car.
Data Factory Webhook Activity
Azure Data Factory has a native activity for subscribing via Webhook. Using the analogy above, you would specify the subscription URL of the “Mechanic” (typically called with a POST), along with any headers and body parameters it requires.
When the activity is invoked in a pipeline, Data Factory automatically adds a “callBackUri” field to the JSON body of the request. The activity then waits for a callback from the “Mechanic” until the specified timeout elapses (the default is ten minutes). From there, you can continue executing the pipeline, or use Data Factory’s control flow to gracefully handle a failure or timeout.
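To make the wait-and-timeout behavior concrete, here is a toy, in-process model of it. This is a hypothetical sketch, not how Data Factory is implemented: the class `SimulatedWebhookActivity` and its methods are invented for illustration, and a `threading.Event` stands in for the incoming HTTP callback. The `"TimedOut"` label is meant to echo the status Data Factory reports when no callback arrives in time.

```python
import threading
from typing import Any, Dict, Optional


class SimulatedWebhookActivity:
    """Toy model of a Webhook activity waiting for its callback.

    Hypothetical illustration only; this is not Data Factory's implementation.
    """

    def __init__(self, timeout_seconds: float = 600.0) -> None:
        # 600 seconds mirrors the ten-minute default timeout.
        self.timeout_seconds = timeout_seconds
        self.callback_body: Optional[Dict[str, Any]] = None
        self._called_back = threading.Event()

    def callback(self, body: Dict[str, Any]) -> None:
        """Called by the 'Mechanic' (the external service) when its work is done."""
        self.callback_body = body
        self._called_back.set()

    def run(self) -> str:
        """Block until a callback arrives or the timeout elapses."""
        if self._called_back.wait(self.timeout_seconds):
            return "Succeeded"
        return "TimedOut"


if __name__ == "__main__":
    activity = SimulatedWebhookActivity(timeout_seconds=2.0)
    # Simulate the Mechanic calling back after 0.1 seconds.
    threading.Timer(0.1, activity.callback, args=({"status": "car ready"},)).start()
    print(activity.run(), activity.callback_body)
```

If the timer is removed, `run()` simply waits out the timeout and returns `"TimedOut"`, which is the case a pipeline’s failure path would need to handle.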
Let’s see what this looks like with a simple test.
The Webhook activity configuration requires a URL endpoint, any headers and body content needed to subscribe, and a timeout setting. As mentioned above, Data Factory automatically adds the callBackUri field to the body for you.
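For reference, here is roughly how such a Webhook activity might look in a pipeline’s JSON definition, written as a Python dict so it can be inspected and serialized. The property names (`type`, `typeProperties`, `method`, `url`, `headers`, `body`, `timeout`) reflect my understanding of the activity’s JSON schema and should be checked against the current documentation; the activity name, endpoint URL, and body values are placeholders. The helper `request_body_sent` is also hypothetical: it only approximates the body the listener receives once Data Factory appends callBackUri.

```python
import json
from typing import Any, Dict

# Hypothetical Webhook activity definition; name, URL, and body are placeholders.
webhook_activity: Dict[str, Any] = {
    "name": "NotifyMechanic",
    "type": "WebHook",
    "typeProperties": {
        "method": "POST",
        "url": "https://example.com/api/service-request",
        "headers": {"Content-Type": "application/json"},
        "body": {"jobName": "render-video"},
        # Assumed hh:mm:ss timespan format; the default is ten minutes.
        "timeout": "00:10:00",
    },
}


def request_body_sent(activity: Dict[str, Any], callback_uri: str) -> Dict[str, Any]:
    """Approximate the JSON body the listener receives at run time:
    the configured body plus the callBackUri field Data Factory adds."""
    body = dict(activity["typeProperties"]["body"])  # copy; leave the definition untouched
    body["callBackUri"] = callback_uri
    return body


if __name__ == "__main__":
    print(json.dumps(webhook_activity, indent=2))
    print(json.dumps(request_body_sent(webhook_activity, "https://example.invalid/callback"), indent=2))
```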
If you want to test this yourself, you need to set up a listener (the “Mechanic” in the analogy above) and trigger a run in Data Factory. There are a number of free Webhook tools available online; for this test, I used an Azure Logic App with an HTTP listener to display the JSON, including the callBackUri, received from Data Factory, as shown in the following figure.
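If you would rather test without a Logic App, the following is a minimal stand-in listener using only Python’s standard library. It is a sketch under several assumptions: the handler name, the port, and the in-memory `received_callbacks` list are illustrative, and Data Factory can only reach the listener if you expose it at a URL that is reachable from Azure. The listener acknowledges right away with 202 Accepted, because the long-running work should happen after the subscription request returns. The hypothetical `send_test_request` helper mimics Data Factory’s POST so you can try it locally.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List

# Callback URIs are kept in memory for illustration; a real listener
# would persist them (a queue, a table, etc.) for the worker to use later.
received_callbacks: List[str] = []


class WebhookListener(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        callback_uri = payload.get("callBackUri")
        if callback_uri:
            received_callbacks.append(callback_uri)
            print("Received callBackUri:", callback_uri)
        # Acknowledge immediately; the long-running job runs elsewhere.
        self.send_response(202)
        self.end_headers()

    def log_message(self, fmt: str, *args: Any) -> None:
        pass  # silence the default per-request logging


def start_listener(port: int = 8080) -> HTTPServer:
    """Serve on localhost in a background thread (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), WebhookListener)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_request(url: str, body: Dict[str, Any]) -> int:
    """Mimic Data Factory's subscription POST for local experimentation."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```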
To round this out, I would store the callBackUri, kick off my long-running or asynchronous process, and then call back to Data Factory using the callBackUri with the result. This is simple to test with a Logic App, an Azure Function, or any application that can listen for HTTP requests and call back to its subscribers.
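The callback step might look like the sketch below. The function names are hypothetical, and the `Output`/`Error`/`StatusCode` body shape is an assumption based on the format Data Factory documents for the activity’s “report status on callback” option, where a status code of 400 or above marks the activity as failed; verify it against the current documentation before relying on it. Building the request separately from sending it keeps the payload easy to inspect and test.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_callback_request(
    callback_uri: str,
    output: Dict[str, Any],
    status_code: int = 200,
    error: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build the POST back to the stored callBackUri.

    The Output/Error/StatusCode shape is an assumption taken from the
    'report status on callback' format; check the current docs.
    """
    body: Dict[str, Any] = {"Output": output, "StatusCode": str(status_code)}
    if error is not None:
        body["Error"] = error  # e.g. {"ErrorCode": "...", "Message": "..."}
    return urllib.request.Request(
        callback_uri,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_callback(request: urllib.request.Request) -> int:
    """Send the callback and return the HTTP status Data Factory replies with."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.status
```

In a real job you would call `send_callback(build_callback_request(stored_uri, {...}))` once the work finishes, where `stored_uri` is the callBackUri saved when the subscription arrived.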
If you need additional customization, for example a different field name for the callback URL, or more complex logic or processing, you can also invoke Azure Functions from Data Factory via the Web Activity and subscribe to long-running jobs that way.
This is just a simple overview of Webhooks and how easily you can integrate them into your Azure Data Factory pipelines. The capability is powerful: it lets you bring any long-running or asynchronous process that supports callbacks or Webhooks into your Data Factory pipelines.