Azure Data Factory: Orchestrating Data Pipelines
Azure Data Factory (ADF) is a cloud-based data integration service that lets you create data-driven workflows to orchestrate and automate data movement and data transformation.
Key Components of ADF
- Pipelines: A logical grouping of activities that perform a unit of work.
- Activities: A processing step in a pipeline (e.g., Copy Data, Databricks Notebook).
- Datasets: Represent data structures within the data stores, pointing to the data that activities use as inputs and outputs.
- Linked Services: Defines the connection information to external resources (like connection strings).
- Integration Runtimes: The compute infrastructure ADF uses to run activities, available as Azure, self-hosted, and Azure-SSIS variants.
Creating a Simple Copy Pipeline
A common use case is copying data from an on-premises SQL Server to Azure Blob Storage.
Step 1: Create Linked Services
You’ll need a Linked Service for your source (SQL Server) and your sink (Blob Storage). Because the SQL Server is on-premises, its Linked Service must connect through a self-hosted integration runtime.
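The exact JSON depends on the connector, but a minimal sketch of the two Linked Services might look like the following (names such as OnPremSqlServerLinkedService and SelfHostedIR, as well as the connection strings, are placeholders):

{
  "name": "OnPremSqlServerLinkedService",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Server=<your-server>;Database=<your-db>;Integrated Security=True"
    },
    "connectVia": {
      "referenceName": "SelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}

{
  "name": "BlobStorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}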
Step 2: Create Datasets
Define datasets that reference the Linked Services. For example, a table in SQL Server and a folder in Blob Storage.
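As a sketch, using the classic AzureBlob dataset type for the sink (newer factories often use format-specific types such as DelimitedText instead); the dataset names, table, and folder path are placeholders:

{
  "name": "SourceSQLDataset",
  "properties": {
    "type": "SqlServerTable",
    "linkedServiceName": {
      "referenceName": "OnPremSqlServerLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "schema": "dbo",
      "table": "Customers"
    }
  }
}

{
  "name": "SinkBlobDataset",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": {
      "referenceName": "BlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "folderPath": "exports/customers",
      "format": { "type": "TextFormat" }
    }
  }
}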
Step 3: Create the Pipeline
Add a “Copy Data” activity to the pipeline, configure the source and sink datasets, and set the source and sink types in the activity’s typeProperties. A minimal pipeline definition looks like this:
{
  "name": "CopyFromSQLToBlob",
  "properties": {
    "activities": [
      {
        "name": "CopyData",
        "type": "Copy",
        "inputs": [ { "referenceName": "SourceSQLDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SinkBlobDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "SqlSource" },
          "sink": { "type": "BlobSink" }
        }
      }
    ]
  }
}
Triggers
Pipelines can be scheduled or run in response to events by attaching triggers (a sample schedule trigger definition follows the list):
- Schedule Trigger: Runs on a wall-clock schedule.
- Tumbling Window Trigger: Fires on fixed-size, contiguous, non-overlapping time windows.
- Event-based Trigger: Responds to events like a file landing in Blob Storage.
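As an illustration, a schedule trigger that runs the pipeline above once a day might be defined like this (the trigger name and start time are placeholders):

{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T02:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyFromSQLToBlob",
          "type": "PipelineReference"
        }
      }
    ]
  }
}

Note that a trigger only fires once it has been published and started; a stopped trigger never runs the pipeline.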
ADF serves as the backbone for many modern data engineering platforms on Azure.