Connecting Microsoft Dataverse to Azure Databricks

How to Connect your Microsoft Dataverse Environment to an Azure Databricks Instance

I developed this step-by-step guide after a recent client project threw me some curveballs: I ran into significant challenges and found there wasn’t much helpful documentation out there. So I thought I’d share my approach to help anyone else navigating these tricky waters.

Prerequisite:
Create an Azure Databricks workspace in the same region as your storage account and Power Apps environment.
This tutorial is for non-VNet-injected workspaces.
For VNet-injected workspaces, see the official Microsoft documentation linked in the sources section below.

  1. Set up an Azure Data Lake Storage Gen2 account.
    1. The storage account must be in the same region as your Power Apps/Dataverse environment.
    2. Make sure the “hierarchical namespace” is enabled for the storage account.
    3. “Public Network Access” must be enabled for initial setup and delta sync.
  2. Create a new Azure Synapse Link within your desired environment.
  3. Select tables you wish to export from the environment to your Data Lake.
    1. Note: Only tables with “Track Changes” enabled can be exported; tables without this setting will not appear in the list. If you do not see a table as an option, navigate to the table’s properties and settings in Power Apps to turn it on.
  4. Connect your Azure Data Lake Storage to Azure Databricks through one of the following methods:
    1. Method 1: Unity Catalog (Recommended)
      1. Create an Access Connector for Azure Databricks.
      2. Grant the managed identity access to the storage account.
        1. You will need to grant the managed identity the Storage Blob Data Contributor and Storage Queue Data Contributor roles on the storage account. A notebook sketch for registering the container in Unity Catalog follows this list.
    2. Method 2: OAuth 2.0 with an Azure Service Principal.
      1. Create an Azure service principal.
      2. Create a client secret for your service principal.
      3. Grant the service principal access to Azure Data Lake Storage Gen2.
      4. Add the client secret to Azure Key Vault.
      5. Create an Azure Key Vault-backed secret in your Azure Databricks workspace.
      6. Connect to Azure Storage using Python in a Databricks notebook (a configuration sketch follows this list).
    3. Method 3: SAS Token (Last Resort)
      1. From Microsoft’s Documentation:

        Shared access signatures (SAS) – You can use storage SAS tokens to access Azure storage. With SAS, you can restrict access to a storage account using temporary tokens with fine-grained access control.

        You can only grant a SAS token permissions that you have on the storage account, container, or file yourself. You can configure SAS tokens for multiple storage accounts in the same Spark session.

        Python:
        spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
        spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
        spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope>", key="<sas-token-key>"))

        Replace:

        • <storage-account> with the Azure Storage account name.
        • <scope> with the Azure Databricks secret scope name.
        • <sas-token-key> with the name of the key containing the Azure storage SAS token.
  5. To read the CDM-formatted data from Microsoft Dataverse, you have two options:
    1. Create a script to read the model.json file that accompanies your Dataverse data exported via Azure Synapse Link (see the example after this list).
      • To denormalize the CDM “enumerated” or “lookup” values, you can add the “stringmaps” table to your Azure Synapse Link and use it to retrieve the more descriptive, human-readable labels for those values.
    2. Utilize the Azure Spark CDM Connector for Synapse (at the time of writing, this connector does not work for writing data into Unity Catalog).
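
Below are notebook sketches for the connection methods and the model.json approach referenced above.

For Method 1, once the Access Connector’s managed identity has the two storage roles, you can register the Synapse Link container in Unity Catalog and read from it directly. This is a minimal sketch, not the only way to wire it up: it assumes you have already created a storage credential (here called dataverse_cred) from the Access Connector, for example in Catalog Explorer, and the container and storage account names are placeholders.

Python:
# Register the Synapse Link container as a Unity Catalog external location.
# "dataverse_cred" is an assumed storage credential name; replace the
# <container> and <storage-account> placeholders with your own values.
spark.sql("""
CREATE EXTERNAL LOCATION IF NOT EXISTS dataverse_synapse_link
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL dataverse_cred)
""")

# Once the external location exists (and you have READ FILES on it),
# the path can be browsed and read like any other governed location.
display(dbutils.fs.ls("abfss://<container>@<storage-account>.dfs.core.windows.net/"))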
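
For Method 2, the notebook configuration follows the standard Databricks OAuth pattern for ADLS Gen2; the placeholders work the same way as in the SAS example above.

Python:
# Read the client secret from the Key Vault-backed secret scope created earlier.
service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

Replace <application-id> with the service principal’s application (client) ID, <directory-id> with your tenant ID, and <scope>/<service-credential-key> with the secret scope and key that hold the client secret.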
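
For option 1 of step 5, here is a minimal sketch of a reader script. The partition CSV files produced by Azure Synapse Link have no header row, so the column names come from the attribute list in model.json. The container path and the “account” table are assumptions; the manifest’s partition entries also carry the exact file locations if you prefer not to glob.

Python:
import json

# Assumed container path; adjust to your Synapse Link storage container.
base_path = "abfss://<container>@<storage-account>.dfs.core.windows.net"

# model.json describes every exported entity, its attributes, and its partitions.
model_text = spark.read.option("wholetext", "true").text(f"{base_path}/model.json").first()[0]
model = json.loads(model_text)

# Pick one entity from the manifest ("account" is just an example).
entity = next(e for e in model["entities"] if e["name"] == "account")
columns = [a["name"] for a in entity["attributes"]]

# Apply the manifest's column names to the header-less CSV partitions.
df = (
    spark.read
    .option("header", "false")
    .option("multiLine", "true")  # Dataverse text columns can contain newlines
    .csv(f"{base_path}/account/*.csv")
    .toDF(*columns)
)

display(df)

The same pattern can be used to load the stringmaps table mentioned above and join its human-readable labels onto your choice and lookup columns.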

Sources:

Official Microsoft Documentation – Create an Azure Synapse Link for Dataverse with Azure Data Lake

Official Microsoft Documentation – Tutorial: Connect to Azure Data Lake Storage Gen2

Official Microsoft Documentation – Tutorial: Unity Catalog metastore admin tasks for Databricks SQL

Official Microsoft Documentation – Use Azure managed identities in Unity Catalog to access storage
