Transferring data to Azure Blob

This guide explains how to transfer data to Azure Blob to store and further process information.

Concept

Azure Blob is an Active Destination. After you set Azure Blob as the Destination of a Datastream, data is transferred to Azure Blob each time data is fetched for the Datastream. For more information, see Destination types.

You can assign multiple Destinations to a Datastream. For more information on possible limitations, see Assigning multiple Destinations to a Datastream.

Prerequisites

Before you complete the procedure in this guide, perform all of the following actions:

  • Ensure you have login details to the Destination with the following permissions:

    • Read, write, and delete files and folders.

    • List folders.

  • To connect to Azure Blob with a Service Principal account, ensure the Service Principal account has the roles Reader and Storage Blob Data Contributor. For more information on creating a Service Principal account, see the Azure Blob documentation. For more information on assigning roles, see the Azure Blob documentation.

Procedure

To transfer data from a Datastream to Azure Blob, follow these steps:

  1. Add Azure Blob as a Destination to the Workspace which contains the Datastream or to one of its parent Workspaces.

  2. Assign the Azure Blob Destination to the Datastream.

  3. Configure transfer settings.

Adding Azure Blob as a Destination

To add Azure Blob as a Destination to a Workspace, follow these steps:

  1. Click the Transfer element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Click + Add.

  3. Click Azure Blob.

  4. Click Setup a new Authorization.

  5. Click Next.

  6. Select one of the following options:

    • To connect to Azure Blob with the account name and access key, click File Azure. For more information on managing access keys, see the Azure Blob documentation.

    • To connect to Azure Blob with an SAS account, click File Azure (SAS). For more information on creating an SAS account, see the Azure Blob documentation.

    • To connect to Azure Blob with a Service Principal account, follow these steps:

      1. Click File Azure (Service Principal).

      2. In Client id, enter the application ID. For more information on getting the application ID, see the Azure Blob documentation.

      3. In Client secret, enter the application secret. For more information on creating an application secret, see the Azure Blob documentation.

      4. In Tenant id, enter the tenant ID. For more information on getting the tenant ID, see the Azure Blob documentation.

      5. In Storage Account Name, enter the name of the storage account. Do not include the blob.core.windows.net suffix.

      6. Click Authorize.

  7. In the Configuration page, fill in the following fields:

    Name

    (Optional) Rename the Destination.

    Destination URL

    In the drop-down on the left, select the file server type. In the text field in the middle, enter the base URL of the file server. In the text field on the right, enter the path to the folder to which data is transferred. Click Test to check the Authorization.

    Output format

    Select the data format that Adverity uses to transfer data to the Destination.

    When you transfer data in AVRO format, select AVRO to use the null codec, or AVRO (deflate) to use the deflate codec. For more information on codecs, see the Apache documentation.

    For more information on advanced configuration settings, see File Destination reference.

  8. Click Create.
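The AVRO (deflate) output option compresses each data block with the DEFLATE algorithm, the same algorithm exposed by Python's standard zlib module. As a rough, hypothetical illustration of why the deflate codec shrinks repetitive tabular extracts (the sample payload below is invented, not Adverity output):

```python
import zlib

# Hypothetical payload standing in for repetitive row data in a Data Extract.
rows = b"datastream_id,clicks\n" + b"101,42\n" * 1000

# AVRO's "deflate" codec applies this same DEFLATE compression per data block.
compressed = zlib.compress(rows, level=6)

# Repetitive tabular data compresses well, and decompression restores it exactly.
assert len(compressed) < len(rows)
assert zlib.decompress(compressed) == rows
```

The null codec, by contrast, stores the serialized records uncompressed, trading file size for slightly lower CPU cost when writing and reading.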

Assigning Azure Blob as a Destination

To assign the Azure Blob Destination to a Datastream, follow these steps:

  1. Click the Connect element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Select the Datastream.

  3. In the Destinations section, click + Add Destination.

  4. Click Assign Existing Destinations.

  5. Select the Azure Blob checkbox in the list.

  6. Click Save.

Configuring transfer settings

To configure transfer settings, follow these steps:

  1. Click the Connect element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Select the Datastream.

  3. In the Destinations section, find the Azure Blob Destination in the list, and click the icon on the right.

  4. Click Destination Settings.

  5. Fill in the following fields:

    Filename

    Specify the target file in the Destination to which data from the Datastream is transferred. The name can contain alphanumeric characters and underscores. For example, target_file.

    To transfer data to sub-folders within the folder defined in the Destination URL field, specify a file path. For example, folder1/target_file.

    By default, Adverity saves data from each Datastream in a different file named {datastream_type}_{datastream_id}_{scheduled_year}_{scheduled_month}_{scheduled_day}.

    You can specify the same target file for several Datastreams. If a column is shared between Datastreams, Adverity performs a full outer join and concatenates values. If a column is not shared between Datastreams, Adverity writes null values in the relevant cells.

    Use placeholders to create unique, dynamic file names in the Destination. The following placeholders are available:

    {app_label} - The Datastream Type's short name.

    {datastream_id} - The Datastream ID.

    {datastream_type} - The Datastream Type.

    {extension} - The file extension of the Data Extract.

    {extract_id} - The Data Extract ID.

    {id} - The Datastream ID.

    {meta[*]} - Replace * with a metadata placeholder to use metadata in the file name. For example, {meta[datastream_URI]} uses the Datastream URI as the file name. For more information on metadata and placeholders, see Using placeholders.

    {name} - The automatically generated filename of the Data Extract.

    {scheduled_day} - The day from the start date of a date range for a scheduled data fetch.

    {scheduled_month} - The month from the start date of a date range for a scheduled data fetch.

    {scheduled_year} - The year from the start date of a date range for a scheduled data fetch.

    {upload_day} - The day when the Data Extract is transferred to the Azure Blob Destination.

    {upload_hour} - The hour when the Data Extract is transferred to the Azure Blob Destination.

    {upload_minute} - The minute when the Data Extract is transferred to the Azure Blob Destination.

    {upload_month} - The month when the Data Extract is transferred to the Azure Blob Destination.

    {upload_second} - The second when the Data Extract is transferred to the Azure Blob Destination.

    {upload_year} - The year when the Data Extract is transferred to the Azure Blob Destination.

  6. Click Save.
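The filename placeholders behave like named substitution fields. A minimal sketch using Python's str.format, with invented values standing in for what Adverity would substitute at transfer time (the Datastream Type, ID, and dates below are illustrative only):

```python
# Hypothetical values for the placeholders described in the table above.
context = {
    "datastream_type": "google_ads",  # illustrative Datastream Type
    "datastream_id": 1234,            # illustrative Datastream ID
    "scheduled_year": "2024",
    "scheduled_month": "07",
    "scheduled_day": "15",
}

# The default naming pattern from the Filename field.
template = "{datastream_type}_{datastream_id}_{scheduled_year}_{scheduled_month}_{scheduled_day}"
print(template.format(**context))  # google_ads_1234_2024_07_15

# A path with a sub-folder, as in the folder1/target_file example.
print("folder1/{datastream_type}_{datastream_id}".format(**context))  # folder1/google_ads_1234
```

Because the scheduled_* placeholders come from the fetch's date range and the upload_* placeholders from the transfer time, combining them yields file names that stay unique across repeated fetches of the same Datastream.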