Loading data into Snowflake

This guide explains how to load data into Snowflake for further analysis.

Prerequisites

Before you complete the procedure in this guide, perform all of the following actions:

  • Create a datastream whose data you want to load into Snowflake. For more information on creating a datastream, see Creating a datastream.

  • To set up an authorization to Snowflake using the OAuth2 integration, do the following:

    • Ensure the account you use to connect to Snowflake has a SYSADMIN or higher system-defined role, or a custom role with the same or higher privileges. For more information on Snowflake roles, see the Snowflake documentation.

      When working with Snowflake roles, make sure that the role name is all uppercase, for example USER_ROLE. Role names that are not all uppercase can cause errors.

    • Create a Snowflake security integration to obtain the Client ID and Client secret values by running the following statement:

      CREATE SECURITY INTEGRATION {YOUR_INTEGRATION_NAME} -- replace the placeholder with a name of your choice
      TYPE = OAUTH
      ENABLED = TRUE
      OAUTH_CLIENT = CUSTOM
      OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
      OAUTH_REDIRECT_URI = 'https://oap.datatap.io/oauth2/callback/'
      OAUTH_ISSUE_REFRESH_TOKENS = TRUE
      BLOCKED_ROLES_LIST = ('SYSADMIN')
      ;

      For more information on using OAuth2 with Snowflake, see the Snowflake documentation.
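
      After creating the integration, you can retrieve the generated Client ID and Client secret by running the following query, passing the integration name as an uppercase string:

      SELECT SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('{YOUR_INTEGRATION_NAME}');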

  • To create tables in the Snowflake destination, ensure the account you use to connect to Snowflake has a role with the USAGE privilege on the target warehouse and database and the CREATE TABLE privilege on the target schema, or a role with the same or higher privileges. An example set of grants is shown below.
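
    For example, the following grants give a custom role the privileges needed to create tables in a destination database. The role, warehouse, database, and schema names are placeholders; replace them with your own:

      GRANT USAGE ON WAREHOUSE {YOUR_WAREHOUSE} TO ROLE {YOUR_ROLE};
      GRANT USAGE ON DATABASE {YOUR_DATABASE} TO ROLE {YOUR_ROLE};
      GRANT USAGE, CREATE TABLE ON SCHEMA {YOUR_DATABASE}.{YOUR_SCHEMA} TO ROLE {YOUR_ROLE};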

Procedure

To load data from a datastream into Snowflake, follow these steps:

  1. Add Snowflake as a destination to the workspace which contains the datastream or to one of its parent workspaces.

  2. Assign the Snowflake destination to the datastream.

    You can assign as many destinations to a datastream as you want.

    Some destinations require specific Data Mapping, such as HubSpot and Facebook Offline Conversions. If these Data Mapping requirements conflict, the destinations cannot be assigned to the same datastream.

  3. Configure load settings.

Adding Snowflake as a destination

To add Snowflake as a destination to a workspace, follow these steps:

  1. Select the workspace you work with in Adverity and then, in the platform navigation menu, click Destinations.

  2. Click + Add Destination.

  3. Search for and click Snowflake.

  4. Choose how to authorize Adverity to access Snowflake:

    • To use your details, click Access Snowflake using your credentials.

    • To ask someone else to use their details, click Access Snowflake using someone else's credentials.

      If you choose this option, the person you ask to create the authorization will need to go through the following steps.

  5. Click Next.

  6. Select one of the following options. The information you need to provide in the next step depends on the option you choose:

    • To access Snowflake using your username and password, click Snowflake.

    • To access Snowflake using your username and private key, click Snowflake (Key Pair).

    • To access Snowflake using OAuth2 authentication, click Snowflake (OAuth2).

  7. On the authorization page, fill in the following fields:

    Hostname - all Snowflake authorization methods

    The server name of the destination database. Use the following format for the hostname:

    {account_name}.{region_id}.snowflakecomputing.com

    If your region is US-WEST, region_id is not required:

    {account_name}.snowflakecomputing.com

    Username - Snowflake and Snowflake (Key Pair)

    The username of the Snowflake account.

    Password - Snowflake

    The password of the Snowflake account.

    Private Key - Snowflake (Key Pair)

    The Private Key associated with one of the public keys assigned to the Snowflake account. The Private Key must be PEM encoded and include the boundary lines, for example -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY-----. For an example of assigning the matching public key to your Snowflake user, see the statements after this procedure.

    Key password - Snowflake (Key Pair)

    The password to decrypt the Private Key. Leave empty if the Private Key is not encrypted.

    Client id - Snowflake (OAuth2)

    The client ID of the Snowflake account. For more information on finding the client ID, see the Snowflake documentation.

    Client secret - Snowflake (OAuth2)

    The client secret of the Snowflake account. For more information on finding the client secret, see the Snowflake documentation.

  8. Click Authorize.

  9. On the Configuration page, fill in the following fields:

    Name

    (Optional) Rename the destination.

    Warehouse

    Specify the name of the warehouse into which you want to load the data.

    Database

    Specify the name of the database into which you want to load the data.

    For more information on advanced configuration settings, see Advanced Snowflake tips.

  10. Click Create.
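
If you are unsure of the values for some of these fields, the following statements, run in a Snowflake worksheet, may help. The user name is a placeholder and the key value is shortened for illustration:

  -- Returns the account locator and region, which you can use to build
  -- the value for the Hostname field:
  SELECT CURRENT_ACCOUNT(), CURRENT_REGION();

  -- For the Snowflake (Key Pair) method, assign your PEM public key to
  -- the Snowflake user. Exclude the BEGIN/END boundary lines from the value:
  ALTER USER {YOUR_USERNAME} SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';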

Assigning Snowflake as a destination

To assign the Snowflake destination to a datastream, follow these steps:

  1. Select the workspace you work with in Adverity and then, in the platform navigation menu, click Datastreams.

  2. Open the chosen datastream by clicking on its name.

  3. In the External destinations section, click + Assign destination.

  4. Click Assign existing destination.

  5. Select the Snowflake checkbox in the list.

  6. Click Save.

Configuring settings for loading data into Snowflake

To configure the settings for loading data into Snowflake, follow these steps:

  1. Select the workspace you work with in Adverity and then, in the platform navigation menu, click Datastreams.

  2. Open the chosen datastream by clicking on its name.

  3. In the Destinations section, find the Snowflake destination in the list, and click the menu on the right.

  4. Click Destination settings.

  5. Fill in the following fields:

    Table name

    Specify the target table in the destination into which to load data from the datastream. The name can contain alphanumeric characters and underscores. For example, target_table. To specify a schema, use the syntax schemaName.tableName.

    By default, Adverity saves data from each datastream in a different table named {datastream_type}_{datastream_id} (for example, mailgun_83).

    You can specify the same target table for several datastreams. If a column is shared between datastreams, Adverity performs a full outer join and concatenates values. If a column is not shared between datastreams, Adverity writes null values in the relevant cells.

    To create a new table containing the data you load into Snowflake, enter a name for the new table into this field.

    You can use the following placeholders when creating new table names in the destination:

    • {datastream_id} - The datastream ID.

    • {datastream_type} - The data source.

    • {extract_id} - The data extract ID.

    • {meta[*]} - Replace * with a metadata placeholder to use metadata in the table name. For example, {meta[datastream_URI]} uses the datastream URI as the table name. For more information on metadata and placeholders, see Using placeholders.

    • {name} - The automatically generated filename of the data extract.

    • {scheduled_day} - The day when the data fetch was scheduled to run.

    • {scheduled_month} - The month when the data fetch was scheduled to run.

    • {scheduled_year} - The year when the data fetch was scheduled to run.

    Truncate

    Select this checkbox to delete all rows from the relevant table in the destination before loading the latest data extract.

    If the Truncate checkbox is selected, the overwrite options below are not applied, because all existing data is deleted before each load.

    If no overwrite options are selected, the loaded data is appended to the destination table.

    Datastream

    Select this checkbox to overwrite all data previously loaded into the destination from this datastream.

    Data loaded from other datastreams is not overwritten.

    Date Range

    Select a date column from your data extract. All data in the destination whose date matches a date in this column is overwritten.

    If the Datastream checkbox is selected, only the data loaded from the current datastream with matching dates will be overwritten.

    Filename

    Select this checkbox to overwrite the relevant data in the destination if a data extract with the same filename already exists in the destination.

    Key Columns

    Select this checkbox to overwrite data in the destination based on the key columns defined in the Data Mapping of the datastream. Adverity executes this overwrite option after all the other overwrite options. When you select this checkbox, the configuration to overwrite data based on dates has no effect.

  6. Click Save.
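
As an illustration of the table name placeholders, if the Table name field is set to {datastream_type}_{scheduled_year} for a Mailgun datastream whose fetch is scheduled in 2024, the data is loaded into a table named mailgun_2024. Assuming a schema named analytics (both names here are examples, not defaults), you could then inspect the table as follows:

  SELECT COUNT(*) FROM analytics.mailgun_2024;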

Advanced Snowflake tips

Configuration

To configure the Snowflake destination, follow these steps:

  1. Select the workspace you work with in Adverity and then, in the platform navigation menu, click Destinations.

  2. Click the Snowflake destination in the list.

  3. In the secondary menu, click Configuration.

  4. Fill in the following fields:

    Load state table

    Select the checkbox to create a load state table. The load state table is a table in the destination which contains information about each data load. The load state table contains the following fields:

    • datastream_id - The datastream ID.

    • datastream_name - The datastream name.

    • datastream_url - The datastream URL.

    • extract_filename - The filename of the data extract.

    • range_start - The start date and time of the fetch.

    • range_end - The end date and time of the fetch.

    • uuid - The unique ID of the data transfer to the target table.

    • load_state - The state of the data transfer. The possible states are failed, finished, and running.

    • load_start - The date and time when the data transfer started.

    • load_end - The date and time when the data transfer finished.

    • target_schema - The data schema used in the target table.

    • target_table - The name of the target table.

    • nrows - The number of rows in the target table.

    • ncols - The number of columns in the target table.

    Name of load state table

    Enter a name for the load state table. The default name of a load state table is dt_loadstate.

    Role

    Specify the Snowflake user role. User roles appear in this drop-down menu when a role is assigned to a user in Snowflake. If left empty, the role is determined by the Snowflake enforcement model.

    Schema

    Specify the schema used in the database.

  5. Click Save.
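
Once the load state table is enabled, you can monitor data loads directly in Snowflake. A minimal sketch, assuming the default table name dt_loadstate:

  SELECT datastream_name, target_table, load_state, load_start, load_end, nrows
  FROM dt_loadstate
  WHERE load_state = 'failed'
  ORDER BY load_start DESC;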

Technical details of loading data into Snowflake

When loading a data extract into Snowflake, Adverity creates the following additional columns in the target table:

dt_created

The date the data extract was created.

dt_updated

The date the data extract was last updated.

dt_filename

The automatically generated filename of the data extract.

In addition, Adverity creates a temporary table in Snowflake with the name _stage. This ensures that no partial updates are made to the database (guarantee of atomicity). Adverity deletes this temporary table after the data load is complete or has been aborted.
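
These columns make it easy to audit what was loaded. For example, assuming a target table named mailgun_83 (following the default naming scheme described above), the following query lists each loaded extract with its load dates:

  SELECT dt_filename, MIN(dt_created) AS first_loaded, MAX(dt_updated) AS last_updated
  FROM mailgun_83
  GROUP BY dt_filename
  ORDER BY last_updated DESC;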

Loading data in bulk into Snowflake

You can load data in bulk into Snowflake if you have one of the following storage methods configured in your workspace:

  • Amazon S3

  • Azure Blob

  • Google Cloud Storage

To load data in bulk into Snowflake, you need a master key that allows Adverity to access the database and perform the bulk load. For more information on how to find the master key for your Snowflake account, see the Snowflake documentation.

Enter the master key into Adverity when you set up or edit the storage authorization. For more information on how to set up or edit your storage, see Setting up authorizations to store data extracts.

Loading data in bulk into Snowflake with Google Cloud Storage

To load data in bulk into Snowflake when using Google Cloud Storage, you need to configure an integration in Snowflake. For more information, see the Snowflake documentation.

After configuring the integration in Snowflake, add it to the storage configuration in Adverity. To add the integration, follow these steps:

  1. Go to https://{HOSTNAME}/core/datalake/.

  2. Click on the Google Cloud Storage storage name to open the storage configuration.

  3. Scroll down to the Advanced settings section.

  4. In the Parameters text box, add the following line to specify the integration: snowflake_storage_integration = {integration_name}.
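
The integration itself is created on the Snowflake side. A minimal sketch, with placeholder integration and bucket names; see the Snowflake documentation for the full procedure, including granting Snowflake's service account access to the bucket:

  CREATE STORAGE INTEGRATION {integration_name}
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'GCS'
    ENABLED = TRUE
    STORAGE_ALLOWED_LOCATIONS = ('gcs://{your_bucket}/');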

Troubleshooting: Loading data into Snowflake fails after changing the table structure of the data extract

Data may fail to load into Snowflake because of a change in the table structure of the data extract. To ensure the frictionless operation of production systems, Adverity only modifies existing tables in Snowflake if the changes are considered neutral, such as adding a new field to the table. Changes to the table structure of the data extract are not considered neutral. As a consequence, Adverity does not update the existing table in Snowflake, and the data load fails.

The following actions change the table structure of the data extract:

  • You change the key columns in the datastream's Data Mapping. For more information on key columns, see Setting key columns.

  • You change the data type of a column in the datastream's Data Mapping. For more information on the data types used in Adverity, see Data types used in data harmonization.

  • You change the maximum character length of values in a column which contains String values.

To resolve this problem, follow these steps:

  1. In Snowflake, modify or drop the table that corresponds to the datastream.

  2. In Adverity, load data from the datastream into Snowflake.
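
For step 1, a sketch of both options in Snowflake, using placeholder schema, table, and column names:

  -- Option A: drop the table so that Adverity recreates it on the next load.
  DROP TABLE IF EXISTS {schema_name}.{table_name};

  -- Option B: modify the table to match the new extract structure, for
  -- example by widening a String column:
  ALTER TABLE {schema_name}.{table_name}
    ALTER COLUMN {column_name} SET DATA TYPE VARCHAR(1024);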

Troubleshooting: Adverity cannot access Snowflake destination

If Adverity cannot access the Snowflake destination, the destination may restrict access based on IP address.

To resolve this issue, follow these steps:

  1. Find the IP address of your Adverity instance. To do so, contact Adverity Customer Support.

  2. Add Adverity's IP address to the whitelist of your destination.
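
If your Snowflake account restricts access with a network policy, the whitelist is maintained in Snowflake. A minimal sketch, assuming a policy named adverity_access and placeholder IP addresses; make sure the list also contains your own address so you do not lock yourself out:

  CREATE NETWORK POLICY adverity_access
    ALLOWED_IP_LIST = ('{adverity_ip}', '{your_own_ip_range}');

  -- Activate the policy for the whole account:
  ALTER ACCOUNT SET NETWORK_POLICY = adverity_access;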