Loading data into Snowflake

Loading data into Snowflake#

This guide explains how to load data into Snowflake for further analysis.

Prerequisites#

Before you complete the procedure in this guide, perform all of the following actions:

Create a datastream whose data you want to load into Snowflake. For more information on creating a datastream, see Collecting data in Adverity.
Apply the correct Data Mapping to your datastream.
To create tables in the Snowflake destination, ensure the account you use to connect to Snowflake has the role USAGE or a role with the same or higher privileges.
If you choose to use OAuth2 authorization, follow these steps. This is not necessary if you authorize Snowflake using your username and password or private key.
- Snowflake automatically uses your default role to create the authorization.
  
  Ensure that the Snowflake role that you want to use to connect to Snowflake is set as your default role before you create an OAuth2 authorization.
  
  Note
  
  When working with Snowflake roles, make sure that the role is all in uppercase, for example USER_ROLE. If a role is not in uppercase, this can cause errors.
- Create Snowflake security integration to get the Client ID and Client secret values with the following expression. This requires a role with ACCOUNTADMIN or higher privileges.
```
CREATE SECURITY INTEGRATION {YOUR_INTEGRATION_NAME}
TYPE = OAUTH
ENABLED = TRUE
OAUTH_CLIENT = CUSTOM
OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
OAUTH_REDIRECT_URI = 'https://oap.datatap.io/oauth2/callback/'
OAUTH_ISSUE_REFRESH_TOKENS = TRUE
BLOCKED_ROLES_LIST = ('SYSADMIN')
;
```
  For more information on using OAuth2 with Snowflake, see the Snowflake documentation.

Procedure#

To load data from a datastream into Snowflake, follow these steps:

Add Snowflake as a destination to the workspace which contains the datastream or to one of its parent workspaces.
Assign the Snowflake destination to the datastream.

You can assign as many destinations to a datastream as you want.

Some destinations require specific Data Mapping, such as Hubspot. If these Data Mapping requirements conflict, the destinations cannot be assigned to the same datastream.
Configure load settings.

Adding Snowflake as a destination#

To add Snowflake as a destination to a workspace, follow these steps:

Go to the Destinations page.
Click + Create destination.
Search for and click Snowflake.
Choose how to authorize Adverity to access Snowflake:
- To use your details, click Access Snowflake using your credentials.
- To ask someone else to use their details, click Access Snowflake using someone else’s credentials.
  
  If you choose this option, the person you ask to create the authorization will need to go through the following steps.
Click Next.
Select one of the following options. The information you need to provide in the next step depends on the option you choose:
- To access Snowflake using your username and password, click Snowflake.
- To access Snowflake using your username and private key, click Snowflake (Key Pair).
- To access Snowflake using OAuth2 authentication, click Snowflake (OAuth2).
In the authorization page, fill in the following fields:
Hostname - all Snowflake authorization methods
The server name of the destination database. Use the following format for the hostname:
{account_name}.{region_id}.snowflakecomputing.com
If your region is US-WEST, region_id is not required:
{account_name}.snowflakecomputing.com
Username - Snowflake and Snowflake (Key Pair)
The username of the Snowflakeaccount.

Password - Snowflake
The password of the Snowflakeaccount.

Private Key - Snowflake (Key Pair)
The Private Key associated with one of the public keys assigned to the Snowflake account. The Private Key must be PEM encoded and include boundaries, for example: (-----BEGIN [Private Key] -----END).

Key password - Snowflake (Key Pair)
The password to decrypt the Private Key. Leave empty if the Private Key is not encrypted.

Client id - Snowflake (OAuth2)
The client ID of the Snowflake account. For more information on finding the client ID, see the Snowflake documentation.

Client secret - Snowflake (OAuth2)
The client secret of the Snowflake account. For more information on finding the client secret, see the Snowflake documentation.
Click Authorize.
In the Configuration page, fill in the following fields:

Name
(Optional) Rename the destination.

Warehouse
Specify the name of the warehouse into which you want to load the data.

Database
Specify the name of the database into which you want to load the data.

For more information on advanced configuration settings, see Advanced Snowflake tips.
Click Create.

Assigning Snowflake as a destination#

To assign the Snowflake destination to a datastream, follow these steps:

Go to the Datastreams page.
Open the chosen datastream by clicking on its name.
In the Load section, click + Add destination.
Select the Snowflake checkbox in the list.
Click Save.
For the automatically enabled destinations, in the pop-up window, click Yes, load data if you want to automatically load your previously collected data into the new destination. The following data extracts will be loaded:
- All data extracts with the status collected if no other destinations are enabled for the datastream
- All data extracts with the status loaded if the data extracts have already been sent to Adverity Data Storage or external destinations
Alternatively, click Skip to continue configuring the destination settings or re-load the data extracts manually. For more information, see Re-loading a data extract.

Configuring settings for loading data into Snowflake#

To configure the settings for loading data into Snowflake, follow these steps:

Go to the Datastreams page.
Open the chosen datastream by clicking on its name.
In the Load section, find the Snowflake destination in the list, and click Actions on the right.
Click Destination settings.

Fill in the following fields:

Table name

Specify the target table in the destination into which to load data from the datastream. The name can contain alphanumeric characters and underscores. For example, target_table. To specify a schema, use the syntax schemaName.tableName.

By default, Adverity saves data from each datastream in a different table named {datastream_type}_{datastream_id} (for example, mailgun_83).

You can specify the same target table for several datastreams. If a column is shared between datastreams, Adverity performs a full outer join and concatenates values. If a column is not shared between datastreams, Adverity writes null values in the relevant cells.

To create a new Snowflake spreadsheet containing the data you load into Snowflake, enter a name for the new spreadsheet into this field.

You can use the following placeholders when creating new table names in the destination:

Placeholder	Description
{datastream_id}	The datastream ID.
{datastream_type}	The data source.
{extract_id}	The data extract ID.
{meta[*]}	Replace `*` with a metadata placeholder to use metadata in the table name. For example, `{meta[datastream_display_name]}` uses the datastream name as the table name. For more information on metadata and placeholders, see Using placeholders to name objects. Warning To use the datastream name as a table name, the datastream name must be in uppercase and use only underscores as separators.
{scheduled_day}	The day when the data fetch was scheduled to run.
{scheduled_month}	The month when the data fetch was scheduled to run.
{scheduled_year}	The year when the data fetch was scheduled to run.

Truncate

Select this checkbox to delete all rows from the relevant table in the destination before loading the latest data extract.

If the Truncate checkbox is selected, the following overwrite conditions are not applied and all data will be deleted.

If no overwrite options are selected, the loaded data is appended to the destination table.

Datastream

Select this checkbox to overwrite all data previously loaded into the destination from this datastream.

Data loaded from other datastreams is not overwritten.

Date Range

Select a date column from your data extract to overwrite all data in the destination with a date that matches a date in this column.

If the Datastream checkbox is selected, only the data loaded from the current datastream with matching dates will be overwritten.

Filename

Select this checkbox to overwrite the relevant data in the destination if a data extract with the same filename already exists in the destination.

Key Columns

Select this checkbox to overwrite data in the destination based on the key columns defined in the Data Mapping of the datastream. Adverity executes this overwrite option after all the other overwrite options. When you select this checkbox, the configuration to overwrite data based on dates does not have an effect.

Click Save.

Advanced Snowflake tips#

Configuration#

To configure the Snowflake destination, follow these steps:

Go to the Destinations page.
Click the Snowflake destination in the list.
In the secondary menu, click Configuration.

Fill in the following fields:

Load state table

Select the checkbox to create a load state table. The load state table is a table in the destination which contains information about each data load. The load state table contains the following fields:

Fieldname	Description
datastream_id	datastream ID.
datastream_name	datastream name.
datastream_url	datastream URL.
extract_filename	Filename of the data extract.
range_start	Start date and time of the fetch.
range_end	End date and time of the fetch.
uuid	Unique ID of the data transfer to the target table.
load_state	State of the data transfer. The possible states are failed, finished, and running.
load_start	Date and time when the data transfer started.
load_end	Date and time when the data transfer finished.
target_schema	Data schema used in the target table.
target_table	Name of the target table.
nrows	Number of rows in the target table.
ncols	Number of columns in the target table.

Name of load state table

Enter a name for the load state table. The default name of a load state table is dt_loadstate.

Role

Specify the Snowflake user role. User roles will appear in this drop-down menu when a role is assigned to a user in Snowflake. If left empty, the role is set based on the Snowflake enforcement model.

Schema

Specify the schema used in the database.

Click Save.

Technical details of loading data into Snowflake#

When loading a data extract into Snowflake, Adverity creates the following additional columns in the target table:

dt_created: The date the data extract was created.
dt_updated: The date the data extract was last updated.
dt_filename: The automatically generated filename of the data extract.

In addition, Adverity creates a temporary table in Snowflake with the name _stage. This ensures that no partial updates are made to the database (guarantee of atomicity). Adverity deletes this temporary table after the data load is complete or has been aborted.

Loading data in bulk into Snowflake#

You can load data in bulk into Snowflake if you have one of the following storage methods configured in your workspace:

Amazon S3
Azure Blob
Google Cloud Storage

To load data in bulk into Snowflake, you need a master key that allows Adverity to access the database and perform the bulk load. For more information on how to find the master key for your Snowflake, see the Snowflake documentation.

Enter the master key into Adverity when you set up or edit the storage authorization. For more information how to set up or edit your storage, see Setting up authorizations to store data extracts.

Loading data in bulk into Snowflake with Google Cloud Storage#

To load data in bulk into Snowflake when using Google Cloud Storage, you need to configure an integration in Snowflake. For more information, see the Snowflake documentation.

After configuring the integration in Snowflake, add it to the storage configuration in Adverity. To add the integration, follow these steps:

Go to https://{HOSTNAME}/core/datalake/.
Click on the Google Cloud Storage storage name to open the storage configuration.
Scroll down to the Advanced settings section.
In the Parameters text box, add the following line to specify the integration:
```
snowflake_storage_integration ={integration_name}
```

Troubleshooting: Loading data into Snowflake fails after changing the table structure of the data extract#

Data may fail to load into Snowflake because of a change in the table structure of the data extract. To ensure the frictionless operation of production systems, Adverity only modifies existing tables in Snowflake if the changes are considered neutral, such as adding a new field to the table. Changes to the table structure of the data extract are not considered neutral. As a consequence, Adverity does not update the existing table in Snowflake resulting in failure when loading data.

The following actions change the table structure of the data extract:

You change the key columns in the datastream’s Data Mapping. For more information on key columns, see Setting key columns.
You change the data type of a column in the datastream’s Data Mapping. For more information on the data types used in Adverity, see Data types used in data harmonization.
You change the maximum character length of values in a column which contains String values.

To resolve this problem, follow these steps:

In Snowflake, modify or drop the table that corresponds to the datastream.
In Adverity, load data from the datastream into Snowflake.

Troubleshooting: Adverity cannot access Snowflake destination#

If Adverity cannot access the Snowflake destination, the destination may restrict access based on IP address.

To resolve this issue, follow these steps:

Find the IP address of your Adverity instance. To find the IP address of your Adverity instance, contact Adverity Customer Support.
Add Adverity’s IP address to the whitelist of your destination.