Transferring data to Google BigQuery

This guide explains how to transfer data to Google BigQuery to store and further process information.

Concept

Google BigQuery is an Active Destination. After you set Google BigQuery as the Destination of a Datastream, data is transferred to Google BigQuery each time data is fetched for the Datastream. For more information, see Destination types.

You can assign multiple Destinations to a Datastream. For more information on possible limitations, see Assigning multiple Destinations to a Datastream.

Prerequisites

Before you complete the procedure in this guide, perform all of the following actions:

  • Create a Datastream whose data you want to transfer to Google BigQuery. For more information on creating a Datastream, see Introduction to collecting data.

  • Create a dataset in Google BigQuery that is dedicated to data transferred from Adverity.

  • Ensure that the Google account you use to connect to Google BigQuery includes the following permissions:

    • bigquery.jobs.get

    • bigquery.jobs.list

    • bigquery.jobs.create

    • bigquery.tables.get

    • bigquery.tables.getData

    • bigquery.tables.list

    • bigquery.datasets.get

    • bigquery.tables.create

    • bigquery.tables.update

    • bigquery.tables.updateData

    Alternatively, if your account does not include all of the permissions above, connect to Google BigQuery with a JSON service account key. For more information on creating a JSON service account key, see the Google documentation.
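    For reference, a JSON service account key downloaded from the Google Cloud console has the following general structure. All values below are placeholders; your key contains the real identifiers for your project and service account:

    ```json
    {
      "type": "service_account",
      "project_id": "your-project-id",
      "private_key_id": "0123456789abcdef",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "your-service-account@your-project-id.iam.gserviceaccount.com",
      "client_id": "123456789012345678901",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
    }
    ```

    Keep this file secret. Anyone who holds the key can act with the service account's permissions.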

If you transfer large data sets to Google BigQuery (for example, larger than 50 GB), transfer the data as a batch operation. For more information on batch loading data to Google BigQuery, see the Google documentation. To use the batch load function, perform all of the following actions in addition to the prerequisites listed above:

  • Ensure that the Google account you use to connect to Google BigQuery includes the following permissions:

    • storage.objects.get

    • storage.objects.list

  • Ensure that the Google account you use to connect to Google BigQuery has permissions to write and delete files in your Google Cloud Storage.

  • Ensure that the account you use has access to both Google BigQuery and Google Cloud Storage. We recommend keeping Google BigQuery and Google Cloud Storage in the same project. For more information on projects, see the Google documentation.

  • Set up an Authorization to Google Cloud Storage in your Adverity Workspace. For more information, see Setting up an Authorization to Google Cloud Storage (Service Account).

  • Set up a Storage in Adverity using the Google Cloud Storage Authorization. For more information, see Setting up Storage for Data Extracts. You do not need to use this Storage to store the Data Extracts in your Workspace.

Procedure

To transfer data from a Datastream to Google BigQuery, follow these steps:

  1. Add Google BigQuery as a Destination to the Workspace which contains the Datastream or to one of its parent Workspaces.

  2. Assign the Google BigQuery Destination to the Datastream.

  3. Configure transfer settings.

Adding Google BigQuery as a Destination

To add Google BigQuery as a Destination to a Workspace, follow these steps:

  1. Click the Transfer element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Click + Add.

  3. Click Google BigQuery.

  4. Click Setup a new Authorization.

  5. Click Next.

  6. Select one of the following options:

    • To connect to Google BigQuery with your own account, click Google BigQuery (OAuth2), and log in with your user credentials.

    • To connect to Google BigQuery with a service account, click Google BigQuery (Service Account), and upload the JSON service account key.

  7. (Optional) When connecting to Google BigQuery with a service account, select Retrieve projects from Google Cloud API to retrieve the list of projects that the service account can access.

  8. Click Authorize.

  9. On the Configuration page, fill in the following fields:

    Name

    (Optional) Rename the Destination.

    Cloud Storage

    If you transfer data to Google BigQuery as a batch operation, select the Storage you have created using the Google Cloud Storage Authorization.

    Project

    Specify the name of the project to which the data is transferred.

    Dataset

    Specify the name of the dataset to which the data is transferred.

    For more information on advanced configuration settings, see Google BigQuery Destination reference.

  10. Click Create.

Assigning Google BigQuery as a Destination

To assign the Google BigQuery Destination to a Datastream, follow these steps:

  1. Click the Connect element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Select the Datastream.

  3. In the Destinations section, click + Add Destination.

  4. Click Assign Existing Destinations.

  5. Select the Google BigQuery checkbox in the list.

  6. Click Save.

Configuring transfer settings

To configure transfer settings, follow these steps:

  1. Click the Connect element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Select the Datastream.

  3. In the Destinations section, find the Google BigQuery Destination in the list, and click the icon on the right.

  4. Click Destination Settings.

  5. Fill in the following fields:

    Table name

    Specify the target table in the Destination to which data from the Datastream is transferred. The name can contain alphanumeric characters and underscores, for example, target_table. To specify a schema, use the syntax schemaName.tableName.

    By default, Adverity saves data from each Datastream in a different table named {datastream_type}_{datastream_id} (for example, mailgun_83).

    You can specify the same target table for several Datastreams. If a column is shared between Datastreams, Adverity performs a full outer join and concatenates values. If a column is not shared between Datastreams, Adverity writes null values in the relevant cells.
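    As a simplified model of this behavior, the following sketch combines rows from two hypothetical Datastreams into one table: shared columns line up, and columns missing from a row are filled with NULL. Adverity performs this combination internally; the rows below are illustrative only.

    ```python
    def combine(rows_a, rows_b):
        """Model of how rows from two Datastreams land in one target table:
        the combined table has the union of all columns, and a row's missing
        columns are filled with None (NULL). Simplified illustration."""
        all_columns = sorted({c for row in rows_a + rows_b for c in row})
        return [{c: row.get(c) for c in all_columns} for row in rows_a + rows_b]

    stream_1 = [{"date": "2024-05-01", "clicks": 10}]        # hypothetical Datastream A
    stream_2 = [{"date": "2024-05-01", "impressions": 200}]  # hypothetical Datastream B

    for row in combine(stream_1, stream_2):
        print(row)
    ```

    The shared column date appears in every combined row, while clicks and impressions are NULL in the rows coming from the Datastream that does not provide them.
    
    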

    Use placeholders to create unique, dynamic table names in the Destination. The following placeholders are available:

    Placeholder

    Description

    {app_label}

    The Datastream Type's short name.

    {datastream_id}

    The Datastream ID.

    {datastream_type}

    The Datastream Type.

    {extract_id}

    The Data Extract ID.

    {id}

    The Datastream ID.

    {meta[*]}

    Replace * with a metadata placeholder to use metadata in the table name. For example, {meta[datastream_URI]} uses the Datastream URI as the table name. For more information on metadata and placeholders, see Using placeholders.

    {name}

    The automatically generated filename of the Data Extract.

    {scheduled_day}

    The day from the start date of a date range for a scheduled data fetch.

    {scheduled_month}

    The month from the start date of a date range for a scheduled data fetch.

    {scheduled_year}

    The year from the start date of a date range for a scheduled data fetch.

    {upload_day}

    The day when the Data Extract is transferred to the Google BigQuery Destination.

    {upload_hour}

    The hour when the Data Extract is transferred to the Google BigQuery Destination.

    {upload_minute}

    The minute when the Data Extract is transferred to the Google BigQuery Destination.

    {upload_month}

    The month when the Data Extract is transferred to the Google BigQuery Destination.

    {upload_second}

    The second when the Data Extract is transferred to the Google BigQuery Destination.

    {upload_year}

    The year when the Data Extract is transferred to the Google BigQuery Destination.
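    As an illustration of how these placeholders resolve, the following sketch substitutes example values into a table name template. Adverity performs this substitution internally; the Datastream and date values below are hypothetical:

    ```python
    from datetime import datetime

    def resolve_table_name(template: str, context: dict) -> str:
        """Substitute {placeholder} tokens in a table name template.
        Simplified model of Adverity's placeholder resolution."""
        name = template
        for key, value in context.items():
            name = name.replace("{" + key + "}", str(value))
        return name

    now = datetime(2024, 5, 17)  # hypothetical transfer time
    context = {
        "datastream_type": "mailgun",
        "datastream_id": 83,
        "upload_year": now.year,
        "upload_month": f"{now.month:02d}",
        "upload_day": f"{now.day:02d}",
    }

    # The default naming scheme: {datastream_type}_{datastream_id}
    print(resolve_table_name("{datastream_type}_{datastream_id}", context))
    # -> mailgun_83

    # A dynamic, per-day table name
    print(resolve_table_name("{datastream_type}_{upload_year}_{upload_month}_{upload_day}", context))
    # -> mailgun_2024_05_17
    ```
    
    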

    Action for existing tables

    To specify how Adverity loads data from a Data Extract to the Destination if the target table already exists, select one of the following options:

    • (Default) Select Truncate to remove all data from the relevant table in the Destination before transferring the corresponding Data Extract.

    • Select Append to update the relevant table in the Destination with data from the Data Extract.

    • Select Error (no action) to skip loading data from the Data Extract into the Destination if the target table already exists. Google BigQuery displays an error message each time Adverity cancels a data transfer to the Destination because the target table already exists.
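    The effect of the three options on an existing target table can be modeled as follows. This is a simplified sketch with hypothetical rows, not Adverity's implementation:

    ```python
    def load_extract(table: list, extract: list, action: str) -> list:
        """Model the three 'Action for existing tables' options applied
        to a target table that already exists (simplified illustration)."""
        if action == "truncate":
            return list(extract)       # existing rows removed, extract loaded
        if action == "append":
            return table + extract     # extract rows added to existing rows
        if action == "error":
            raise RuntimeError("target table already exists; transfer cancelled")
        raise ValueError(f"unknown action: {action}")

    existing = [{"date": "2024-05-01", "clicks": 10}]  # rows already in the table
    extract  = [{"date": "2024-05-02", "clicks": 7}]   # incoming Data Extract

    print(load_extract(existing, extract, "truncate"))  # only the extract remains
    print(load_extract(existing, extract, "append"))    # both rows remain
    ```
    
    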

    Partition by date

    (Recommended) If selected, the target table is partitioned by a date column, and data is replaced based on date only. As a result, if you import data into a table that already contains data for certain dates, the existing data for the overlapping dates is overwritten, and the data for all other dates remains unchanged.

    This option is only effective if you also enable the option Local Data Retention > Extract Filenames > Unique by day. For more information, see Configuring advanced Datastream settings.
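    The overwrite-by-date behavior can be sketched as replacing only the rows whose dates appear in the incoming Data Extract. Adverity and Google BigQuery handle the actual partitioning internally; the rows below are hypothetical:

    ```python
    def load_partitioned(table, extract, date_key="date"):
        """Replace rows only for dates present in the extract; keep rows
        for all other dates unchanged (simplified model of date partitioning)."""
        incoming_dates = {row[date_key] for row in extract}
        kept = [row for row in table if row[date_key] not in incoming_dates]
        return kept + extract

    table = [
        {"date": "2024-05-01", "clicks": 10},
        {"date": "2024-05-02", "clicks": 7},
    ]
    extract = [{"date": "2024-05-02", "clicks": 9}]  # overlaps 2024-05-02

    print(load_partitioned(table, extract))
    # The 2024-05-01 row is unchanged; the 2024-05-02 row is overwritten.
    ```
    
    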

  6. Click Save.