Loading data into Google BigQuery#

This guide explains how to load data into Google BigQuery for further analysis.

Introduction#

This video guide explains how to create an authorization and configure your settings to load your data into Google BigQuery.

Prerequisites#

Before you complete the procedure in this guide, perform all of the following actions:

Note

Make sure that the target field names follow the destination requirements. For more information, see the Google documentation.

  • Create a dataset in Google BigQuery that is dedicated to data loaded in from Adverity.

  • Ensure that the Google account you use to connect to Google BigQuery includes the following permissions:

    • bigquery.jobs.get

    • bigquery.jobs.list

    • bigquery.jobs.create

    • bigquery.tables.get

    • bigquery.tables.getData

    • bigquery.tables.list

    • bigquery.datasets.get

    • bigquery.tables.create

    • bigquery.tables.update

    • bigquery.tables.updateData

    Alternatively, if your account does not include all of the permissions above, connect to Google BigQuery with a JSON service account key. For more information on creating a JSON service account key, see the Google documentation.

If you load large data sets into Google BigQuery (for example, larger than 50 GB), load the data as a batch operation. For more information on batch loading data to Google BigQuery, see the Google documentation. To use the batch load function, perform all of the following actions in addition to the prerequisites listed above:

  • Ensure that the Google account you use to connect to Google BigQuery includes the following permissions:

    • storage.objects.get

    • storage.objects.list

  • Ensure that the Google account you use to connect to Google BigQuery has permissions to write and delete files in your Google Cloud Storage.

  • Ensure that the account you use for Google BigQuery and Google Cloud Storage has access to both Google BigQuery and Google Cloud Storage. We recommend ensuring that Google BigQuery and Google Cloud Storage are in the same project. For more information on projects, see the Google documentation.

  • Set up an authorization to Google Cloud Storage in your Adverity workspace. For more information, see Setting up an |authorization| to Google Cloud Storage (Service Account).

  • Set up a storage in Adverity using the Google Cloud Storage authorization. For more information, see Setting up Storage for data extracts. You do not need to set up this storage to store the data extracts in your workspace.

Procedure#

To load data from a datastream into Google BigQuery, follow these steps:

  1. Add Google BigQuery as a destination to the workspace which contains the datastream or to one of its parent workspaces.

  2. Assign the Google BigQuery destination to the datastream.

    You can assign as many destinations to a datastream as you want.

    Some destinations require specific Data Mapping, such as Hubspot and Facebook Offline Conversions. If these Data Mapping requirements conflict, the destinations cannot be assigned to the same datastream.

  3. Configure load settings.

Adding Google BigQuery as a destination#

To add Google BigQuery as a destination to a workspace, follow these steps:

  1. Go to the Destinations page.

  2. Click + Create destination.

  3. Search for and click Google BigQuery.

  4. Choose how to authorize Adverity to access Google BigQuery:

    • To use your details, click Access Google BigQuery using your credentials.

    • To ask someone else to use their details, click Access Google BigQuery using someone else’s credentials.

      If you choose this option, the person you ask to create the authorization will need to go through the following steps.

  5. Click Next.

  6. Select one of the following options:

    • To connect to Google BigQuery with your own account, click Google BigQuery (OAuth2), and log in with your user credentials.

    • To connect to Google BigQuery with a service account, click Google BigQuery (Service Account), upload the JSON service account key.

  7. (Optional) When connecting to Google BigQuery with a service account, select Retrieve projects from Google Cloud API to retrieve several projects that the service account can access.

  8. Click Authorize.

  9. In the Configuration page, fill in the following fields:

    Name

    (Optional) Rename the destination.

    Cloud Storage

    If you load data into Google BigQuery as a batch operation, select the storage you have created using the Google Cloud Storage authorization.

    Project

    Specify the name of the project into which you want to load the data.

    Dataset

    Specify the name of the dataset into which you want to load the data.

    For more information on advanced configuration settings, see Advanced Google BigQuery tips.

  10. Click Create.

Assigning Google BigQuery as a destination#

To assign the Google BigQuery destination to a datastream, follow these steps:

  1. Go to the Datastreams page.

  2. Open the chosen datastream by clicking on its name.

  3. In the Load section, click + Add destination.

  4. Select the Google BigQuery checkbox in the list.

  5. Click Save.

  6. For the automatically enabled destinations, in the pop-up window, click Yes, load data if you want to automatically load your previously collected data into the new destination. The following data extracts will be loaded:

    • All data extracts with the status collected if no other destinations are enabled for the datastream

    • All data extracts with the status loaded if the data extracts have already been sent to Adverity Data Storage or external destinations

    Alternatively, click Skip to continue configuring the destination settings or re-load the data extracts manually. For more information, see Re-loading a data extract.

Configuring settings for loading data into Google BigQuery#

To configure the settings for loading data into Google BigQuery, follow these steps:

  1. Go to the Datastreams page.

  2. Open the chosen datastream by clicking on its name.

  3. In the Load section, find the Google BigQuery destination in the list, and click image1 Actions on the right.

  4. Click image2 Destination settings.

  5. Fill in the following fields:

    Table name

    Specify the target table in the destination into which to load data from the datastream. The name can contain alphanumeric characters and underscores. For example, target_table.

    By default, Adverity saves data from each datastream in a different table named {datastream_type}_{datastream_id} (for example, mailgun_83).

    You can specify the same target table for several datastreams. If a column is shared between datastreams, Adverity performs a full outer join and concatenates values. If a column is not shared between datastreams, Adverity writes null values in the relevant cells.

    • To create a new Google BigQuery spreadsheet containing the data you load into Google BigQuery, enter a name for the new spreadsheet into this field.

    You can use the following placeholders when creating new table names in the destination:

    Placeholder

    Description

    {app_label}

    The data source’s short name.

    {datastream_id}

    The datastream ID.

    {datastream_type}

    The data source.

    {extension}

    The file extension of the data extract.

    {extract_id}

    The data extract ID.

    {id}

    The datastream ID.

    {meta[*]}

    Replace * with a metadata placeholder to use metadata in the table name. For example, {meta[datastream_URI]} uses the datastream URI as the table name. For more information on metadata and placeholders, see Using placeholders.

    {name}

    The automatically generated filename of the data extract.

    {scheduled_day}

    The day when the data fetch was scheduled to run.

    {scheduled_month}

    The month when the data fetch was scheduled to run.

    {scheduled_year}

    The year when the data fetch was scheduled to run.

    {upload_day}

    The day when the data extract is loaded into the Google BigQuery destination.

    {upload_hour}

    The hour when the data extract is loaded into the Google BigQuery destination.

    {upload_minute}

    The minute when the data extract is loaded into the Google BigQuery destination.

    {upload_month}

    The month when the data extract is loaded into the Google BigQuery destination.

    {upload_second}

    The second when the data extract is loaded into the Google BigQuery destination.

    {upload_year}

    The year when the data extract is loaded into the Google BigQuery destination.

    Action for existing tables

    To specify how Adverity loads data from a data extract to the destination if the target tablespreadsheet already exists, select one of the following options:

    • (Default) Select OverwriteTruncate to remove all data from the relevant tablespreadsheet in the destination before loading the corresponding data extract into your destination.

    • Select AppendAdd to add the data from the data extract to the relevant tablespreadsheet in the destination without removing any existing data.

    • Select Error (no action) not to load data from the data extract to the destination if the target table already exists. Google BigQuery displays an error message each time Adverity cancels the data load into the destination because the target table already exists.

    Partition by date

    (Recommended) If selected, the target table is partitioned by a date column, and data is only replaced based on the date. This means that if you import data into a table with data already present for certain dates, the existing data for these overlapping dates is overwritten, and the data for other, unique dates remains unchanged.

    This option is only effective if you also enable the option Local Data Retention > Extract Filenames > Unique by day. For more information, see Configuring advanced datastream settings.

    Partitioning is set the first time you load data into the destination and cannot be changed through Adverity later.

    When this option is selected, select the Partition Date Column from the drop-down list.

    Table expiration

    Specify the number of days for which to keep data in the destination. To keep the table indefinitely, leave this field empty. If you want to remove the expiration period, change the value to 0. Changes to the expiration period will take place the next time you load data into Google BigQuery.

  6. Click Save.

Advanced Google BigQuery tips#

Configuration#

To configure the Google BigQuery destination, follow these steps:

  1. Go to the Destinations page.

  2. Click the Google BigQuery destination in the list.

  3. In the secondary menu, click Data Mapping.

  4. Fill in the following fields:

    Data Mapping

    By default, Adverity applies the Data Mapping conventions to the data extracts when loading them into the destination. For more information on Data Mapping, see Harmonizing data.

    To load data into the destination without applying the Data Mapping conventions, clear this checkbox.

    Force string type

    By default, Adverity automatically recognizes the data type of each field. Select this checkbox to convert all fields to the String data type.

  5. Click Save.