Enabling bulk loading for Google BigQuery in Management API#

This guide explains how to configure bulk loading for a Google BigQuery destination using the Management API. When bulk loading is enabled, Adverity stages data through a Cloud Storage bucket before loading it into BigQuery, which improves loading performance and reliability for large datasets.

Prerequisites#

Before you complete the procedure in this guide, perform all of the following actions:

  • Create a Google BigQuery destination. For more information, see Creating and deleting destinations.

  • Create a storage entry in Adverity. For more information, see Managing Storage in Management API. The storage entry does not need to be set as the workspace-level storage.

  • Ensure that the Cloud Storage bucket and the BigQuery dataset belong to the same Google Cloud project, and that the authorization used by the storage entry has write and delete permissions on the bucket.

Note

These endpoints require the Bearer scheme for the Authorization header.

Note

If you configure an incompatible Cloud Storage entry, Adverity does not report an error when you save the configuration. The error only appears when a load job runs. Verify the configuration in the Adverity user interface under Settings → Storage before running a load job.

Finding the Google BigQuery destination type ID#

To find the Google BigQuery destination type ID, follow these steps:

  1. Create a GET request to the following endpoint:

    https://{{INSTANCE}}/api/target-types/
    
  2. In the HTTP request header, include the parameter Authorization with the value Bearer {{KEY}}.

  3. Send the request.

  4. In the response, find the entry where "name" is "Google BigQuery" and note its id. Save this value as TARGET_TYPE_ID.

Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:

curl --location --request GET 'https://{{INSTANCE}}/api/target-types/' \
--header 'Authorization: Bearer {{KEY}}'

Finding your Google BigQuery destination ID#

To find the ID of your Google BigQuery destination, follow these steps:

  1. Create a GET request to the following endpoint:

    https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/
    

    Replace {{TARGET_TYPE_ID}} with the value obtained in the previous section.

  2. In the HTTP request header, include the parameter Authorization with the value Bearer {{KEY}}.

  3. Send the request.

  4. In the response, find the entry whose name matches your destination and save its id as TARGET_ID. If the destination does not appear on the first page, append ?page=2 to the URL and repeat.

Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:

curl --location --request GET \
'https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/' \
--header 'Authorization: Bearer {{KEY}}'

Finding your Cloud Storage entry ID#

To find the ID of the Cloud Storage entry to use for bulk loading, follow these steps:

  1. Create a GET request to the following endpoint:

    https://{{INSTANCE}}/api/storage/
    
  2. In the HTTP request header, include the parameter Authorization with the value Bearer {{KEY}}.

  3. Send the request.

  4. In the response, find the storage entry you want to use and save its id as STORAGE_ID.

Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:

curl --location --request GET 'https://{{INSTANCE}}/api/storage/' \
--header 'Authorization: Bearer {{KEY}}'

Attaching the Cloud Storage entry to the destination#

To attach the Cloud Storage entry to the Google BigQuery destination, follow these steps:

  1. Create a PATCH request to the following endpoint:

    https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/{{TARGET_ID}}/
    

    Replace {{TARGET_TYPE_ID}} and {{TARGET_ID}} with the values obtained in the previous sections.

  2. In the HTTP request header, include the parameter Authorization with the value Bearer {{KEY}}.

  3. In the HTTP request header, include the parameter Content-Type with value application/json.

  4. In the HTTP request body, include the datalake parameter set to the id of your Cloud Storage entry:

    {
       "datalake": {{STORAGE_ID}}
    }
    
  5. Send the request.

A successful request returns HTTP 200. The next load triggered by this destination stages data through the configured Cloud Storage bucket.

Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:

curl --location --request PATCH \
'https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/{{TARGET_ID}}/' \
--header 'Authorization: Bearer {{KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{"datalake": {{STORAGE_ID}}}'

Detaching the Cloud Storage entry from the destination#

To remove the Cloud Storage entry from the Google BigQuery destination and revert to the default loading behaviour, follow these steps:

  1. Create a PATCH request to the following endpoint:

    https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/{{TARGET_ID}}/
    
  2. In the HTTP request header, include the parameter Authorization with the value Bearer {{KEY}}.

  3. In the HTTP request header, include the parameter Content-Type with value application/json.

  4. In the HTTP request body, include the following parameter:

    {
       "datalake": null
    }
    
  5. Send the request.

Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:

curl --location --request PATCH \
'https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/{{TARGET_ID}}/' \
--header 'Authorization: Bearer {{KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{"datalake": null}'

Troubleshooting#

Symptom

Likely cause and action

403 Forbidden on the PATCH request

The API key is missing the Destinations: Write scope. Re-create the key with both Destinations: Read and Destinations: Write scopes.

403 Forbidden on the GET requests

The API key is missing the Destinations: Read scope, or the Authorization header uses Token instead of Bearer.

404 Not Found

Check that the host URL does not include a trailing slash. Also verify that TARGET_TYPE_ID and TARGET_ID are correct.

PATCH returns 200 but the next load fails

Check the following:

  • In Adverity, verify that the storage entry under Settings → Storage uses the correct authorization.

  • In Google Cloud, verify that the Cloud Storage bucket and BigQuery dataset are in the same project, and that the service account has write and delete permissions on the bucket.