Enabling bulk loading for Google BigQuery in Management API#
This guide explains how to configure bulk loading for a Google BigQuery destination using the Management API. When bulk loading is enabled, Adverity stages data through a Cloud Storage bucket before loading it into BigQuery, which improves loading performance and reliability for large datasets.
Prerequisites#
Before you complete the procedure in this guide, perform all of the following actions:
Create a Google BigQuery destination. For more information, see Creating and deleting destinations.
Create a storage entry in Adverity. For more information, see Managing Storage in Management API. The storage entry does not need to be set as the workspace-level storage.
Ensure that the Cloud Storage bucket and the BigQuery dataset belong to the same Google Cloud project, and that the authorization used by the storage entry has write and delete permissions on the bucket.
Note
These endpoints require the Bearer scheme for the
Authorization header.
Note
If you configure an incompatible Cloud Storage entry, Adverity does not report an error when you save the configuration. The error only appears when a load job runs. Verify the configuration in the Adverity user interface under Settings → Storage before running a load job.
Finding the Google BigQuery destination type ID#
To find the Google BigQuery destination type ID, follow these steps:
Create a GET request to the following endpoint:
https://{{INSTANCE}}/api/target-types/
In the HTTP request header, include the parameter
Authorizationwith the valueBearer {{KEY}}.Send the request.
In the response, find the entry where
"name"is"Google BigQuery"and note itsid. Save this value asTARGET_TYPE_ID.
Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:
curl --location --request GET 'https://{{INSTANCE}}/api/target-types/' \
--header 'Authorization: Bearer {{KEY}}'
Finding your Google BigQuery destination ID#
To find the ID of your Google BigQuery destination, follow these steps:
Create a GET request to the following endpoint:
https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/
Replace
{{TARGET_TYPE_ID}}with the value obtained in the previous section.In the HTTP request header, include the parameter
Authorizationwith the valueBearer {{KEY}}.Send the request.
In the response, find the entry whose
namematches your destination and save itsidasTARGET_ID. If the destination does not appear on the first page, append?page=2to the URL and repeat.
Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:
curl --location --request GET \
'https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/' \
--header 'Authorization: Bearer {{KEY}}'
Finding your Cloud Storage entry ID#
To find the ID of the Cloud Storage entry to use for bulk loading, follow these steps:
Create a GET request to the following endpoint:
https://{{INSTANCE}}/api/storage/
In the HTTP request header, include the parameter
Authorizationwith the valueBearer {{KEY}}.Send the request.
In the response, find the storage entry you want to use and save its
idasSTORAGE_ID.
Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:
curl --location --request GET 'https://{{INSTANCE}}/api/storage/' \
--header 'Authorization: Bearer {{KEY}}'
Attaching the Cloud Storage entry to the destination#
To attach the Cloud Storage entry to the Google BigQuery destination, follow these steps:
Create a PATCH request to the following endpoint:
https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/{{TARGET_ID}}/
Replace
{{TARGET_TYPE_ID}}and{{TARGET_ID}}with the values obtained in the previous sections.In the HTTP request header, include the parameter
Authorizationwith the valueBearer {{KEY}}.In the HTTP request header, include the parameter
Content-Typewith valueapplication/json.In the HTTP request body, include the
datalakeparameter set to theidof your Cloud Storage entry:{ "datalake": {{STORAGE_ID}} }
Send the request.
A successful request returns HTTP 200. The next load triggered by this destination stages data through the configured Cloud Storage bucket.
Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:
curl --location --request PATCH \
'https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/{{TARGET_ID}}/' \
--header 'Authorization: Bearer {{KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{"datalake": {{STORAGE_ID}}}'
Detaching the Cloud Storage entry from the destination#
To remove the Cloud Storage entry from the Google BigQuery destination and revert to the default loading behaviour, follow these steps:
Create a PATCH request to the following endpoint:
https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/{{TARGET_ID}}/
In the HTTP request header, include the parameter
Authorizationwith the valueBearer {{KEY}}.In the HTTP request header, include the parameter
Content-Typewith valueapplication/json.In the HTTP request body, include the following parameter:
{ "datalake": null }
Send the request.
Import the request example as raw text to your HTTP client (such as Postman). The cURL request example is the following:
curl --location --request PATCH \
'https://{{INSTANCE}}/api/target-types/{{TARGET_TYPE_ID}}/targets/{{TARGET_ID}}/' \
--header 'Authorization: Bearer {{KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{"datalake": null}'
Troubleshooting#
Symptom |
Likely cause and action |
|---|---|
403 Forbidden on the PATCH request |
The API key is missing the Destinations: Write scope. Re-create the key with both Destinations: Read and Destinations: Write scopes. |
403 Forbidden on the GET requests |
The API key is missing the Destinations: Read scope, or the
|
404 Not Found |
Check that the host URL does not include a trailing slash. Also
verify that |
PATCH returns 200 but the next load fails |
Check the following:
|