Loading data into AWS S3

This guide explains how to load data into AWS S3 for further analysis.

Prerequisites

Before you complete the procedure in this guide, perform all of the following actions:

  • Create a datastream whose data you want to load into AWS S3. For more information on creating a datastream, see Creating a datastream.

  • Ensure you have login credentials for the destination with the following permissions, as illustrated in the sketch after this list:

    • Read, write, and delete files and folders.

    • List folders.
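How you grant these permissions depends on how access to your S3 bucket is managed, so the following Python sketch is illustrative only. It uses boto3 to attach an inline IAM policy covering the permissions listed above; the bucket name example-bucket, the user name example-user, and the policy name are placeholder assumptions, not values used by Adverity.

```python
import json
import boto3

# Illustrative inline policy covering the permissions listed above.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read, write, and delete files (objects) in the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",  # placeholder bucket
        },
        {
            # List folders (prefixes) in the bucket.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-bucket",
        },
    ],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="example-user",  # placeholder: the user whose credentials you will use
    PolicyName="adverity-s3-destination-access",
    PolicyDocument=json.dumps(policy),
)
```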

Procedure

To load data from a datastream into AWS S3, follow these steps:

  1. Add AWS S3 as a destination to the workspace which contains the datastream or to one of its parent workspaces.

  2. Assign the AWS S3 destination to the datastream.

    You can assign as many destinations to a datastream as you want.

    Some destinations, such as HubSpot and Facebook Offline Conversions, require specific Data Mapping. If these Data Mapping requirements conflict, the destinations cannot be assigned to the same datastream.

  3. Configure load settings.

Adding AWS S3 as a destination

To add AWS S3 as a destination to a workspace, follow these steps:

  1. Go to the Destinations page.

  2. Click + Create destination.

  3. Search for and click File.

  4. Choose how to authorize Adverity to access AWS S3:

    • To use your details, click Access AWS S3 using your credentials.

    • To ask someone else to use their details, click Access AWS S3 using someone else's credentials.

      If you choose this option, the person you ask to create the authorization will need to go through the following steps.

  5. Click Next.

  6. Click S3 Authorization.

    There are two mutually exclusive ways to authorize access to AWS S3: with an Access Key ID and Secret Access Key, or with an AWS Role ARN. Leave the fields for the method you do not use empty.

  7. In Access Key ID, specify the access key ID. For more information, see the Amazon documentation.

  8. In Secret Access Key, specify the secret access key. For more information, see the Amazon documentation.

  9. In AWS Role ARN to assume, enter the AWS role ARN for the S3 bucket. Ensure that the role has the correct privileges to access the S3 bucket, and that Adverity is configured as a Trusted Relationship for this role by adding the Adverity ARN below to the role's trust policy. An illustrative sketch of such a trust policy follows this procedure.

    The Adverity ARN is arn:aws:iam::375956725820:role/datatap-instance. Do not change any part of the ARN.

    For more information on configuring an AWS policy, see Advanced AWS policy configuration.

  10. (Optional) To encrypt the data loaded into the destination, specify a KMS Encryption Key ID in KMS Encryption Key ID. Ensure your account has the correct policies attached, allowing the kms:GenerateDataKey and kms:Decrypt actions on the configured key. The sketch after this procedure also illustrates these KMS permissions.

  11. Click Authorize.

  12. On the Configuration page, fill in the following fields:

    Name

    (Optional) Rename the destination.

    Destination URL

    In the drop-down on the left, select the file server type. In the text field in the middle, enter the base URL of the file server. In the text field on the right, enter the path to the folder into which you want to load data. Click Test to check the authorization.

    Output format

    Select the data format that Adverity uses to load data into the destination.

    When you load data in AVRO file format, select AVRO to use the null codec, or AVRO (deflate) to use the deflate codec. For more information on codecs, see the Apache documentation. An illustrative sketch of reading a loaded AVRO file follows this procedure.

    For more information on advanced configuration settings, see Advanced File destination configuration.

  13. Click Create.
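If you authorize with an AWS Role ARN, the role's trust policy must list the Adverity ARN as a trusted principal, as described in step 9. The following Python sketch, using boto3, is illustrative only; the role name example-role and the KMS key ARN are placeholder assumptions, and only the Adverity ARN itself is taken from this guide.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy that allows the Adverity role to assume your S3 role (step 9).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::375956725820:role/datatap-instance"},
            "Action": "sts:AssumeRole",
        }
    ],
}
iam.update_assume_role_policy(
    RoleName="example-role",  # placeholder: your role for the S3 bucket
    PolicyDocument=json.dumps(trust_policy),
)

# If you use a KMS Encryption Key ID (step 10), the role also needs the
# kms:GenerateDataKey and kms:Decrypt actions on that key. The key ARN below
# is a placeholder.
kms_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
            "Resource": "arn:aws:kms:eu-west-1:111122223333:key/example-key-id",
        }
    ],
}
iam.put_role_policy(
    RoleName="example-role",
    PolicyName="adverity-s3-destination-kms",
    PolicyDocument=json.dumps(kms_policy),
)
```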
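Separately, as a quick way to inspect an AVRO file that has been loaded into the destination, the following sketch downloads it with boto3 and reads it with fastavro, which handles both the null and deflate codecs. The bucket name, object key, and local path are placeholder assumptions.

```python
import boto3
from fastavro import reader

s3 = boto3.client("s3")

# Placeholder bucket and key: the folder comes from the Destination URL,
# the file name from the Filename destination setting.
s3.download_file("example-bucket", "exports/target_file.avro", "/tmp/target_file.avro")

# Iterate over the records in the downloaded AVRO file.
with open("/tmp/target_file.avro", "rb") as fo:
    for record in reader(fo):
        print(record)
```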

Assigning AWS S3 as a destination

To assign the AWS S3 destination to a datastream, follow these steps:

  1. Go to the Datastreams page.

  2. Open the chosen datastream by clicking on its name.

  3. In the Load section, click + Add destination.

  4. Select the AWS S3 checkbox in the list.

  5. Click Save.

  6. If the destination is enabled automatically, in the pop-up window, click Yes, load data to automatically load your previously collected data into the new destination. The following data extracts will be loaded:

    • All data extracts with the status collected if no other destinations are enabled for the datastream

    • All data extracts with the status loaded if the data extracts have already been sent to Adverity Data Storage or external destinations

    Alternatively, click Skip to continue configuring the destination settings or re-load the data extracts manually. For more information, see Re-loading a data extract.

Configuring settings for loading data into AWS S3

To configure the settings for loading data into AWS S3, follow these steps:

  1. Go to the Datastreams page.

  2. Open the chosen datastream by clicking on its name.

  3. In the Load section, find the AWS S3 destination in the list, and click Actions on the right.

  4. Click Destination settings.

  5. Fill in the following fields:

    Filename

    Specify the target file in the destination into which to load data from the datastream. The name can contain alphanumeric characters and underscores. For example, target_file.

    To load data into sub-folders within the folder defined in the Destination URL field, specify a file path. For example, folder1/target_file.

    By default, Adverity saves data from each datastream in a different file named {datastream_type}_{datastream_id}_{scheduled_year}_{scheduled_month}_{scheduled_day}. For a sketch of how this pattern expands into a concrete filename, see the example after this procedure.

    If you specify the same target file for more than one datastream, the existing file will be overwritten with the new data.

    To create a new file in AWS S3 containing the data you load, enter a name for the new file in this field.

    You can use the following placeholders when creating new file names in the destination:

    Placeholder

    Description

    {app_label}

    The data source's short name.

    {datastream_id}

    The datastream ID.

    {datastream_type}

    The data source.

    {extension}

    The file extension of the data extract.

    {extract_id}

    The data extract ID.

    {id}

    The datastream ID.

    {meta[*]}

    Replace * with a metadata placeholder to use metadata in the file name. For example, {meta[datastream_URI]} uses the datastream URI as the file name. For more information on metadata and placeholders, see Using placeholders.

    {name}

    The automatically generated filename of the data extract.

    {scheduled_day}

    The day when the data fetch was scheduled to run.

    {scheduled_month}

    The month when the data fetch was scheduled to run.

    {scheduled_year}

    The year when the data fetch was scheduled to run.

    {upload_day}

    The day when the data extract is loaded into the AWS S3 destination.

    {upload_hour}

    The hour when the data extract is loaded into the AWS S3 destination.

    {upload_minute}

    The minute when the data extract is loaded into the AWS S3 destination.

    {upload_month}

    The month when the data extract is loaded into the AWS S3 destination.

    {upload_second}

    The second when the data extract is loaded into the AWS S3 destination.

    {upload_year}

    The year when the data extract is loaded into the AWS S3 destination.

  6. Click Save.
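As an illustration of how the filename placeholders expand, the following Python sketch substitutes made-up values into the default pattern described above; the values are examples only, not output from Adverity.

```python
# Default filename pattern from the Filename setting above.
template = "{datastream_type}_{datastream_id}_{scheduled_year}_{scheduled_month}_{scheduled_day}"

# Placeholder values for illustration only.
filename = template.format(
    datastream_type="facebook",
    datastream_id=12345,
    scheduled_year=2024,
    scheduled_month="06",
    scheduled_day="01",
)

print(filename)  # facebook_12345_2024_06_01
```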