Transferring data to AWS S3

This guide explains how to transfer data to AWS S3 to store and further process information.

Concept

AWS S3 is an Active Destination. After you set AWS S3 as the Destination of a Datastream, data is transferred to AWS S3 each time data is fetched for the Datastream. For more information, see Destination types.

You can assign multiple Destinations to a Datastream. For more information on possible limitations, see Assigning multiple Destinations to a Datastream.

Prerequisites

Before you complete the procedure in this guide, perform all of the following actions:

  • Ensure that you have login details for the Destination with the following permissions (to verify them programmatically, see the sketch after this list):

    • Read, write, and delete files and folders.

    • List folders.
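
To check these permissions before configuring the Destination, you can run a short script like the following sketch, which uses Python and the boto3 library. The bucket name and folder prefix are placeholders for your own values; this check is an illustration and not part of the Adverity setup.

    # Verify the permissions required by the Destination: list, write, read, delete.
    # "my-adverity-bucket" and "exports/" are hypothetical placeholders.
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-adverity-bucket"
    key = "exports/permission_check.txt"

    s3.list_objects_v2(Bucket=bucket, Prefix="exports/")  # list folders
    s3.put_object(Bucket=bucket, Key=key, Body=b"check")  # write a file
    s3.get_object(Bucket=bucket, Key=key)                 # read the file back
    s3.delete_object(Bucket=bucket, Key=key)              # delete the file
    print("All permissions required by the Destination are in place.")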

Procedure

To transfer data from a Datastream to AWS S3, follow these steps:

  1. Add AWS S3 as a Destination to the Workspace that contains the Datastream or to one of its parent Workspaces.

  2. Assign the AWS S3 Destination to the Datastream.

  3. Configure transfer settings.

Adding AWS S3 as a Destination

To add AWS S3 as a Destination to a Workspace, follow these steps:

  1. Click the Transfer element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Click + Add.

  3. Click File.

  4. Click Setup a new Authorization.

  5. Click Next.

  6. Click S3 Authorization.

  7. In Access Key ID, specify the access key ID. For more information, see the Amazon documentation.

  8. In Secret Access Key, specify the secret access key. For more information, see the Amazon documentation.

  9. In AWS Role ARN to assume, enter the AWS role ARN for the S3 bucket. Ensure that the role has the privileges required to access the S3 bucket, that Adverity is configured as a trusted relationship for this role, and that the AWS role ARN entered in this field is added to your trust policy. An example of an ARN is arn:aws:iam::123456789000:role/datatap-instance. Do not change the datatap-instance part of the ARN. For one way to inspect the trust relationship, see the sketches after this procedure.

    For more information on configuring an AWS policy, see Configuring AWS policies.

  10. (Optional) To encrypt the data transferred to the Destination, specify a KMS encryption key ID in KMS Encryption Key ID. Ensure that your account has the correct policies attached, including permissions for the kms:GenerateDataKey and kms:Decrypt actions on the configured key resource. For one way to verify these permissions, see the sketches after this procedure.

  11. Click Authorize.

  12. On the Configuration page, fill in the following fields:

    Name

    (Optional) Rename the Destination.

    Destination URL

    In the drop-down on the left, select the file server type. In the text field in the middle, enter the base URL of the file server. In the text field on the right, enter the path to the folder to which data is transferred. Click Test to check the Authorization.

    Output format

    Select the data format that Adverity uses to transfer data to the Destination.

    When you transfer data in the AVRO file format, select AVRO to use the null codec, or AVRO (deflate) to use the deflate codec. For more information on codecs, see the Apache documentation. For an illustration of the two codecs, see the sketches after this procedure.

    For more information on advanced configuration settings, see File Destination reference.

  13. Click Create.
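
The following sketches are optional illustrations and not part of the setup procedure. This first one shows one way to inspect the trust relationship required in the AWS Role ARN to assume step, using Python and boto3. The role name mirrors the example ARN above and is hypothetical.

    # Inspect the role's trust policy to confirm the expected trusted principal.
    # "datatap-instance" mirrors the example ARN above and is illustrative only.
    import boto3

    iam = boto3.client("iam")
    role = iam.get_role(RoleName="datatap-instance")

    # boto3 returns the trust policy as a decoded dictionary.
    for statement in role["Role"]["AssumeRolePolicyDocument"]["Statement"]:
        print(statement.get("Principal"), statement.get("Action"))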
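
Similarly, the next sketch exercises the two KMS actions required in the KMS Encryption Key ID step. The key ARN is a hypothetical placeholder for your own KMS key.

    # Exercise kms:GenerateDataKey and kms:Decrypt against the configured key.
    import boto3

    kms = boto3.client("kms")
    key_id = "arn:aws:kms:eu-west-1:123456789000:key/11111111-2222-3333-4444-555555555555"

    data_key = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")     # kms:GenerateDataKey
    kms.decrypt(CiphertextBlob=data_key["CiphertextBlob"], KeyId=key_id)  # kms:Decrypt
    print("Both KMS actions required for encryption are permitted.")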
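
Finally, to illustrate the two AVRO codecs offered in the Output format field, this sketch writes the same records once with each codec using the third-party fastavro library. The schema and records are made up for the example.

    # Write identical records with the null and deflate codecs and compare the files.
    # Schema and records are hypothetical; fastavro is a third-party library.
    from fastavro import writer

    schema = {
        "name": "Metric",
        "type": "record",
        "fields": [{"name": "clicks", "type": "int"}],
    }
    records = [{"clicks": n} for n in range(1000)]

    for codec in ("null", "deflate"):
        path = f"extract_{codec}.avro"
        with open(path, "wb") as out:
            writer(out, schema, records, codec=codec)
        print(codec, "codec file written to", path)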

Assigning AWS S3 as a Destination

To assign the AWS S3 Destination to a Datastream, follow these steps:

  1. Click the Connect element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Select the chosen Datastream.

  3. In the Destinations section, click + Add Destination.

  4. Click Assign Existing Destinations.

  5. Select the AWS S3 checkbox in the list.

  6. Click Save.

Configuring transfer settings

To configure transfer settings, follow these steps:

  1. Click the Connect element and select the Workspace you work with in Connect, Enrich & Transfer.

  2. Select the chosen Datastream.

  3. In the Destinations section, find the AWS S3 Destination in the list, and click on the right.

  4. Click Destination Settings.

  5. Fill in the following fields:

    Filename

    Specify the target file in the Destination to which data from the Datastream is transferred. The name can contain alphanumeric characters and underscores. For example, target_file.

    To transfer data to sub-folders within the folder defined in the Destination URL field, specify a file path. For example, folder1/target_file.

    By default, Adverity saves data from each Datastream in a different file named {datastream_type}_{datastream_id}_{scheduled_year}_{scheduled_month}_{scheduled_day}.

    You can specify the same target file for several Datastreams. If a column is shared between Datastreams, Adverity performs a full outer join and concatenates values. If a column is not shared between Datastreams, Adverity writes null values in the relevant cells.

    Use the following placeholders to create unique, dynamic file names in the Destination. For an example of how a filename template expands, see the sketch at the end of this guide.

    Placeholder          Description

    {app_label}          The Datastream Type's short name.
    {datastream_id}      The Datastream ID.
    {datastream_type}    The Datastream Type.
    {extension}          The file extension of the Data Extract.
    {extract_id}         The Data Extract ID.
    {id}                 The Datastream ID.
    {meta[*]}            Replace * with a metadata placeholder to use metadata in
                         the file name. For example, {meta[datastream_URI]} uses
                         the Datastream URI as the file name. For more information
                         on metadata and placeholders, see Using placeholders.
    {name}               The automatically generated filename of the Data Extract.
    {scheduled_day}      The day from the start date of a date range for a
                         scheduled data fetch.
    {scheduled_month}    The month from the start date of a date range for a
                         scheduled data fetch.
    {scheduled_year}     The year from the start date of a date range for a
                         scheduled data fetch.
    {upload_day}         The day when the Data Extract is transferred to the
                         AWS S3 Destination.
    {upload_hour}        The hour when the Data Extract is transferred to the
                         AWS S3 Destination.
    {upload_minute}      The minute when the Data Extract is transferred to the
                         AWS S3 Destination.
    {upload_month}       The month when the Data Extract is transferred to the
                         AWS S3 Destination.
    {upload_second}      The second when the Data Extract is transferred to the
                         AWS S3 Destination.
    {upload_year}        The year when the Data Extract is transferred to the
                         AWS S3 Destination.

  6. Click Save.
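
As an example of how the placeholders above expand, the following Python sketch formats the default filename template with hypothetical values. Adverity resolves the placeholders internally; this only demonstrates the substitution.

    # Expand the default filename template with hypothetical placeholder values.
    template = "{datastream_type}_{datastream_id}_{scheduled_year}_{scheduled_month}_{scheduled_day}"

    values = {
        "datastream_type": "google_ads",  # hypothetical Datastream Type
        "datastream_id": 1234,            # hypothetical Datastream ID
        "scheduled_year": 2024,
        "scheduled_month": "05",
        "scheduled_day": "01",
    }

    print(template.format(**values))  # google_ads_1234_2024_05_01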