Combining data extracts using Bundle#

This guide explains how to use Bundle to combine multiple data extracts into a single data extract file.

Scope#

After completing this guide, you will be able to do the following:

  • Create a Bundle datastream.

  • Combine data from multiple datastreams into a single data extract.

  • Preview the data combined using Bundle.

Concept#

Bundle is a connector that combines data extracts from multiple datastreams into a single data extract.

When setting up a Bundle connector, you select the datastreams that contain the data extracts to combine into a single file.

You can also configure how Bundle finds and matches these data extracts. Bundle matches the data extract files to be combined in one of the following three ways:

  • Pattern - This option looks at the names of the data extract files and finds similar patterns in the text (for example, the date used in the extract’s file name).

  • Created Date - This option matches data extract files by looking at the date on which they were created.

  • Scheduled Date - This option matches data extract files based on the date range of the fetched data.

Prerequisites#

Before you complete the procedure in this guide, perform all of the following actions:

Procedure#

To combine data using Bundle, follow these steps:

  1. Go to the Datastreams page.

  2. In the top right corner, click + Create datastream.

  3. Search for and click Bundle.

  4. (Optional) Rename your datastream.

  5. In Workspace, select the workspace that contains the datastreams with data extracts to combine.

  6. In Datastreams, select the datastreams with data extracts to combine. If the required settings are in place, you can also select datastreams that belong to parent workspaces. For more information on configuring these settings, see Including datastreams from parent workspaces in Bundle.

  7. In Match files by, ensure the default option Pattern is selected. For more information on the three Match files by options, see Configuring which data extracts to combine.

  8. In Extract Pattern, ensure the default regular expression ^.*-%Y%m%d.*\.csv$ is present in the field.

  9. (Optional) Select the Apply Data Mapping checkbox to apply Data Mapping (assigned to the selected datastreams) when combining multiple data extracts into a single data extract file. For more information on running Bundle with this option selected, see Configuring Bundle’s Data Mapping options.

  10. (Optional) Select one of the following:

    • To combine all data extracts within a fetched date range into a single data extract file, select the Concatenate checkbox.

    • To create separate extract files for each day of the fetched date range, clear the Concatenate checkbox.

  11. Click Next.

  12. To assign destinations to your datastream, select their checkboxes. For more information on destinations and their configuration settings, see Introduction to loading data into Adverity Data Storage and external destinations.

  13. Click Next.

  14. Enter a date range that includes the collection date ranges of the data extract files you wish to combine. For example:

    • Datastream 1 has a data extract file containing data in the date range of 2021-03-01 to 2021-03-05.

    • Datastream 2 has a data extract file containing data in the date range of 2021-03-02 to 2021-03-06.

    To combine the two separate extract files, enter the date range 2021-03-01 - 2021-03-06. This is the date range that includes all dates for both data extracts.

  15. Click Run fetch.

The fetch combines data from selected datastream data extracts. The Overview page of the newly created datastream is now displayed. To preview the combined data, follow these steps:

  1. In the All tasks tab, find the task at the top of the list, and click Show extracts.

  2. Click the top hyperlink.

  3. The data extract is displayed in a table containing the data that you have fetched.

Advanced Bundle tips#

Accessing Bundle configuration#

To configure the Bundle connector, follow these steps:

  1. Go to the Datastreams page.

  2. Open the chosen datastream by clicking on its name.

  3. In the top navigation panel, click Settings.

Configuring which data extracts to combine#

Bundle matches the data extract files to be combined in one of the following ways:

  • Pattern - This option looks at the names of the data extract files to find similarly named data extracts.

  • Created Date - This option matches data extract files by looking at the date on which they were created.

  • Scheduled Date - This option matches data extract files based on the date range of the fetched data.

Pattern#

The Pattern option matches the data extract files by looking at the names of the extract files and combining those with similar file names. This is achieved using regular expressions. When selecting this option, you must provide a regular expression in the Extract pattern field (a default regular expression is provided).

Keep the default settings to ensure Bundle combines the data extract files successfully. These default settings are as follows:

  • For each of the datastreams with data extracts to combine, enable the option Local Data Retention > Extract Filenames > Unique by day. For more information, see Configuring advanced datastream settings.

  • In the Extract Pattern field, keep the following regular expression: ^.*-%Y%m%d.*\.csv$.

These default options allow Bundle to match data extract files with the same date in their file names and combine them into a single data extract.

To match the data extract file names using another approach, enter a regular expression in the Extract Pattern field.
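To illustrate how the default pattern groups files by day, the following Python sketch fills in the %Y%m%d placeholder for one day and matches a few file names against the result. How Bundle resolves the placeholder internally is not described in this guide, so the strftime-style expansion, the function name `files_for_day`, and the file names are all assumptions made for illustration.

```python
import re
from datetime import date

# Default regular expression from the Extract Pattern field.
DEFAULT_PATTERN = r"^.*-%Y%m%d.*\.csv$"


def files_for_day(pattern: str, day: date, filenames: list[str]) -> list[str]:
    """Return the file names that match the pattern for the given day."""
    # Assumption: %Y, %m and %d behave like strftime directives.
    expanded = (
        pattern.replace("%Y", f"{day.year:04d}")
        .replace("%m", f"{day.month:02d}")
        .replace("%d", f"{day.day:02d}")
    )
    regex = re.compile(expanded)
    return [name for name in filenames if regex.match(name)]


# Hypothetical extract file names from two datastreams.
names = [
    "facebook-20210301-abc.csv",
    "google-20210301-def.csv",
    "facebook-20210302-ghi.csv",
    "notes-20210301.txt",
]

matched = files_for_day(DEFAULT_PATTERN, date(2021, 3, 1), names)
print(matched)  # ['facebook-20210301-abc.csv', 'google-20210301-def.csv']
```

In this sketch, the two CSV files that carry the date 20210301 in their names are grouped together, while the file for the following day and the non-CSV file are left out.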

Created Date#

This option matches the data extracts by their creation date. The created date is the date on which the data was fetched. For example, a data extract with a date range of 2021-03-01 to 2021-03-10 that was fetched on 2021-04-01 has a created date of 2021-04-01.

Scheduled Date#

This option matches the data extracts by the date range of the fetched data. The scheduled date is the start date of a data extract’s date range. For example, if data for the date range of 2021-03-01 to 2021-03-10 is fetched on 2021-04-01, the scheduled date of the data extract is 2021-03-01.

The data extract scheduled date is found in the data extract metadata. For more information on viewing data extract metadata, see Using placeholders.

Including datastreams from parent workspaces in Bundle#

When setting up a Bundle connector, you select the workspace that contains the datastreams with the data you wish to combine into a single data extract. If this workspace is a child workspace, you can also select datastreams from its parent workspace.

To include datastreams that belong to parent workspaces in the Bundle connector, follow these steps:

  1. In the top left corner, click Select Workspace.

  2. From the list, select the parent workspace that contains the datastreams to include in Bundle.

  3. In the secondary menu, click Datastreams.

  4. Click the datastream to include in Bundle.

  5. In the top navigation panel, click Settings.

  6. Select the Share with children checkbox.

As a result, you can now select the datastream that belongs to the parent workspace when configuring the Bundle connector in any of its child workspaces.

Scheduling Bundle fetches#

Bundle datastream fetches can be scheduled in two ways:

  • Smart schedule

    With smart schedule, data is fetched when all source datastreams are up to date.

    Smart schedule fetches are triggered only by scheduled fetches in the source datastreams, which helps you to further automate data collection.

    Smart schedule can be enabled only when all source datastreams run on the same schedule and for the same time range. The Bundle fetch will start when all source fetches for the same time range are completed.

    For example, if two source datastreams are scheduled to perform fetches every day for the previous day, the Bundle datastream with smart schedule fetches data every day after both source fetches are completed.

    To check the schedules of the source datastreams, click Source datastream in the schedule configuration window.

    Figure: Example of the smart schedule configuration window.

  • Standard schedule

    With the standard schedule, data is fetched based on your schedule.
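The smart schedule trigger condition (start the Bundle fetch once every source datastream has completed its fetch for the same time range) can be sketched in Python as a set intersection. This is only an illustration of the rule as described here, not Bundle's implementation; the function name `ranges_ready_for_bundle` and the source names are hypothetical.

```python
from datetime import date

# A fetch's time range as (start date, end date).
TimeRange = tuple[date, date]


def ranges_ready_for_bundle(completed: dict[str, set[TimeRange]]) -> set[TimeRange]:
    """Return the time ranges for which every source datastream has a completed fetch."""
    if not completed:
        return set()
    # A range is ready only if it appears in every source's set of completed fetches.
    return set.intersection(*completed.values())


march_1 = (date(2021, 3, 1), date(2021, 3, 1))
march_2 = (date(2021, 3, 2), date(2021, 3, 2))

# Hypothetical state: source_b has not yet finished its fetch for 2021-03-02.
completed_fetches = {
    "source_a": {march_1, march_2},
    "source_b": {march_1},
}

ready = ranges_ready_for_bundle(completed_fetches)
print(ready == {march_1})  # True
```

In this sketch, only the range for 2021-03-01 is ready, because one source is still missing its fetch for 2021-03-02.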

Configuring Bundle’s Data Mapping options#

Bundle can apply Data Mapping when combining multiple data extracts into a single file. When configuring the Bundle connector, select the Apply Data Mapping checkbox to apply Data Mapping.

Set up Data Mapping on the individual datastreams before running Bundle to combine the data extracts.

If Data Mapping is to be applied during the data extract file combination, the following will occur:

  • Mapped columns from different datastreams are combined into a single column. For example, the mapped columns of campaignName, campaign and ga:campaign from three separate datastreams are combined into the single column of campaign_name.

  • Two new columns are added to the Bundle data extract file:

    • dt_datastream_name - This is the name given to the datastream. For example, Facebook Ads Insights - Q4 2020.

    • dt_datasource - This is the name of the data source. For example, Facebook Ads Insights.

The Apply Data Mapping feature is optional. If the Apply Data Mapping checkbox is not selected then all the columns in the multiple data extracts are displayed with their original names in the combined data extract.
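The effect of the Apply Data Mapping checkbox can be sketched in Python as follows. The sketch assumes each datastream's mapping is a simple dictionary from source column to target column; the `combine` function, the row values, and the two Google datastream names are hypothetical and only illustrate the behavior described above.

```python
# Hypothetical extracts from three datastreams, each with its own Data Mapping.
extracts = [
    {
        "datastream": "Facebook Ads Insights - Q4 2020",
        "datasource": "Facebook Ads Insights",
        "mapping": {"campaignName": "campaign_name"},
        "rows": [{"campaignName": "Winter Sale", "clicks": 120}],
    },
    {
        "datastream": "Google Ads - Q4 2020",
        "datasource": "Google Ads",
        "mapping": {"campaign": "campaign_name"},
        "rows": [{"campaign": "Winter Sale", "clicks": 95}],
    },
    {
        "datastream": "Google Analytics - Q4 2020",
        "datasource": "Google Analytics",
        "mapping": {"ga:campaign": "campaign_name"},
        "rows": [{"ga:campaign": "Winter Sale", "clicks": 80}],
    },
]


def combine(extracts: list[dict], apply_mapping: bool = True) -> list[dict]:
    """Combine the rows of several extracts into one list of rows."""
    combined = []
    for extract in extracts:
        mapping = extract["mapping"] if apply_mapping else {}
        for row in extract["rows"]:
            # Rename mapped columns; unmapped columns keep their original names.
            new_row = {mapping.get(col, col): value for col, value in row.items()}
            if apply_mapping:
                # The two extra columns described above.
                new_row["dt_datastream_name"] = extract["datastream"]
                new_row["dt_datasource"] = extract["datasource"]
            combined.append(new_row)
    return combined


mapped_rows = combine(extracts)
print(sorted(mapped_rows[0]))
# ['campaign_name', 'clicks', 'dt_datasource', 'dt_datastream_name']

unmapped_rows = combine(extracts, apply_mapping=False)
print(sorted(unmapped_rows[2]))  # ['clicks', 'ga:campaign']
```

With mapping applied, all three rows share the campaign_name column; without it, each row keeps its original column name.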

Data extract creation and scheduled dates#

A data extract’s creation date is not the same as its scheduled date. For example, consider a data extract with a date range of 2021-03-01 to 2021-03-10 that was fetched on 2021-04-01:

  • The data extract creation date is 2021-04-01.

  • The data extract scheduled date is 2021-03-01.
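The difference between the two dates can be expressed as a small Python sketch. The `Extract` class and its field names are hypothetical stand-ins for a data extract's metadata, not part of Bundle.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Extract:
    """Hypothetical stand-in for a data extract's metadata."""

    range_start: date  # first day of the fetched date range
    range_end: date    # last day of the fetched date range
    fetched_on: date   # day on which the fetch ran

    @property
    def created_date(self) -> date:
        # The creation date is the day the fetch ran.
        return self.fetched_on

    @property
    def scheduled_date(self) -> date:
        # The scheduled date is the start of the fetched date range.
        return self.range_start


extract = Extract(
    range_start=date(2021, 3, 1),
    range_end=date(2021, 3, 10),
    fetched_on=date(2021, 4, 1),
)
print(extract.created_date)    # 2021-04-01
print(extract.scheduled_date)  # 2021-03-01
```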