join

Join data extracts from multiple datastreams.

This guide explains how to configure the join instruction. To learn about another instruction, go back to the Available custom script instructions overview.

Introduction

Use the join instruction to combine the data extracts from multiple datastreams into a single extract using a shared identifying column containing matching values.

The Join Datastream standard transformation enables you to join datastreams in a few easy steps, instead of using a custom script. For more information, see Join Datastream.

What data extracts can I combine?

You can combine any data extracts. This includes data extracts collected from the same data source or from different data sources.

What do I need in my data extracts?

The data extracts that you want to combine must have at least one common column. For example, the datastreams could all contain a column called Customer ID. The join transformation will apply to rows with matching values in the Customer ID column.

I have multiple data extracts belonging to my chosen datastream. Which data extracts will be combined?

The join instruction joins the latest data extracts from each datastream. However, if this transformation is performed on the same day as a scheduled fetch, the data extract from the scheduled fetch will be taken as the latest data extract even if there is a more recent data extract from a manual fetch.

Creating a custom script transformation using the join instruction

To create and configure a custom script using the join instruction, follow these steps:

  1. Create a custom script transformation.

  2. In the Configuration step, select the join instruction.

  3. To configure the custom script instruction, fill in the required fields below, as well as any optional fields you require:

How (required)

Select how the datastreams are to be joined. Choose from one of the following options:

  • Left - Select this option to join two datastreams and keep all the data in the data extract of the datastream to which the join transformation is applied (the left datastream). Any rows with non-matching key columns in the data extract of the joining datastream (the right datastream) are removed.

  • Right - This is the opposite of Left. Select this option to join two datastreams and keep all the data in the data extract of the joining datastream (the right datastream). Any rows with non-matching key columns are removed from the data extract of the datastream to which the join transformation is applied (the left datastream).

  • Inner - Select this option to keep only the columns with matching rows from both data extracts. These are the columns defined in the Key field.

  • Outer - Select this option to keep all the rows from both of the data extracts to join.

  • Interval - Select this option to join only a selection of the data extracts of the joining datastreams. Enter the range for both the left and right datastreams. Any matching rows that are present in both of these ranges are joined using this transformation. Enter the start and end of the range in the Left/Right Start and Left/Right Stop fields.

  • Leftinterval - Select this option to perform the same join as the Interval join described above. However, this option will keep all the non-matching rows in the data extract of the left datastream.

Keys (or Left key and Right key) (required)

Specify the columns found in both data extracts that you want to use to match the data. To add multiple keys to match, click . Choose one of the following options:

  • Select String and enter the name of the columns. This option only works if both columns have the same name in their corresponding data extracts.

  • Select Integer and enter the position of the column in the data extract as an index value. Counting starts at 0. To select the first column, enter 0. To select the second column, enter 1. To select the last column in the data extract, enter -1. This option only works if both columns are in the same position in their respective data extracts.

  • If the columns have different names in their respective data extracts, use the Left key and Right key fields:

    • In Left key, enter the name of the column in the datastream to which the transformation is applied.

    • In Right key, enter the name of the column in the joining datastream.

Streams (required)

Enter the name of the datastream that you want to join to the datastream. To join multiple datastreams, click and enter the names of the datastreams to join.

Columns

Enter the name of the columns in the right datastream (this is the joining datastream) to be joined. To join every column from the right datastream, leave this field blank and select the Join all field.

Distinct join

Select this field to prevent multiple identical rows in the key columns from being joined. When selected, if multiple rows in the right datastream (this is the joining datastream) match a single row in the left datastream, then only the first row from the right datastream is joined.

Join all

Select this field to join all the completed data extracts from the selected datastreams.

Concatenate

If you are joining multiple datastreams, select this field to remove columns that have the same name.

Right select

Enter a python expression to find and pre-select rows in the right datastream before the join starts.

Right prefix

Enter a value to prefix on to the column names that are to be joined. For example, if you enter the value joined- then all the names of the columns joined into the left datastream (such as clicks) would be renamed (such as to joined-clicks).

Right presorted

Select this field if the data extract of the right datastream (this is the joining datastream) has been sorted. This will speed up the transformation.

Include tags

Enter the names of the tags used to identify which data extracts are to be joined. If you populate this field, only the data extracts with the selected tags are merged. Leave this field blank to join only the most recent data extracts from the selected datastreams.

To add tags to a data extract, use the set_tags transformation.

Exclude tags

Enter the names of the tags used to identify which data extracts should not be joined. If you enter tags in this field, they will have priority over the tags entered in the Include Tags field.

Convert keys

Select this field to convert keys to unicode.

Ignore join date

Leave this field unselected to join only the data extracts with a scheduled_date that is earlier than the current_extract_date. The Include Tags and Exclude Tags fields have priority over the Ignore Join Date field.

Left start

Populate this field if you are using an Interval join. Enter the name of the column in the left datastream (the datastream to which the transformation is applied) used to identify the start of the interval.

Left stop

Populate this field if you are using an Interval join. Enter the name of the column in the left datastream (the datastream to which the transformation is applied) used to identify the end of the interval.

Right start

Populate this field if you are using an Interval join. Enter the name of the column in the right datastream (this is the joining datastream) used to identify the start of the interval.

Right stop

Populate this field if you are using an Interval join. Enter the name of the column in the right datastream (this is the joining datastream) used to identify the end of the interval.

Subtable

Enter the name for a subtable that you want to use within this custom script.

A subtable is a temporary table that only exists for this custom script. You can apply additional instructions within the same custom script to the subtable. However, the subtable cannot be used in any other custom scripts.

If a subtable does not exist for the current custom script, the transformation is applied to the data extract, and the enriched data is output into the subtable. If the subtable already exists for the custom script, the subtable is used as the input for the transformation and optionally as the output.

Example configuration

In this example, we have data extracts from a Facebook Ads datastream and a Google Ads datastream. Both data extracts contain the same identifying value in a column called Product ID, as well as different product name and clicks values in columns with different names.

We want to use the product ID values to join these data extracts, and add the clicks values from our Google Ads data extract to our Facebook Ads data extract. We do not want to include the product names from our Google Ads data extract in the joined data extract.

To join the latest data extracts from these datastreams under these conditions, the join instruction would be configured as follows:

  1. Go to the Transformations page.

  2. Click + Create transformation.

  3. In the Type step, select Custom script.

  4. In the Assignments step, select the first datastream.

    In this example, this is our Facebook Ads datastream.

  5. In the Instructions step, select the join instruction.

  6. In How, select how you would like to join the two data extracts.

    In this example, we will use a left join, because we want to add the data from the right datastream (Google Ads) to the left datastream (Facebook Ads).

  7. In Keys, click + and enter the name of the column found in both data extracts that contains the matching values that Adverity will use to join the data extracts.

    In this example, this column is called Product ID in both data extracts. If the matching values were in columns with different names in each data extract, we would use the Left key and Right key fields, as described above.

  8. In the Streams field, click + and select the second datastream.

    In this example, this is our Google Ads datastream.

  9. In the Columns field, enter the name of the additional columns to join.

    In this example, we will enter Clicks because we only want to include this column from our Google Ads data extract in the joined data extract.

    Alternatively, if we wanted to include all additional columns in the joined data extract, we could leave this field empty and instead select the Join all checkbox.

  10. Leave the other fields on their default selections.

As a result, we have created a transformation using the join custom script that will join the data from our Facebook Ads and Google Ads datastreams.

Data extract before transformation: Left datastream

Before running the transformation, the Facebook Ads data extract contains the following data.

Product ID

Name

Total clicks

A100

Apple

5

B100

Banana

12

C100

Cherry

30

Data extract before transformation: Right datastream

Before running the transformation, the Google Ads data extract contains the following data.

Product ID

Product name

Clicks

A100

Fruit_Apple

50

C100

Fruit_Cherry

60

C200

Vegetable_Carrot

90

Data extract after transformation: Left datastream containing joined data

After running the transformation, the Facebook Ads data extract contains the following data.

Product ID

Name

Total clicks

Clicks

A100

Apple

5

50

B100

Banana

12

C100

Cherry

30

60

For all rows in the Facebook Ads data extract, a value in the Clicks column has been added if there was a row in the Google Ads data extract with a matching value in the Product ID column.

Any rows in the Google Ads data extract with a value in the Product ID column that does not match a value in the Product ID column of the Facebook Ads data extract are not joined.