join
Join data extracts from multiple datastreams.
This guide explains how to configure the join instruction. To learn about another instruction, go back to the Available custom script instructions overview.
Introduction
Use the join instruction to combine the data extracts from multiple datastreams into a single extract using a shared identifying column containing matching values.
The Join Datastream standard transformation enables you to join datastreams in a few easy steps, instead of using a custom script. For more information, see Join Datastream.
-
What data extracts can I combine?
-
You can combine any data extracts. This includes data extracts collected from the same data source or from different data sources.
-
What do I need in my data extracts?
-
The data extracts that you want to combine must have at least one common column. For example, the datastreams could all contain a column called Customer ID. The join transformation will apply to rows with matching values in the Customer ID column.
-
I have multiple data extracts belonging to my chosen datastream. Which data extracts will be combined?
-
The join instruction joins the latest data extracts from each datastream. However, if this transformation is performed on the same day as a scheduled fetch, the data extract from the scheduled fetch will be taken as the latest data extract even if there is a more recent data extract from a manual fetch.
Creating a custom script transformation using the join instruction
To create and configure a custom script using the join instruction, follow these steps:
-
In the Configuration step, select the join instruction.
-
To configure the custom script instruction, fill in the required fields below, as well as any optional fields you require:
-
How (required)
-
Select how the datastreams are to be joined. Choose from one of the following options:
-
Left - Select this option to join two datastreams and keep all the data in the data extract of the datastream to which the join transformation is applied (the left datastream). Any rows with non-matching key columns in the data extract of the joining datastream (the right datastream) are removed.
-
Right - This is the opposite of Left. Select this option to join two datastreams and keep all the data in the data extract of the joining datastream (the right datastream). Any rows with non-matching key columns are removed from the data extract of the datastream to which the join transformation is applied (the left datastream).
-
Inner - Select this option to keep only the columns with matching rows from both data extracts. These are the columns defined in the Key field.
-
Outer - Select this option to keep all the rows from both of the data extracts to join.
-
Interval - Select this option to join only a selection of the data extracts of the joining datastreams. Enter the range for both the left and right datastreams. Any matching rows that are present in both of these ranges are joined using this transformation. Enter the start and end of the range in the Left/Right Start and Left/Right Stop fields.
-
Leftinterval - Select this option to perform the same join as the Interval join described above. However, this option will keep all the non-matching rows in the data extract of the left datastream.
-
-
Keys (or Left key and Right key) (required)
-
Specify the columns found in both data extracts that you want to use to match the data. To add multiple keys to match, click . Choose one of the following options:
-
Select String and enter the name of the columns. This option only works if both columns have the same name in their corresponding data extracts.
-
Select Integer and enter the position of the column in the data extract as an index value. Counting starts at 0. To select the first column, enter
0
. To select the second column, enter1
. To select the last column in the data extract, enter-1
. This option only works if both columns are in the same position in their respective data extracts. -
If the columns have different names in their respective data extracts, use the Left key and Right key fields:
-
In Left key, enter the name of the column in the datastream to which the transformation is applied.
-
In Right key, enter the name of the column in the joining datastream.
-
-
-
Streams (required)
-
Enter the name of the datastream that you want to join to the datastream. To join multiple datastreams, click and enter the names of the datastreams to join.
-
Columns
-
Enter the name of the columns in the right datastream (this is the joining datastream) to be joined. To join every column from the right datastream, leave this field blank and select the Join all field.
-
Distinct join
-
Select this field to prevent multiple identical rows in the key columns from being joined. When selected, if multiple rows in the right datastream (this is the joining datastream) match a single row in the left datastream, then only the first row from the right datastream is joined.
-
Join all
-
Select this field to join all the completed data extracts from the selected datastreams.
-
Concatenate
-
If you are joining multiple datastreams, select this field to remove columns that have the same name.
-
Right select
-
Enter a python expression to find and pre-select rows in the right datastream before the join starts.
-
Right prefix
-
Enter a value to prefix on to the column names that are to be joined. For example, if you enter the value
joined-
then all the names of the columns joined into the left datastream (such asclicks
) would be renamed (such as tojoined-clicks
). -
Right presorted
-
Select this field if the data extract of the right datastream (this is the joining datastream) has been sorted. This will speed up the transformation.
-
Include tags
-
Enter the names of the tags used to identify which data extracts are to be joined. If you populate this field, only the data extracts with the selected tags are merged. Leave this field blank to join only the most recent data extracts from the selected datastreams.
-
To add tags to a data extract, use the set_tags transformation.
-
Exclude tags
-
Enter the names of the tags used to identify which data extracts should not be joined. If you enter tags in this field, they will have priority over the tags entered in the Include Tags field.
-
Convert keys
-
Select this field to convert keys to unicode.
-
Ignore join date
-
Leave this field unselected to join only the data extracts with a scheduled_date that is earlier than the current_extract_date. The Include Tags and Exclude Tags fields have priority over the Ignore Join Date field.
-
Left start
-
Populate this field if you are using an Interval join. Enter the name of the column in the left datastream (the datastream to which the transformation is applied) used to identify the start of the interval.
-
Left stop
-
Populate this field if you are using an Interval join. Enter the name of the column in the left datastream (the datastream to which the transformation is applied) used to identify the end of the interval.
-
Right start
-
Populate this field if you are using an Interval join. Enter the name of the column in the right datastream (this is the joining datastream) used to identify the start of the interval.
-
Right stop
-
Populate this field if you are using an Interval join. Enter the name of the column in the right datastream (this is the joining datastream) used to identify the end of the interval.
-
Subtable
-
Enter the name for a subtable that you want to use within this custom script.
A subtable is a temporary table that only exists for this custom script. You can apply additional instructions within the same custom script to the subtable. However, the subtable cannot be used in any other custom scripts.
If a subtable does not exist for the current custom script, the transformation is applied to the data extract, and the enriched data is output into the subtable. If the subtable already exists for the custom script, the subtable is used as the input for the transformation and optionally as the output.
Example configuration
In this example, we have data extracts from a Facebook Ads datastream and a Google Ads datastream. Both data extracts contain the same identifying value in a column called Product ID, as well as different product name and clicks values in columns with different names.
We want to use the product ID values to join these data extracts, and add the clicks values from our Google Ads data extract to our Facebook Ads data extract. We do not want to include the product names from our Google Ads data extract in the joined data extract.
To join the latest data extracts from these datastreams under these conditions, the join instruction would be configured as follows:
-
Go to the Transformations page.
-
Click + Create transformation.
-
In the Type step, select Custom script.
-
In the Assignments step, select the first datastream.
In this example, this is our Facebook Ads datastream.
-
In the Instructions step, select the join instruction.
-
In How, select how you would like to join the two data extracts.
In this example, we will use a left join, because we want to add the data from the right datastream (Google Ads) to the left datastream (Facebook Ads).
-
In Keys, click + and enter the name of the column found in both data extracts that contains the matching values that Adverity will use to join the data extracts.
In this example, this column is called Product ID in both data extracts. If the matching values were in columns with different names in each data extract, we would use the Left key and Right key fields, as described above.
-
In the Streams field, click + and select the second datastream.
In this example, this is our Google Ads datastream.
-
In the Columns field, enter the name of the additional columns to join.
In this example, we will enter Clicks because we only want to include this column from our Google Ads data extract in the joined data extract.
Alternatively, if we wanted to include all additional columns in the joined data extract, we could leave this field empty and instead select the Join all checkbox.
-
Leave the other fields on their default selections.
As a result, we have created a transformation using the join custom script that will join the data from our Facebook Ads and Google Ads datastreams.
Data extract before transformation: Left datastream
Before running the transformation, the Facebook Ads data extract contains the following data.
Product ID |
Name |
Total clicks |
---|---|---|
A100 |
Apple |
5 |
B100 |
Banana |
12 |
C100 |
Cherry |
30 |
Data extract before transformation: Right datastream
Before running the transformation, the Google Ads data extract contains the following data.
Product ID |
Product name |
Clicks |
---|---|---|
A100 |
Fruit_Apple |
50 |
C100 |
Fruit_Cherry |
60 |
C200 |
Vegetable_Carrot |
90 |
Data extract after transformation: Left datastream containing joined data
After running the transformation, the Facebook Ads data extract contains the following data.
Product ID |
Name |
Total clicks |
Clicks |
---|---|---|---|
A100 |
Apple |
5 |
50 |
B100 |
Banana |
12 |
|
C100 |
Cherry |
30 |
60 |
For all rows in the Facebook Ads data extract, a value in the Clicks column has been added if there was a row in the Google Ads data extract with a matching value in the Product ID column.
Any rows in the Google Ads data extract with a value in the Product ID column that does not match a value in the Product ID column of the Facebook Ads data extract are not joined.