Join Datastream

This guide explains how to use the Join Datastream transformation to combine two datastreams.

Introduction

The Join Datastream transformation lets you include data from one datastream in another datastream. These datastreams must have at least one column in common, as this is the column by which the data is joined. This transformation is a left join. To join datastreams in other ways, create a custom script transformation that uses the join instruction.

In this process, no data is deleted from the datastreams. Data is copied from a selected datastream (the "right datastream") into the datastream to which you assign the transformation (the "left datastream"). The original datastream remains unchanged.

This video guide explains how to create and configure a Join Datastreamtransformation.

Limitations

The Join Datastream transformation joins data extracts from selected datastreams by including data from "Datastream A" (the datastream you select when configuring the transformation) in "Datastream B" (the datastream to which you assign the transformation). The destination settings of Datastream B determine which data extract is included in Datastream A.

  • If Datastream B does not have a destination enabled, then the latest data extract with a status of either Collected or Loaded can be joined into Datastream A.

  • If Datastream B does have a destination enabled, then only the latest data extract with a status of Loaded can be joined into Datastream A.

The latest data extract is determined based on the datastream_extract_scheduled metadata value of a data extract. For manual fetches, the time in datastream_extract_scheduled is set to 00:00:00. If both scheduled and manual fetches were performed on the same day, the data extract from the scheduled fetch will be taken as the latest data extract even if there is a more recent data extract from a manual fetch.

Prerequisites

Before you complete the procedure in this guide, perform all of the following actions:

  • Ensure both datastreams share at least one column with corresponding data. For example, both datastreams contain Ad ID or Stock Keeping Units (SKU).

Configuring the Join Datastream transformation

This guide explains how to configure the Join Datastream transformation. Before completing this step, start creating a transformation by following the instructions in Using standard transformations. To configure this transformation, follow these steps:

  1. In the Datastream to join drop-down menu, select the datastream which contains the columns to be joined to the datastream or datastreams to which you assigned the transformation in the Assignments step.

  2. In the Select the common columns to join drop-down menus, select the columns that contain the corresponding data in both datastreams. These columns may have different names.

  3. In Select additional columns to include from {datastream-name}, choose which additional columns to join.

You can preview your transformation in the Preview tab. Here, you can see how your current transformation instructions will transform the data you have previously collected using the selected datastream.

Click Refresh Preview at the bottom of the Configuration tab to see the preview of your latest changes.

For more information about the preview, see Previewing transformations.

Example configuration

This example demonstrates how to join data from two specific datastreams.

This example assumes the following conditions:

  • The user has two Shopify datastreams. One datastream uses the Orders report type, and the other uses the Customers report type.

  • The user has fetched data using both datastreams and both data extracts contain an ID field.

    In the Orders datastream this is called id, and in the Customers datastream this is called last_order_id.

  • The user wants to add data from the Orders datastream to the Customers datastream. The data the user wants to include is the total cost of the customer's last order.

To join these datastreams as described above, the user must follow these steps:

  1. In Assign to, select Individual datastreams.

  2. Type in and select the Shopify datastream using the Customers report type.

  1. Click Next.

  1. In Datastream to join, type in and select the Shopify datastream using the Orders report type.

  2. Under Select the common columns to join, in the left drop-down menu, select last_order_id.

  3. In the right drop-down menu, select id.

  4. Under Select additional columns to include, in Select columns, select total_price.

As a result, the ID data in the id column in the data extract collected using the Orders report type is matched to the data in the last_order_id column collected using the Customers report type. The last_order_id and id columns were used to correctly add the column total_price to the data collected using the Customers report type.