Choosing and combining custom script instructions#
This guide explains how to choose the right custom script instruction for your transformation goal and how to combine instructions in real-world pipelines. For detailed configuration of each instruction, see Available custom script instructions.
Choosing the right custom script instruction#
Before you start writing Python code, decide which custom script instruction is the best fit for the task. In many cases, different instructions can reach a similar result, but with different trade-offs for performance, maintainability, and configuration effort.
The table below gives common transformation goals and the most suitable instructions to start with.
| Transformation goal | Recommended custom script instructions | Notes and prerequisites |
|---|---|---|
| Create or harmonize a field | addfield, addfieldx, metaaddfield, map | For harmonizing values (mapping source values to target values), use mapping tables with the map instruction; this keeps your logic centralized and reusable. Use addfieldx when you need a Python expression to combine or transform fields, addfield for a fixed value, and metaaddfield when the value can be derived from metadata across multiple datastreams. |
| Convert values to numbers or dates | convertnumbers, convertdates | Always convert text to numeric or datetime types before using numeric or date comparisons or aggregations. For numeric values, apply convertnumbers first and then use instructions such as selectgt or aggregate. |
| Filter or segment rows | select, selecteq, selectin, selectgt | Use select with a Python expression for flexible conditions (including RegEx). Use selectin or selecteq when you have one or more explicit values. When filtering on numeric columns, apply convertnumbers first if the source data is not already numeric. |
| Aggregate and reshape data | aggregate, sumup, melt | Use aggregate to reduce row-level detail to business-level granularity (for example per day, campaign, or channel), sumup to add running totals, and melt to pivot columns into rows. Most aggregations require numeric values, so use convertnumbers first. |
| Join or enrich from other sources | join, append, zipload, unpack, unpackdict, unpacklist | Use join to enrich a data extract with additional columns from other datastreams, and append to stack rows from another source. When working with nested or file-based structures, use zipload, unpack, unpackdict, or unpacklist first to get the data into tabular form. |
| Clean and transform text | convertx, splitfield, cut, cutout | Use convertx for free-form Python expressions (for example lowercasing, trimming, or RegEx replacements), splitfield to split structured strings into multiple columns, and cut or cutout to work with parts of a string. |
| Use metadata or mapping tables | map, metaaddfield | Prefer map with a mapping table or metaaddfield over hard-coded dictionaries in Python when you need reusable, centrally managed mappings. |
| Validate or check data quality | verifyunique and related verification instructions | Use verification instructions to add checks around your transformations (for example enforcing uniqueness or checking headers) instead of encoding such checks only in Python expressions. |
When a custom script is not the best option#
Sometimes a custom script is not the most efficient or maintainable way to solve a problem. Before creating a new custom script, consider whether one of the following is a better fit:
Mapping tables for value-to-value harmonization and any mapping logic that needs to be reused across datastreams.
Standard transformations such as Create or Edit Columns for simple, fixed-value column additions.
Connector configuration or bundle datastreams when the required data shaping can be done at source.
Using a simpler configuration or an out-of-the-box transformation can make your data flows easier to maintain and avoid duplicating the same logic in multiple custom scripts.
Common use-case patterns#
This section shows how to apply custom script instructions to common data transformation problems. Each pattern describes when to use the instructions and how to sequence them.
Harmonizing and constructing fields#
Use this pattern when you need to create new business fields or harmonize fields coming from multiple data sources. For value-to-value harmonization (for example mapping platform-specific names or codes to a common business label), use mapping tables with the map instruction rather than hard-coding mappings in Python dictionaries.
Typical instructions:
addfieldx, addfield, and metaaddfield for creating fields
map with mapping tables for value harmonization
convertdates and convertx for standardizing the new values
- Example: Create a canonical date field across platforms
Goal: You receive date information in different columns depending on the platform (for example date, date_start, or Day), and you want a single canonical Date field.
When to use: Use addfieldx when the logic for selecting the correct source column depends on the platform, or when you need to combine multiple fields into one.
Pattern:
Use addfieldx to create a new Date column.
In Python expression, use conditional logic (and optionally re for pattern matching) to choose the correct source column for each platform.
Optionally follow with convertdates or convertx to standardize the date format.
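Although this pattern is configured in the platform rather than written as a script, the selection logic an addfieldx expression performs can be sketched in plain Python. The column names and sample rows below are hypothetical:

```python
# Sketch of the conditional logic an addfieldx Python expression might use:
# pick the first date-like column present in a row, whatever the platform calls it.
DATE_COLUMNS = ["date", "date_start", "Day"]  # hypothetical source column names

def canonical_date(row):
    """Return the value of the first populated date column, or None."""
    for column in DATE_COLUMNS:
        value = row.get(column)
        if value:
            return value
    return None

rows = [
    {"date": "2024-05-01", "Campaign": "A"},        # platform 1
    {"date_start": "2024-05-02", "Campaign": "B"},  # platform 2
    {"Day": "2024-05-03", "Campaign": "C"},         # platform 3
]
dates = [canonical_date(r) for r in rows]
# ['2024-05-01', '2024-05-02', '2024-05-03']
```

Listing candidate columns in priority order keeps the expression short and makes it easy to extend when a new platform is added.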
- Example: Reusable platform field using metadata
Goal: Add a Platform field that is consistent across many datastreams.
When to use: Use metaaddfield when the value can be derived from metadata (for example datastream or connector information), and you want a single configuration instead of many similar custom scripts.
Pattern:
Configure metaaddfield with a suitable Fieldname (for example Platform).
Select Metaname to read from the metaheader (for example the connector name).
Use mapping tables or map if you need to convert raw platform names into harmonized labels.
Data type conversions and number handling#
Many advanced custom script setups depend on correct data types. Use this pattern whenever you apply numeric comparisons, aggregations, or date logic.
Typical instructions:
convertnumbers and convertdates for type conversions
currency for currency normalization
selectgt, aggregate, and sumup for working with the converted values
- Example: Prepare metrics for filtering and aggregation
Goal: Filter out rows with small values and aggregate the remaining rows.
When to use: Use convertnumbers before any numeric select* or aggregate instruction when your metrics are stored as text.
Pattern:
Apply convertnumbers to all metric columns that must be treated as numbers.
Use selectgt or another select instruction to keep rows above a numeric threshold.
Use aggregate (and optionally sumup) to aggregate the filtered metrics.
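The convert, filter, and aggregate sequence above can be sketched in plain Python. This is only an illustration of the logic, not the platform's runtime; the rows and threshold are hypothetical:

```python
from collections import defaultdict

# Hypothetical rows whose metrics arrive as text.
rows = [
    {"Campaign": "A", "Clicks": "120"},
    {"Campaign": "A", "Clicks": "3"},
    {"Campaign": "B", "Clicks": "45"},
]

# Step 1: convertnumbers equivalent: cast metric text to numbers.
for row in rows:
    row["Clicks"] = float(row["Clicks"])

# Step 2: selectgt equivalent: keep rows above a numeric threshold.
kept = [row for row in rows if row["Clicks"] > 10]

# Step 3: aggregate equivalent: sum Clicks per Campaign.
totals = defaultdict(float)
for row in kept:
    totals[row["Campaign"]] += row["Clicks"]
# totals == {'A': 120.0, 'B': 45.0}
```

Note that the threshold comparison only works as intended after step 1; comparing the raw strings would filter on alphabetical order instead.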
- Example: Normalize currencies for reporting
Goal: Report all spend in a single target currency while source data comes in multiple currencies.
When to use: Use currency when you need automatic rate handling and a clear separation between amount and currency columns.
Pattern:
Make sure your amount field is numeric. If not, apply convertnumbers first.
Configure currency with Field Name, Source Currency, Target Currency and (if applicable) a Currency Field Name that contains ISO currency codes.
Optionally aggregate the converted amounts using aggregate.
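To make the currency step concrete, here is a minimal plain-Python sketch of the arithmetic involved. In the platform, the currency instruction resolves exchange rates for you; the fixed rate table below is purely an assumption for illustration:

```python
# Hypothetical fixed conversion rates to EUR; real rates vary over time and
# are handled automatically by the currency instruction.
RATES_TO_EUR = {"USD": 0.92, "GBP": 1.17, "EUR": 1.0}

def to_eur(amount, currency_code):
    """Convert a numeric amount into EUR using the assumed rate table."""
    return round(amount * RATES_TO_EUR[currency_code], 2)

spend_eur = to_eur(100.0, "USD")  # 92.0
```

The amount must already be numeric, which is why convertnumbers comes first in the pattern.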
Filtering, segmenting, and quality checks#
Use these patterns when your main goal is to keep or remove rows based on business rules or data quality conditions.
Typical instructions:
select and related select* instructions
verifyunique for uniqueness checks
- Example: Keep only campaigns for a specific brand family
Goal: Keep only rows where campaign names belong to a given brand family.
When to use: Use select with a Python expression when the conditions are more complex than simple equality checks, or when you want to use RegEx.
Pattern:
Apply select with a Python expression that matches the brand family's naming pattern (for example using re.search on the campaign name field).
Optionally normalize the text with convertx first so the matching conditions stay simple.
- Example: Remove rows with invalid or missing metrics
Goal: Guarantee that downstream analytics only operate on valid numeric values.
When to use: Use convertnumbers followed by a select instruction depending on whether you want to keep or remove rows with missing or zero values.
Pattern:
Apply convertnumbers to the metric columns.
To remove rows with empty values before aggregation, use selectnotnone.
To keep only rows where the metric is empty, null, or 0 (for example for data quality audits), use selectfalse instead.
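The two filtering directions can be illustrated in plain Python. The rows below are hypothetical; the point is the difference between dropping empty values and keeping only empty or zero values:

```python
rows = [
    {"Campaign": "A", "Cost": 12.5},
    {"Campaign": "B", "Cost": None},
    {"Campaign": "C", "Cost": 0},
]

# selectnotnone equivalent: drop rows with empty metric values before aggregating.
valid_rows = [r for r in rows if r["Cost"] is not None]

# selectfalse equivalent: keep only rows where the metric is empty, null, or 0,
# for example for a data quality audit of suspicious records.
audit_rows = [r for r in rows if not r["Cost"]]
```

Note that a 0 value survives the first filter (it is a valid number) but appears in the second, which is exactly the distinction the two instructions give you.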
Aggregation, reshaping, and de-duplication#
Use these patterns when you want to change the granularity of your data extract or produce summary tables.
Typical instructions:
aggregate for grouping and summarizing
sumup for running totals
melt for pivoting columns into rows
convertnumbers to prepare metric columns
- Example: Daily campaign performance table
Goal: Reduce a very granular data extract (for example per-ad or per-keyword) to a daily campaign summary.
When to use: Use aggregate when you want to group by key fields (for example Date and Campaign) and apply calculations such as sum or average to metrics.
Pattern:
Ensure metric fields are numeric with convertnumbers.
Apply aggregate using Key fields (for example Date and Campaign) and Rulesets that define output fields and metrics.
Optionally use sumup afterwards to add cumulative totals.
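A plain-Python sketch of the grouping and running-total logic behind this pattern follows. The sample rows and key fields are hypothetical:

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical per-ad rows, already converted to numbers.
rows = [
    {"Date": "2024-05-01", "Campaign": "A", "Clicks": 10},
    {"Date": "2024-05-01", "Campaign": "A", "Clicks": 5},
    {"Date": "2024-05-02", "Campaign": "A", "Clicks": 7},
]

# aggregate equivalent: sum Clicks per (Date, Campaign) key.
totals = defaultdict(int)
for row in rows:
    totals[(row["Date"], row["Campaign"])] += row["Clicks"]

# sumup equivalent: running total over the chronologically sorted daily values.
daily = [totals[key] for key in sorted(totals)]  # [15, 7]
running = list(accumulate(daily))                # [15, 22]
```

Sorting by the key before accumulating matters: a running total is only meaningful over an ordered sequence.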
- Example: Pivot metrics from columns into rows
Goal: Turn a wide table with one column per metric into a long table of metric-name and value pairs.
When to use: Use melt when you need to pivot columns into rows, for example to prepare long-format data for downstream tools.
Joining and enriching from other data sources#
Use this pattern when you want to combine multiple datastreams or enrich a fact table with lookup information.
Typical instructions:
join for key-based enrichment
append for stacking rows
unpack, unpackdict, and unpacklist for flattening nested structures
- Example: Enrich campaign performance with lookup attributes
Goal: Add campaign-level metadata (for example business unit, owner, region) from a lookup table to a performance data extract.
When to use: Use join when you have a stable identifier (for example Campaign ID) shared between the performance table and the lookup table.
Pattern:
If the lookup table is stored in a nested structure (for example JSON), first use unpack or unpackdict to flatten it.
Configure join with Keys that identify matching rows (for example Campaign ID) and Streams that reference the lookup datastream.
Use Columns and Join all to control which attributes are added to the fact table.
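The enrichment a left join performs can be sketched in plain Python with a key-indexed lookup. The identifiers and attributes below are hypothetical:

```python
# Hypothetical lookup table keyed by Campaign ID, already flattened to rows.
lookup = {
    "C1": {"Region": "EMEA", "Owner": "Alice"},
    "C2": {"Region": "APAC", "Owner": "Bob"},
}

facts = [
    {"Campaign ID": "C1", "Clicks": 120},
    {"Campaign ID": "C3", "Clicks": 8},  # no match in the lookup
]

# Left-join equivalent: keep every fact row and add lookup attributes
# when the key matches; unmatched rows pass through unchanged.
enriched = [{**row, **lookup.get(row["Campaign ID"], {})} for row in facts]
```

Keeping unmatched rows (rather than dropping them) is what distinguishes a left join from an inner join, and usually what you want for fact data.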
- Example: Stack historical or external data
Goal: Combine rows from multiple periods or sources into a single table with the same schema.
When to use: Use append when schemas are compatible and you just need to union rows; use join when you need to match rows by a shared key rather than simply stacking them.
Pattern:
Ensure all sources share the same column names and compatible types.
Use append to stack the rows from the other datastreams onto the current data extract.
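In plain Python, appending compatible sources amounts to a union of rows, with the schema assumption made explicit. The sample periods below are hypothetical:

```python
january = [{"Date": "2024-01-01", "Cost": 10.0}]
february = [{"Date": "2024-02-01", "Cost": 12.0}]

# Guard the schema assumption: appending only makes sense when
# both sources expose the same columns.
assert {tuple(sorted(r)) for r in january} == {tuple(sorted(r)) for r in february}

# append equivalent: a plain union of rows.
combined = january + february  # two rows, same schema
```

If the schemas drift apart (a renamed or missing column), the guard fails early instead of producing a table with half-empty columns.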
Text and RegEx-heavy transformations#
Use these patterns when your main task is to parse or clean text.
Typical instructions:
splitfield for splitting structured strings into columns
convertx for free-form text expressions and RegEx replacements
- Example: Split composite names into components
Goal: Split a campaign name like Brand/Region/Channel into separate fields.
When to use: Use splitfield when the separator or format is stable and you want multiple new columns.
Pattern:
Configure splitfield with Field, a Pattern that describes the separator, and Newfields for the output columns.
Optionally follow with convertx to trim or normalize the resulting text columns.
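A small plain-Python sketch of what splitfield does with a Pattern and Newfields configuration, including the follow-up trimming step. The field names are hypothetical:

```python
import re

def split_campaign(name, pattern="/", newfields=("Brand", "Region", "Channel")):
    """Split a composite name on a separator pattern into the given new fields."""
    parts = re.split(pattern, name)
    # Trim each part, mirroring the optional convertx clean-up step.
    return dict(zip(newfields, (p.strip() for p in parts)))

fields = split_campaign("Acme / EMEA / Search")
# {'Brand': 'Acme', 'Region': 'EMEA', 'Channel': 'Search'}
```

Because `zip` stops at the shorter sequence, a name with too few segments simply yields fewer fields rather than raising an error; add a length check if you want stricter validation.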
- Example: Clean up free-form text
Goal: Remove special characters or normalize casing for analysis.
When to use: Use convertx with Python string methods or re.sub when you need flexible text manipulation.
Pattern:
Apply convertx with an expression such as re.sub(r'[^\w\s]', '', {column_name}).lower().
For more complex matching, see Using RegEx in Python expressions and Using Python expressions to process text columns.
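The expression above can be tried out as a standalone Python snippet; the sample input is hypothetical:

```python
import re

def clean_text(value):
    """Strip non-word, non-space characters and normalize casing."""
    return re.sub(r"[^\w\s]", "", value).lower()

cleaned = clean_text("Brand-X: Summer Sale!")  # 'brandx summer sale'
```

The character class `[^\w\s]` removes everything that is neither a word character nor whitespace, so punctuation and symbols disappear while letters, digits, and spaces survive.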
Metadata-driven and cross-stream logic#
Use this pattern when you want configuration to drive behaviour, so that you can reuse the same logic across many datastreams.
Typical instructions:
metaaddfield for metadata-derived fields
set_meta and if for metadata-based conditional execution
map with mapping tables for reusable value mappings
- Example: Apply a transformation only for certain tagged extracts
Goal: Run an additional step (for example an extra filter) only when a data extract has a specific tag or metadata flag.
When to use: Use set_meta and if together when you need conditional execution based on metadata rather than row-level data.
Pattern:
Use set_meta to read or set the relevant metadata flag.
Use if so that the additional step only runs when the flag has the expected value.
Advanced multi-step scenarios#
This section shows complete flows that combine several custom script instructions. Each scenario focuses on when to use a particular step, not just how to configure it.
Scenario 1: Normalize metrics and aggregate for reporting#
- Goal
Create a clean, aggregated table of performance metrics at a reporting level (for example daily per campaign), where all numeric values are comparable and in a single target currency.
Typical steps
Convert numeric text to numbers
Use convertnumbers on all metric columns (for example Clicks, Impressions, Cost).
This prevents subtle issues where '100' and '20' are compared lexicographically instead of numerically.
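The lexicographic pitfall is easy to demonstrate in two lines of Python:

```python
# Text comparison sorts character by character, so '100' < '20' is True:
# '1' sorts before '2'. This is exactly the bug convertnumbers prevents.
as_text = "100" < "20"   # True, the surprising string comparison
as_numbers = 100 < 20    # False, the intended numeric comparison
```

Any threshold filter or sort applied to string-typed metrics is exposed to this behavior, which is why the conversion step comes first in the scenario.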
Normalize currencies
If data is in multiple currencies, use currency to convert to a single target currency (for example EUR).
Ensure you provide a Currency Field Name when using Automatic source currency detection.
Filter invalid or out-of-scope rows
Use a select instruction (for example selectnotnone) to drop rows that would distort the aggregated totals.
Aggregate to the desired level
Use aggregate with Key fields such as Date and Campaign, and Rulesets that sum or average the normalized metrics.
- When to use this scenario
Use this pattern whenever downstream dashboards or modeling require a consistent numeric base, especially in multi-currency, multi-source setups.
Scenario 2: Harmonize dates from multiple platforms#
- Goal
Produce a single, reliable Date field from sources that expose different date columns or formats.
Typical steps
Create a canonical date field
Use addfieldx to add a new column (for example Date).
In Python expression, choose the correct source field by platform, such as: {date} for one platform, {Day} for another, and {date_start} for a third platform.
Standardize the date format
If all date values are already in a consistent ISO-like format, you can slice them with convertx (for example {Date}[:10]).
If formats differ or you need robust parsing, use convertdates and choose an appropriate Format or Auto.
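Both standardization routes, slicing ISO-like values and parsing other formats, can be sketched in standard-library Python. The format list is a hypothetical example, not the platform's actual Format options:

```python
from datetime import datetime

def normalize_date(value):
    """Return YYYY-MM-DD: slice ISO-like values, otherwise parse known formats."""
    if len(value) >= 10 and value[4] == "-" and value[7] == "-":
        return value[:10]  # e.g. '2024-05-01T12:30:00' -> '2024-05-01'
    for fmt in ("%d.%m.%Y", "%m/%d/%Y"):  # hypothetical source formats
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")

normalize_date("2024-05-01T12:30:00")  # '2024-05-01'
normalize_date("01.05.2024")           # '2024-05-01'
```

Raising on unknown formats is deliberate: a silent pass-through would let malformed dates leak into downstream filters and joins.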
Use the canonical date field everywhere else
Reference only the new Date field in downstream filters, joins, and aggregations.
- When to use this scenario
Use this pattern whenever you integrate multiple advertising or analytics platforms and want to avoid repeating platform-specific date logic in many places.
Scenario 3: Complex filtering and classification with RegEx#
- Goal
Classify rows into segments (for example brand vs non-brand, product categories, regions) using flexible pattern matching on text fields.
Typical steps
Normalize the text column
Use convertx to lower-case and trim the text (for example campaign names) to reduce the complexity of your RegEx conditions.
Example expression:
{Campaign}.strip().lower()
Filter or classify using RegEx
Use select with a re.search expression to keep matching rows, or addfieldx to write a segment label into a new field based on which pattern matches.
Optionally combine with numeric filters
After classification, you can apply select* or aggregate to work only with specific segments.
- When to use this scenario
Use this pattern when naming conventions are not strictly standardized and you need flexible matching beyond simple equality checks.
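The normalize-then-classify flow of this scenario can be sketched in plain Python. The brand pattern and campaign names are hypothetical; real naming conventions vary per account:

```python
import re

# Hypothetical brand pattern; adjust to your own naming conventions.
BRAND_PATTERN = re.compile(r"\b(acme|acme[- ]pro)\b")

def classify(campaign_name):
    """Label a campaign 'brand' or 'non-brand' after normalizing the text."""
    normalized = campaign_name.strip().lower()  # the convertx step
    return "brand" if BRAND_PATTERN.search(normalized) else "non-brand"

classify("  ACME Pro | Search | DE ")  # 'brand'
classify("Generic Shoes | Display")    # 'non-brand'
```

Normalizing first means the pattern only has to cover lowercase, trimmed input, which keeps the RegEx much shorter than one that anticipates every casing and spacing variant.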
Scenario 4: Enrich from lookup tables with missing-match handling#
- Goal
Enrich fact data (for example performance metrics) with attributes from external lookup tables (for example product catalogs or cost centers), with explicit control over how missing or duplicate key matches are handled.
Typical steps
Prepare the lookup data
If the lookup is nested (for example JSON in a single column), use unpack or unpackdict to flatten it into tabular form.
Make sure the key column (for example Product ID) is clean and unique; optionally verify this with verifyunique.
Configure the join
Use join with an appropriate How option (for example Left if you want to keep all fact rows).
Set Keys (or Left key and Right key) to the shared identifier.
In Streams, select the lookup datastream.
In Columns, list only the attributes you want to bring into the fact table, or use Join all if all attributes are needed.
Handle missing or duplicate matches
Use Distinct join when you need to avoid multiple rows on the right-hand side.
After the join, use extendmissing or another clean-up instruction to handle missing attributes.
- When to use this scenario
Use this pattern whenever you have central reference tables (for example product, region, or customer master data) that you want to join to many different fact datastreams.
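The full scenario, de-duplicating the lookup, left-joining, and cleaning up missing matches, can be sketched in plain Python. The keys, attributes, and the "Unknown" fallback are hypothetical:

```python
# Hypothetical lookup rows with a duplicate key, plus fact rows to enrich.
lookup_rows = [
    {"Product ID": "P1", "Category": "Shoes"},
    {"Product ID": "P1", "Category": "Footwear"},  # duplicate key
    {"Product ID": "P2", "Category": "Hats"},
]

# Distinct-join equivalent: keep only the first row per key so the join
# cannot multiply fact rows.
distinct = {}
for row in lookup_rows:
    distinct.setdefault(row["Product ID"], row)

facts = [{"Product ID": "P1", "Units": 3}, {"Product ID": "P9", "Units": 1}]

# Left join, then an extendmissing-style clean-up for unmatched keys.
enriched = [
    {**fact,
     "Category": distinct.get(fact["Product ID"], {}).get("Category", "Unknown")}
    for fact in facts
]
# enriched[0]['Category'] == 'Shoes'; enriched[1]['Category'] == 'Unknown'
```

Deciding explicitly which duplicate wins and what unmatched rows receive is the point of this pattern; leaving either implicit is a common source of silently inflated or incomplete fact tables.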
Performance, maintainability, and Transformation Copilot considerations#
When designing custom scripts, keep the following principles in mind:
Prefer simple, linear sequences of instructions over a single extremely complex step. Multiple clear custom script instructions are usually easier to debug and maintain than one dense Python expression.
Within each custom script, place type conversions such as convertnumbers and convertdates before any instruction that depends on those types. Type conversions are local to each custom script — a separate custom script that needs numeric operations must include its own conversion step.
Use mapping tables, metadata, and reusable lookup structures instead of hard-coding long lists or dictionaries in Python expressions. This makes your logic easier to change without editing code.
Consider subtables and helper instructions (such as unpack or zipload) when working with large or nested data sets, instead of materializing everything into a single, wide data extract.
If you use the Transformation Copilot to help configure custom scripts, remember that it works instruction by instruction. Clearly separated steps and well-chosen instructions make it easier for the Copilot to suggest useful configurations and for your team to understand and maintain the resulting transformations.