Choosing and combining custom script instructions#
This guide explains how to choose the right custom script instruction for your transformation goal and how to combine instructions in real-world pipelines. For detailed configuration of each instruction, see Available custom script instructions.
Choosing the right custom script instruction#
Before you start writing Python code, decide which custom script instruction is the best fit for the task. In many cases, different instructions can reach a similar result, but with different trade-offs for performance, maintainability, and configuration effort.
The table below gives common transformation goals and the most suitable instructions to start with.
| Transformation goal | Recommended custom script instructions | Notes and prerequisites |
|---|---|---|
| Create or harmonize a field | addfield, addfieldx, metaaddfield, map | For harmonizing values (mapping source values to target values), use mapping tables with the map instruction; this keeps your logic centralized and reusable. Use addfieldx when you need a Python expression to combine or transform fields, addfield for a fixed value, and metaaddfield when the value can be derived from metadata across multiple datastreams. |
| Convert values to numbers or dates | convertnumbers, convertdates | Always convert text to numeric or datetime types before using numeric or date comparisons or aggregations. For numeric values, apply convertnumbers first and then use instructions such as selectgt or aggregate. |
| Filter or segment rows | select, selecteq, selectin, selectgt | Use select with a Python expression for flexible conditions (including RegEx). Use selectin or selecteq when you have one or more explicit values. When filtering on numeric columns, apply convertnumbers first if the source data is not already numeric. |
| Aggregate and reshape data | aggregate, sumup, melt | Use aggregate to reduce row-level detail to business-level granularity (for example per day, campaign, or channel), sumup to add running totals, and melt to pivot columns into rows. Most aggregations require numeric values, so use convertnumbers first. |
| Join or enrich from other sources | join, append, zipload, unpack, unpackdict, unpacklist | Use join to enrich a data extract with additional columns from other datastreams, and append to stack rows from another source. When working with nested or file-based structures, use zipload, unpack, unpackdict, or unpacklist first to get the data into tabular form. |
| Clean and transform text | convertx, splitfield, cut, cutout | Use convertx for free-form Python expressions (for example lowercasing, trimming, or RegEx replacements), splitfield to split structured strings into multiple columns, and cut or cutout to work with parts of a string. |
| Use metadata or mapping tables | map, metaaddfield | Prefer map with a mapping table or metaaddfield over hard-coded dictionaries in Python when you need reusable, centrally managed mappings. |
| Validate or check data quality | verifyunique and related verification instructions | Use verification instructions to add checks around your transformations (for example enforcing uniqueness or checking headers) instead of encoding such checks only in Python expressions. |
When a custom script is not the best option#
Sometimes a custom script is not the most efficient or maintainable way to solve a problem. Before creating a new custom script, consider whether one of the following is a better fit:
Mapping tables for value-to-value harmonization and any mapping logic that needs to be reused across datastreams.
Standard transformations such as Create or Edit Columns for simple, fixed-value column additions.
Connector configuration or bundle datastreams when the required data shaping can be done at source.
Using a simpler configuration or an out-of-the-box transformation can make your data flows easier to maintain and avoid duplicating the same logic in multiple custom scripts.
Common use-case patterns#
This section shows how to apply custom script instructions to common data transformation problems. Each pattern describes when to use the instructions and how to sequence them.
Harmonizing and constructing fields#
Use this pattern when you need to create new business fields or harmonize fields coming from multiple data sources. For value-to-value harmonization (for example mapping platform-specific names or codes to a common business label), use mapping tables with the map instruction rather than hard-coding mappings in Python dictionaries.
Typical instructions:
addfieldx, addfield, and metaaddfield for creating fields
map with mapping tables for value harmonization
convertdates and convertx for standardizing the new values
- Example: Create a canonical date field across platforms
Goal: You receive date information in different columns depending on the platform (for example date, date_start, or Day), and you want a single canonical Date field.
When to use: Use addfieldx when the logic for selecting the correct source column depends on the platform, or when you need to combine multiple fields into one.
Pattern:
Use addfieldx to create a new Date column.
In Python expression, use conditional logic (and optionally re for pattern matching) to choose the correct source column for each platform.
Optionally follow with convertdates or convertx to standardize the date format.
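Although this pattern is configured in the platform rather than written as a script, the selection logic an addfieldx expression performs can be sketched in plain Python. The column names and sample rows below are hypothetical:

```python
# Sketch of the conditional logic an addfieldx Python expression might use:
# pick the first date-like column present in a row, whatever the platform calls it.
DATE_COLUMNS = ["date", "date_start", "Day"]  # hypothetical source column names

def canonical_date(row):
    """Return the value of the first populated date column, or None."""
    for column in DATE_COLUMNS:
        value = row.get(column)
        if value:
            return value
    return None

rows = [
    {"date": "2024-05-01", "Campaign": "A"},        # platform 1
    {"date_start": "2024-05-02", "Campaign": "B"},  # platform 2
    {"Day": "2024-05-03", "Campaign": "C"},         # platform 3
]
dates = [canonical_date(r) for r in rows]
# ['2024-05-01', '2024-05-02', '2024-05-03']
```

Listing candidate columns in priority order keeps the expression short and makes it easy to extend when a new platform is added.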
- Example: Reusable platform field using metadata
Goal: Add a Platform field that is consistent across many datastreams.
When to use: Use metaaddfield when the value can be derived from metadata (for example datastream or connector information), and you want a single configuration instead of many similar custom scripts.
Pattern:
Configure metaaddfield with a suitable Fieldname (for example Platform).
Select Metaname to read from the metaheader (for example the connector name).
Use mapping tables or map if you need to convert raw platform names into harmonized labels.
Data type conversions and number handling#
Many advanced custom script setups depend on correct data types. Use this pattern whenever you apply numeric comparisons, aggregations, or date logic.
Typical instructions:
convertnumbers and convertdates for type conversions
currency for currency normalization
selectgt, aggregate, and sumup for working with the converted values
- Example: Prepare metrics for filtering and aggregation
Goal: Filter out rows with small values and aggregate the remaining rows.
When to use: Use convertnumbers before any numeric select* or aggregate instruction when your metrics are stored as text.
Pattern:
Apply convertnumbers to all metric columns that must be treated as numbers.
Use selectgt or another select instruction to keep rows above a numeric threshold.
Use aggregate (and optionally sumup) to aggregate the filtered metrics.
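The convert, filter, and aggregate sequence above can be sketched in plain Python. This is only an illustration of the logic, not the platform's runtime; the rows and threshold are hypothetical:

```python
from collections import defaultdict

# Hypothetical rows whose metrics arrive as text.
rows = [
    {"Campaign": "A", "Clicks": "120"},
    {"Campaign": "A", "Clicks": "3"},
    {"Campaign": "B", "Clicks": "45"},
]

# Step 1: convertnumbers equivalent: cast metric text to numbers.
for row in rows:
    row["Clicks"] = float(row["Clicks"])

# Step 2: selectgt equivalent: keep rows above a numeric threshold.
kept = [row for row in rows if row["Clicks"] > 10]

# Step 3: aggregate equivalent: sum Clicks per Campaign.
totals = defaultdict(float)
for row in kept:
    totals[row["Campaign"]] += row["Clicks"]
# totals == {'A': 120.0, 'B': 45.0}
```

Note that the threshold comparison only works as intended after step 1; comparing the raw strings would filter on alphabetical order instead.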
- Example: Normalize currencies for reporting
Goal: Report all spend in a single target currency while source data comes in multiple currencies.
When to use: Use currency when you need automatic rate handling and a clear separation between amount and currency columns.
Pattern:
Make sure your amount field is numeric. If not, apply convertnumbers first.
Configure currency with Field Name, Source Currency, Target Currency and (if applicable) a Currency Field Name that contains ISO currency codes.
Optionally aggregate the converted amounts using aggregate.
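To make the currency step concrete, here is a minimal plain-Python sketch of the arithmetic involved. In the platform, the currency instruction resolves exchange rates for you; the fixed rate table below is purely an assumption for illustration:

```python
# Hypothetical fixed conversion rates to EUR; real rates vary over time and
# are handled automatically by the currency instruction.
RATES_TO_EUR = {"USD": 0.92, "GBP": 1.17, "EUR": 1.0}

def to_eur(amount, currency_code):
    """Convert a numeric amount into EUR using the assumed rate table."""
    return round(amount * RATES_TO_EUR[currency_code], 2)

spend_eur = to_eur(100.0, "USD")  # 92.0
```

The amount must already be numeric, which is why convertnumbers comes first in the pattern.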
Filtering, segmenting, and quality checks#
Use these patterns when your main goal is to keep or remove rows based on business rules or data quality conditions.
Typical instructions:
select and related select* instructions
verifyunique for uniqueness checks
- Example: Keep only campaigns for a specific brand family
Goal: Keep only rows where campaign names belong to a given brand family.
When to use: Use select with a Python expression when the conditions are more complex than simple equality checks, or when you want to use RegEx.
Pattern:
Apply select with a Python expression that matches the brand family's naming pattern (for example using re.search on the campaign name field).
Optionally normalize the text with convertx first so the matching conditions stay simple.
- Example: Remove rows with invalid or missing metrics
Goal: Guarantee that downstream analytics only operate on valid numeric values.
When to use: Use convertnumbers followed by a select instruction depending on whether you want to keep or remove rows with missing or zero values.
Pattern:
Apply convertnumbers to the metric columns.
To remove rows with empty values before aggregation, use selectnotnone.
To keep only rows where the metric is empty, null, or 0 (for example for data quality audits), use selectfalse instead.
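The two filtering directions can be illustrated in plain Python. The rows below are hypothetical; the point is the difference between dropping empty values and keeping only empty or zero values:

```python
rows = [
    {"Campaign": "A", "Cost": 12.5},
    {"Campaign": "B", "Cost": None},
    {"Campaign": "C", "Cost": 0},
]

# selectnotnone equivalent: drop rows with empty metric values before aggregating.
valid_rows = [r for r in rows if r["Cost"] is not None]

# selectfalse equivalent: keep only rows where the metric is empty, null, or 0,
# for example for a data quality audit of suspicious records.
audit_rows = [r for r in rows if not r["Cost"]]
```

Note that a 0 value survives the first filter (it is a valid number) but appears in the second, which is exactly the distinction the two instructions give you.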
Aggregation, reshaping, and de-duplication#
Use these patterns when you want to change the granularity of your data extract or produce summary tables.
Typical instructions:
aggregate for grouping and summarizing
sumup for running totals
melt for pivoting columns into rows
convertnumbers to prepare metric columns
- Example: Daily campaign performance table
Goal: Reduce a very granular data extract (for example per-ad or per-keyword) to a daily campaign summary.
When to use: Use aggregate when you want to group by key fields (for example Date and Campaign) and apply calculations such as sum or average to metrics.
Pattern:
Ensure metric fields are numeric with convertnumbers.
Apply aggregate using Key fields (for example Date and Campaign) and Rulesets that define output fields and metrics.
Optionally use sumup afterwards to add cumulative totals.
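A plain-Python sketch of the grouping and running-total logic behind this pattern follows. The sample rows and key fields are hypothetical:

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical per-ad rows, already converted to numbers.
rows = [
    {"Date": "2024-05-01", "Campaign": "A", "Clicks": 10},
    {"Date": "2024-05-01", "Campaign": "A", "Clicks": 5},
    {"Date": "2024-05-02", "Campaign": "A", "Clicks": 7},
]

# aggregate equivalent: sum Clicks per (Date, Campaign) key.
totals = defaultdict(int)
for row in rows:
    totals[(row["Date"], row["Campaign"])] += row["Clicks"]

# sumup equivalent: running total over the chronologically sorted daily values.
daily = [totals[key] for key in sorted(totals)]  # [15, 7]
running = list(accumulate(daily))                # [15, 22]
```

Sorting by the key before accumulating matters: a running total is only meaningful over an ordered sequence.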
- Example: Pivot metrics from columns into rows
Goal: Turn a wide table with one column per metric into a long table of metric-name and value pairs.
When to use: Use melt when you need to pivot columns into rows, for example to prepare long-format data for downstream tools.
Joining and enriching from other data sources#
Use this pattern when you want to combine multiple datastreams or enrich a fact table with lookup information.
Typical instructions:
join for key-based enrichment
append for stacking rows
unpack, unpackdict, and unpacklist for flattening nested structures
- Example: Enrich campaign performance with lookup attributes
Goal: Add campaign-level metadata (for example business unit, owner, region) from a lookup table to a performance data extract.
When to use: Use join when you have a stable identifier (for example Campaign ID) shared between the performance table and the lookup table.
Pattern:
If the lookup table is stored in a nested structure (for example JSON), first use unpack or unpackdict to flatten it.
Configure join with Keys that identify matching rows (for example Campaign ID) and Streams that reference the lookup datastream.
Use Columns and Join all to control which attributes are added to the fact table.
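The enrichment a left join performs can be sketched in plain Python with a key-indexed lookup. The identifiers and attributes below are hypothetical:

```python
# Hypothetical lookup table keyed by Campaign ID, already flattened to rows.
lookup = {
    "C1": {"Region": "EMEA", "Owner": "Alice"},
    "C2": {"Region": "APAC", "Owner": "Bob"},
}

facts = [
    {"Campaign ID": "C1", "Clicks": 120},
    {"Campaign ID": "C3", "Clicks": 8},  # no match in the lookup
]

# Left-join equivalent: keep every fact row and add lookup attributes
# when the key matches; unmatched rows pass through unchanged.
enriched = [{**row, **lookup.get(row["Campaign ID"], {})} for row in facts]
```

Keeping unmatched rows (rather than dropping them) is what distinguishes a left join from an inner join, and usually what you want for fact data.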
- Example: Stack historical or external data
Goal: Combine rows from multiple periods or sources into a single table with the same schema.
When to use: Use append when schemas are compatible and you just need to union rows; use join when you need to match rows by a shared key rather than simply stacking them.
Pattern:
Ensure all sources share the same column names and compatible types.
Use append to stack the rows from the other datastreams onto the current data extract.
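In plain Python, appending compatible sources amounts to a union of rows, with the schema assumption made explicit. The sample periods below are hypothetical:

```python
january = [{"Date": "2024-01-01", "Cost": 10.0}]
february = [{"Date": "2024-02-01", "Cost": 12.0}]

# Guard the schema assumption: appending only makes sense when
# both sources expose the same columns.
assert {tuple(sorted(r)) for r in january} == {tuple(sorted(r)) for r in february}

# append equivalent: a plain union of rows.
combined = january + february  # two rows, same schema
```

If the schemas drift apart (a renamed or missing column), the guard fails early instead of producing a table with half-empty columns.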
Text and RegEx-heavy transformations#
Use these patterns when your main task is to parse or clean text.
Typical instructions:
splitfield for splitting structured strings into columns
convertx for free-form text expressions and RegEx replacements
- Example: Split composite names into components
Goal: Split a campaign name like Brand/Region/Channel into separate fields.
When to use: Use splitfield when the separator or format is stable and you want multiple new columns.
Pattern:
Configure splitfield with Field, a Pattern that describes the separator, and Newfields for the output columns.
Optionally follow with convertx to trim or normalize the resulting text columns.
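A small plain-Python sketch of what splitfield does with a Pattern and Newfields configuration, including the follow-up trimming step. The field names are hypothetical:

```python
import re

def split_campaign(name, pattern="/", newfields=("Brand", "Region", "Channel")):
    """Split a composite name on a separator pattern into the given new fields."""
    parts = re.split(pattern, name)
    # Trim each part, mirroring the optional convertx clean-up step.
    return dict(zip(newfields, (p.strip() for p in parts)))

fields = split_campaign("Acme / EMEA / Search")
# {'Brand': 'Acme', 'Region': 'EMEA', 'Channel': 'Search'}
```

Because `zip` stops at the shorter sequence, a name with too few segments simply yields fewer fields rather than raising an error; add a length check if you want stricter validation.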
- Example: Clean up free-form text
Goal: Remove special characters or normalize casing for analysis.
When to use: Use convertx with Python string methods or re.sub when you need flexible text manipulation.
Pattern:
Apply convertx with an expression such as re.sub(r'[^\w\s]', '', {column_name}).lower().
For more complex matching, see Using RegEx in Python expressions and Using Python expressions to process text columns.
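The expression above can be tried out as a standalone Python snippet; the sample input is hypothetical:

```python
import re

def clean_text(value):
    """Strip non-word, non-space characters and normalize casing."""
    return re.sub(r"[^\w\s]", "", value).lower()

cleaned = clean_text("Brand-X: Summer Sale!")  # 'brandx summer sale'
```

The character class `[^\w\s]` removes everything that is neither a word character nor whitespace, so punctuation and symbols disappear while letters, digits, and spaces survive.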
Metadata-driven and cross-stream logic#
Use this pattern when you want configuration to drive behaviour, so that you can reuse the same logic across many datastreams.
Typical instructions:
metaaddfield for metadata-derived fields
set_meta and if for metadata-based conditional execution
map with mapping tables for reusable value mappings
- Example: Apply a transformation only for certain tagged extracts
Goal: Run an additional step (for example an extra filter) only when a data extract has a specific tag or metadata flag.
When to use: Use set_meta and if together when you need conditional execution based on metadata rather than row-level data.
Pattern:
Use set_meta to read or set the relevant metadata flag.
Use if so that the additional step only runs when the flag has the expected value.
Advanced multi-step scenarios#
This section shows complete flows that combine several custom script instructions. Each scenario focuses on when to use a particular step, not just how to configure it.
Scenario 1: Normalize metrics and aggregate for reporting#
- Goal
Create a clean, aggregated table of performance metrics at a reporting level (for example daily per campaign), where all numeric values are comparable and in a single target currency.
Typical steps
Convert numeric text to numbers
Use convertnumbers on all metric columns (for example Clicks, Impressions, Cost).
This prevents subtle issues where '100' and '20' are compared lexicographically instead of numerically.
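The lexicographic pitfall is easy to demonstrate in two lines of Python:

```python
# Text comparison sorts character by character, so '100' < '20' is True:
# '1' sorts before '2'. This is exactly the bug convertnumbers prevents.
as_text = "100" < "20"   # True, the surprising string comparison
as_numbers = 100 < 20    # False, the intended numeric comparison
```

Any threshold filter or sort applied to string-typed metrics is exposed to this behavior, which is why the conversion step comes first in the scenario.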
Normalize currencies
If data is in multiple currencies, use currency to convert to a single target currency (for example EUR).
Ensure you provide a Currency Field Name when using Automatic source currency detection.
Filter invalid or out-of-scope rows
Use a select instruction (for example selectnotnone) to drop rows that would distort the aggregated totals.
Aggregate to the desired level
Use aggregate with Key fields such as Date and Campaign, and Rulesets that sum or average the normalized metrics.
- When to use this scenario
Use this pattern whenever downstream dashboards or modeling require a consistent numeric base, especially in multi-currency, multi-source setups.
Scenario 2: Harmonize dates from multiple platforms#
- Goal
Produce a single, reliable Date field from sources that expose different date columns or formats.
Typical steps
Create a canonical date field
Use addfieldx to add a new column (for example Date).
In Python expression, choose the correct source field by platform, such as: {date} for one platform, {Day} for another, and {date_start} for a third platform.
Standardize the date format
If all date values are already in a consistent ISO-like format, you can slice them with convertx (for example {Date}[:10]).
If formats differ or you need robust parsing, use convertdates and choose an appropriate Format or Auto.
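Both standardization routes, slicing ISO-like values and parsing other formats, can be sketched in standard-library Python. The format list is a hypothetical example, not the platform's actual Format options:

```python
from datetime import datetime

def normalize_date(value):
    """Return YYYY-MM-DD: slice ISO-like values, otherwise parse known formats."""
    if len(value) >= 10 and value[4] == "-" and value[7] == "-":
        return value[:10]  # e.g. '2024-05-01T12:30:00' -> '2024-05-01'
    for fmt in ("%d.%m.%Y", "%m/%d/%Y"):  # hypothetical source formats
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")

normalize_date("2024-05-01T12:30:00")  # '2024-05-01'
normalize_date("01.05.2024")           # '2024-05-01'
```

Raising on unknown formats is deliberate: a silent pass-through would let malformed dates leak into downstream filters and joins.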
Use the canonical date field everywhere else
Reference only the new Date field in downstream filters, joins, and aggregations.
- When to use this scenario
Use this pattern whenever you integrate multiple advertising or analytics platforms and want to avoid repeating platform-specific date logic in many places.
Scenario 3: Complex filtering and classification with RegEx#
- Goal
Classify rows into segments (for example brand vs non-brand, product categories, regions) using flexible pattern matching on text fields.
Typical steps
Normalize the text column
Use convertx to lower-case and trim the text (for example campaign names) to reduce the complexity of your RegEx conditions.
Example expression:
{Campaign}.strip().lower()
Filter or classify using RegEx
Use select with a re.search expression to keep matching rows, or addfieldx to write a segment label into a new field based on which pattern matches.
Optionally combine with numeric filters
After classification, you can apply select* or aggregate to work only with specific segments.
- When to use this scenario
Use this pattern when naming conventions are not strictly standardized and you need flexible matching beyond simple equality checks.
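The normalize-then-classify flow of this scenario can be sketched in plain Python. The brand pattern and campaign names are hypothetical; real naming conventions vary per account:

```python
import re

# Hypothetical brand pattern; adjust to your own naming conventions.
BRAND_PATTERN = re.compile(r"\b(acme|acme[- ]pro)\b")

def classify(campaign_name):
    """Label a campaign 'brand' or 'non-brand' after normalizing the text."""
    normalized = campaign_name.strip().lower()  # the convertx step
    return "brand" if BRAND_PATTERN.search(normalized) else "non-brand"

classify("  ACME Pro | Search | DE ")  # 'brand'
classify("Generic Shoes | Display")    # 'non-brand'
```

Normalizing first means the pattern only has to cover lowercase, trimmed input, which keeps the RegEx much shorter than one that anticipates every casing and spacing variant.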
Scenario 4: Enrich from lookup tables with missing-match handling#
- Goal
Enrich fact data (for example performance metrics) with attributes from external lookup tables (for example product catalogs or cost centers), with explicit control over how missing or duplicate key matches are handled.
Typical steps
Prepare the lookup data
If the lookup is nested (for example JSON in a single column), use unpack or unpackdict to flatten it into tabular form.
Make sure the key column (for example Product ID) is clean and unique; optionally verify this with verifyunique.
Configure the join
Use join with an appropriate How option (for example Left if you want to keep all fact rows).
Set Keys (or Left key and Right key) to the shared identifier.
In Streams, select the lookup datastream.
In Columns, list only the attributes you want to bring into the fact table, or use Join all if all attributes are needed.
Handle missing or duplicate matches
Use Distinct join when you need to avoid multiple rows on the right-hand side.
After the join, use extendmissing or another clean-up instruction to handle missing attributes.
- When to use this scenario
Use this pattern whenever you have central reference tables (for example product, region, or customer master data) that you want to join to many different fact datastreams.
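The full scenario, de-duplicating the lookup, left-joining, and cleaning up missing matches, can be sketched in plain Python. The keys, attributes, and the "Unknown" fallback are hypothetical:

```python
# Hypothetical lookup rows with a duplicate key, plus fact rows to enrich.
lookup_rows = [
    {"Product ID": "P1", "Category": "Shoes"},
    {"Product ID": "P1", "Category": "Footwear"},  # duplicate key
    {"Product ID": "P2", "Category": "Hats"},
]

# Distinct-join equivalent: keep only the first row per key so the join
# cannot multiply fact rows.
distinct = {}
for row in lookup_rows:
    distinct.setdefault(row["Product ID"], row)

facts = [{"Product ID": "P1", "Units": 3}, {"Product ID": "P9", "Units": 1}]

# Left join, then an extendmissing-style clean-up for unmatched keys.
enriched = [
    {**fact,
     "Category": distinct.get(fact["Product ID"], {}).get("Category", "Unknown")}
    for fact in facts
]
# enriched[0]['Category'] == 'Shoes'; enriched[1]['Category'] == 'Unknown'
```

Deciding explicitly which duplicate wins and what unmatched rows receive is the point of this pattern; leaving either implicit is a common source of silently inflated or incomplete fact tables.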
Performance, maintainability, and Transformation Copilot considerations#
When designing custom scripts, keep the following principles in mind:
Prefer simple, linear sequences of instructions over a single extremely complex step. Multiple clear custom script instructions are usually easier to debug and maintain than one dense Python expression.
Within each custom script, place type conversions such as convertnumbers and convertdates before any instruction that depends on those types. Type conversions are local to each custom script — a separate custom script that needs numeric operations must include its own conversion step.
Use mapping tables, metadata, and reusable lookup structures instead of hard-coding long lists or dictionaries in Python expressions. This makes your logic easier to change without editing code.
Consider subtables and helper instructions (such as unpack or zipload) when working with large or nested data sets, instead of materializing everything into a single, wide data extract.
If you use the Transformation Copilot to help configure custom scripts, remember that it works instruction by instruction. Clearly separated steps and well-chosen instructions make it easier for the Copilot to suggest useful configurations and for your team to understand and maintain the resulting transformations.