Using RegEx in Python expressions#

This guide explains how to use regular expressions to transform data in a data extract.

Introduction#

Use Python expressions in custom scripts to transform your data extract. For a list of all available custom script transformations, see Available custom script instructions.

Python expressions in custom scripts must follow certain rules. For details, see Using Python expressions in custom scripts.

To learn more about regular expressions, see the following links:

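In Adverity, curly braces {} are reserved for column name placeholders, so you cannot use curly-brace quantifiers in a regular expression. If your regular expression uses curly braces, rewrite it without them, for example by repeating the bracketed character class. For example, use [a-z][a-z] instead of [a-z]{2}.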
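The following is a minimal sketch in plain Python (run outside Adverity, where braces in a pattern are allowed) that checks one brace-free way to express a fixed repetition: repeating the character class. The sample strings are made up for illustration.

```python
import re

# Plain Python, outside Adverity, so the first pattern may contain braces.
with_braces = re.compile(r"^[a-z]{2}$")
brace_free = re.compile(r"^[a-z][a-z]$")  # same meaning, no braces

# Hypothetical sample values used only for this comparison.
samples = ["ab", "a2", "abc", "A1", ""]
same_result = all(
    bool(with_braces.match(s)) == bool(brace_free.match(s)) for s in samples
)

# In standard re syntax, [2] is a character class that matches the literal
# character "2", so it is not a repetition count.
letter_then_two = bool(re.match(r"^[a-z][2]$", "a2"))
```

For longer repetitions, repeating the class by hand becomes unwieldy, so it may be worth checking whether a quantifier such as + or * is precise enough for your data.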

Using RegEx in Python expressions#

To use a regular expression in a Python expression within a custom script, use the re Python module.

By default, regular expression matching is case-sensitive. To perform case-insensitive matching, first convert your column to lowercase or uppercase in the Python expression, and write the pattern in the same case. For example:

{column_name}.lower()
{column_name}.upper()
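As a rough illustration in plain Python, the sketch below uses an ordinary variable in place of the {column_name} placeholder. The value and pattern are hypothetical.

```python
import re

column_value = "Summer_SALE_2024"  # hypothetical cell value

# Lowercase the value first, and keep the pattern in lowercase too.
matched_lower = bool(re.match(r"summer_sale", column_value.lower()))

# Without normalising the case, the same pattern does not match.
matched_raw = bool(re.match(r"summer_sale", column_value))
```

Here matched_lower is True and matched_raw is False, because only the lowercased value matches the lowercase pattern.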

The following examples show common use cases for RegEx in Python expressions.

Filtering data with RegEx#

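To keep only the values in a column that match a specific regex pattern, use re.match() inside a conditional Python expression. Note that re.match() only matches at the beginning of the value; to find the pattern anywhere in the value, use re.search() or start the pattern with .*: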

'positive_output_value' if re.match('pattern', {column_name}) else 'negative_output_value'

To configure the Python expression, change the following parameters:

  • column_name - This is the name of the column that you want to filter.

  • pattern - This is the regex pattern used for filtering.

  • positive_output_value and negative_output_value - These are target values that you want to use based on data filtering.

For example, to return the username for email addresses at mydomain and an empty string for all other values, use this expression:

{column_name}.split('@')[0] if re.match(r'.*@mydomain\..*', {column_name}) else ''
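To see what this expression does value by value, here is a small plain-Python sketch. The function name username_if_mydomain and the sample addresses are hypothetical, and an ordinary argument stands in for the {column_name} placeholder.

```python
import re

def username_if_mydomain(value):
    """Return the part before '@' for mydomain addresses, else ''."""
    return value.split('@')[0] if re.match(r'.*@mydomain\..*', value) else ''

# Hypothetical sample values.
emails = ["anna@mydomain.com", "bob@other.org", "cy@mydomain.co.uk"]
usernames = [username_if_mydomain(e) for e in emails]
# usernames == ['anna', '', 'cy']
```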

Using RegEx as a condition#

Some custom script instructions, such as select, require only a condition rather than a full Python expression. In this case, enter the following Python expression into the transformation:

re.match('pattern', {column_name})

To configure the Python expression, change the following parameters:

  • column_name - This is the name of the column that you want to filter.

  • pattern - This is the regex pattern used for filtering.
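The sketch below imitates a row filter in plain Python to show how such a condition behaves. It does not reproduce how Adverity evaluates select; the rows, the campaign column, and the pattern are all made up for illustration. It also contrasts re.match(), which only matches from the start of the value, with re.search().

```python
import re

# Hypothetical rows standing in for a data extract.
rows = [
    {"campaign": "brand_search_uk"},
    {"campaign": "uk_brand_display"},
    {"campaign": "generic_search_de"},
]

# re.match() succeeds only when the pattern matches from the first character.
starts_with_brand = [r for r in rows if re.match(r"brand_", r["campaign"])]

# re.search() finds the pattern anywhere in the value.
contains_brand = [r for r in rows if re.search(r"brand_", r["campaign"])]
```

With these sample rows, starts_with_brand keeps one row and contains_brand keeps two.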

Replacing data using RegEx#

To perform a substitution based on a regex pattern in a column, use the re.sub() function:

re.sub('pattern', 'output_value', {column_name})

To configure the Python expression, change the following parameters:

  • column_name - This is the name of the column that you want to process.

  • pattern - This is the regex pattern used to match the text to be replaced.

  • output_value - This is the value you want to use as the result of the substitution.

For example, to remove punctuation and other symbols from text (every character that is not a letter, digit, underscore, or whitespace), enter the following Python expression into the transformation:

re.sub(r'[^\w\s]', '', {column_name})
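A quick plain-Python check of this pattern on a made-up string shows which characters survive. The variable text stands in for the column value.

```python
import re

text = "Hello, World! #2024 (final_v2)"  # hypothetical cell value

# Remove everything that is not a word character (\w) or whitespace (\s).
cleaned = re.sub(r'[^\w\s]', '', text)
# cleaned == 'Hello World 2024 final_v2'
```

The underscore is kept because \w includes it, and spaces are kept because of \s.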

Extracting data using RegEx#

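To extract specific parts of a string defined by capture groups in your regex pattern, use re.search() followed by .group(index). If the pattern does not match a value, re.search() returns None and calling .group() on it raises an error, so make sure the pattern matches every value or add a condition for values without a match: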

re.search('pattern', {column_name}).group(index)

To configure the Python expression, change the following parameters:

  • column_name - This is the name of the column that you want to process.

  • pattern - This is the regex pattern used for extraction.

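  • index - This is the numeric index of the regex capture group that you want to extract. Capture groups are numbered from 1 in the order of their opening parentheses, and 0 returns the entire match. If you didn’t use any capture groups in your regex pattern, enter 0.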
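To illustrate how the group index selects part of a match, here is a short plain-Python sketch with a hypothetical file name and pattern.

```python
import re

# Hypothetical value containing a year and a month separated by a hyphen.
match = re.search(r'(\d+)-(\d+)', "report_2024-03.csv")

whole = match.group(0)  # the entire matched text
year = match.group(1)   # first capture group
month = match.group(2)  # second capture group
# whole == '2024-03', year == '2024', month == '03'
```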

For example, to extract the first number from a string, enter the following Python expression into the transformation:

re.search(r'(\d+\.?\d*)', {column_name}).group(0)

Before using the extracted number as a numeric value, apply the convertnumbers instruction to the column.
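Because re.search() returns None when nothing matches, one possible way to handle values without a number is a conditional around the match. The sketch below is plain Python with a hypothetical helper first_number and made-up values; returning an empty string for non-matching values is an assumption, not documented Adverity behavior.

```python
import re

def first_number(value):
    """Return the first number found in value as text, or '' if none."""
    match = re.search(r'(\d+\.?\d*)', value)
    return match.group(0) if match else ''

# Hypothetical sample values.
values = ["Price: 19.99 EUR", "qty 3", "n/a"]
extracted = [first_number(v) for v in values]
# extracted == ['19.99', '3', '']
```

The results are still text, which is why a numeric conversion step is needed afterwards.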