Using RegEx in Python expressions
This guide explains how to use regular expressions to transform data in a data extract.
Introduction
Use Python expressions in custom scripts to transform your data extract. For a list of all available custom script transformations, see Available custom script instructions.
When using Python expressions in custom scripts, you must follow certain rules. These rules are covered in more detail in Using Python expressions in custom scripts.
To learn more about regular expressions, see the following links:
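In Adverity, curly braces {} are reserved for column name placeholders. If your regular expression uses curly braces, update it to use square brackets instead. For example, use [a-z][2] instead of [a-z]{2}. Note that in Python's re module, [a-z][2] matches a lowercase letter followed by the digit 2, not two lowercase letters. To match exactly two lowercase letters without curly braces, repeat the character class instead, for example [a-z][a-z].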
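The following plain-Python sketch is run outside Adverity, so the curly braces in it are ordinary regex syntax rather than column placeholders. It shows how Python's re module interprets [a-z]{2}, [a-z][2], and the brace-free alternative [a-z][a-z]. The helper name matches is hypothetical and used only for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Quantifier: exactly two lowercase letters.
print(matches(r'[a-z]{2}', 'ab'))    # True
# Two character classes: a lowercase letter followed by the literal digit 2.
print(matches(r'[a-z][2]', 'ab'))    # False
print(matches(r'[a-z][2]', 'a2'))    # True
# Repeating the class matches two letters without using braces.
print(matches(r'[a-z][a-z]', 'ab'))  # True
```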
Using RegEx in Python expressions
To use a regular expression in a Python expression within a custom script, use the Python re module.
By default, regular expression matching is case-sensitive. To perform case-insensitive matching, first convert your column to lowercase or uppercase in the Python expression, and write your pattern in the same case. For example:
{column_name}.lower()
{column_name}.upper()
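As a minimal sketch of this approach, the following plain Python uses a hypothetical variable value in place of {column_name}, assuming that the placeholder stands for a single cell value of the column. It uses re.search(), which looks for the pattern anywhere in the value.

```python
import re

# Hypothetical cell value standing in for {column_name}.
value = 'Newsletter_SIGNUP'

# Case-sensitive: the lowercase pattern does not match the uppercase text.
print(bool(re.search('signup', value)))          # False
# Lowercase the value first, and write the pattern in lowercase too.
print(bool(re.search('signup', value.lower())))  # True
```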
The following sections show examples of how to use RegEx in Python expressions to transform your data.
Filtering data with RegEx
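To keep only the elements in a column that match a specific regex pattern, use re.match() inside your conditional Python expression. Note that re.match() only checks for a match at the beginning of the value, so start the pattern with .* if the text you are looking for can appear later in the value: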
'positive_output_value' if re.match('pattern', {column_name}) else 'negative_output_value'
To configure the Python expression, change the following parameters:
column_name - This is the name of the column that you want to filter.
pattern - This is the regex pattern used for filtering.
positive_output_value and negative_output_value - These are the target values that you want to output when the value matches or does not match the pattern, respectively.
For example, to get the usernames of email addresses in the mydomain domain and replace all other values with an empty string, use this expression:
{column_name}.split('@')[0] if re.match(r'.*@mydomain\..*', {column_name}) else ''
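To try the expression outside Adverity, you can wrap it in a plain Python function. This sketch assumes that {column_name} stands for one cell value at a time; the function name username_if_mydomain and the sample addresses are hypothetical.

```python
import re


def username_if_mydomain(value: str) -> str:
    """Mirror of the example expression, with value in place of {column_name}."""
    # re.match anchors at the start of the string, so '.*' is needed before '@'.
    return value.split('@')[0] if re.match(r'.*@mydomain\..*', value) else ''


for email in ['alice@mydomain.com', 'bob@otherdomain.com', 'carol@mydomain.org']:
    print(repr(username_if_mydomain(email)))
# 'alice'
# ''
# 'carol'
```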
Using RegEx as a condition
Some custom script instructions, such as select, require only a condition rather than a full Python expression. In this case, enter the following Python expression into the transformation:
re.match('pattern', {column_name})
To configure the Python expression, change the following parameters:
column_name - This is the name of the column that you want to filter.
pattern - This is the regex pattern used for filtering.
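A condition like this works because re.match() returns a match object, which Python treats as true, when the value matches, and None, which Python treats as false, when it does not. The sketch below imitates row-by-row filtering with a list comprehension; the sample values and the pattern are hypothetical, and how select evaluates the condition internally is an assumption not covered here.

```python
import re

# Hypothetical column values.
rows = ['order-1001', 'refund-2002', 'order-1003']

# re.match returns a Match object (truthy) or None (falsy), so it can be used
# directly as a condition. Here: keep values that start with 'order-'.
kept = [value for value in rows if re.match(r'order-\d+', value)]
print(kept)  # ['order-1001', 'order-1003']
```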
Replacing data using RegEx
To perform a substitution based on a regex pattern in a column, use the re.sub() function:
re.sub('pattern', 'output_value', {column_name})
To configure the Python expression, change the following parameters:
column_name - This is the name of the column that you want to process.
pattern - This is the regex pattern used to match the text to be replaced.
output_value - This is the value you want to use as the result of the substitution.
For example, to remove all characters except letters, digits, underscores, and whitespace (such as punctuation and symbols), enter the following Python expression into the transformation:
re.sub(r'[^\w\s]', '', {column_name})
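The following sketch applies the same pattern to a hypothetical sample value in plain Python, showing that punctuation is removed while letters, digits, underscores, and spaces are kept. It also shows a stricter, hypothetical alternative pattern, [^A-Za-z0-9], that removes everything except ASCII letters and digits, including spaces and underscores.

```python
import re

# Hypothetical cell value standing in for {column_name}.
value = 'Hello, world! (2024) #1_top'

# \w covers letters, digits and '_'; \s covers whitespace. Everything else goes.
cleaned = re.sub(r'[^\w\s]', '', value)
print(cleaned)  # Hello world 2024 1_top

# Stricter variant: keep only ASCII letters and digits.
strict = re.sub(r'[^A-Za-z0-9]', '', value)
print(strict)  # Helloworld20241top
```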
Extracting data using RegEx
To extract specific parts of a string defined by capture groups in your regex pattern, use re.search() followed by .group(index):
re.search('pattern', {column_name}).group(index)
To configure the Python expression, change the following parameters:
column_name - This is the name of the column that you want to process.
pattern - This is the regex pattern used for extraction.
index - This is the numeric index of the regex capture group that you want to extract. Index 0 returns the entire match, so if you didn't use any capture groups in your regex pattern, enter 0.
For example, to extract the first number from a string, enter the following Python expression into the transformation:
re.search(r'(\d+\.?\d*)', {column_name}).group(0)
Before using the extracted number as a numeric value, apply the convertnumbers instruction to the column.
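If the pattern does not match a value, re.search() returns None, and calling .group() on it raises an AttributeError. The following plain-Python sketch, with the hypothetical helper first_number and sample values, checks for a match before calling .group(0) and returns an empty string otherwise.

```python
import re


def first_number(value: str) -> str:
    """Return the first number in value as a string, or '' if there is none."""
    match = re.search(r'(\d+\.?\d*)', value)
    # re.search returns None when nothing matches; calling .group() on None
    # would raise AttributeError, so check the match first.
    return match.group(0) if match else ''


print(first_number('Price: 19.99 EUR'))  # 19.99
print(first_number('Qty 3, batch 7'))    # 3
print(repr(first_number('no digits')))   # ''
```

Inside a single Python expression, one way to add the same guard is to repeat the search in a conditional expression, for example re.search(r'(\d+\.?\d*)', {column_name}).group(0) if re.search(r'(\d+\.?\d*)', {column_name}) else ''. This follows the conditional pattern used in the filtering example, but has not been verified in Adverity. In both cases, the result is a string.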