Using Python expressions to process text columns

This guide explains how to transform text columns in a data extract.

Introduction

Use Python expressions in custom scripts to transform your data extract. For a list of all available custom script transformations, see Available custom script instructions.

When using Python expressions in custom scripts, you must follow certain rules, which are covered in more detail in Using Python expressions in custom scripts.

Managing capitalization

To manage capitalization of your text columns, use the following converters with convert or convertall custom scripts:

lower

All characters are converted to lowercase.

upper

All characters are converted to uppercase.

capitalize

The first character of the string is capitalized and all remaining characters are converted to lowercase.

title

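The first character of each word is capitalized and all remaining characters are converted to lowercase. Any character that is not a letter, including an apostrophe or a digit, is treated as a word boundary, so it's becomes It'S.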

swapcase

All uppercase characters are converted to lowercase and vice versa.

For more information, see the string methods section of the Python documentation.
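If these converters behave like the Python string methods of the same name, as the pointer to the Python documentation suggests, you can preview their effect in a plain Python session before adding them to a custom script. The sketch below applies each method to a made-up sample value; it runs outside custom scripts and assumes nothing about the convert or convertall syntax itself.

```python
# Hypothetical sample value standing in for one cell of a text column.
value = "hELLO wORLD, it's python"

# Assumption: each converter mirrors the Python str method of the same name.
converters = ["lower", "upper", "capitalize", "title", "swapcase"]
results = {name: getattr(value, name)() for name in converters}

# Print each converter name next to the converted value.
for name, converted in results.items():
    print(f"{name:>10}: {converted}")
```

Note the title result, Hello World, It'S Python: the apostrophe is treated as a word boundary, so the s after it is capitalized.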

Replacing characters

To replace a single character or a substring, enter the following Python expression into the transformation:

re.sub('old_value', 'new_value', {column_name})

To configure the Python expression, change the following parameters:

  • column_name - This is the name of the column that you want to process.

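  • old_value - This is the value that you want to replace. It is interpreted as a regular expression, so characters such as ., *, and ( have a special meaning; escape them with a backslash, for example r'\.', to match them literally. For more information, see Using RegEx in Python expressions.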

  • new_value - This is the replacement value. To delete the matched text, use an empty string ('').
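The following sketch runs re.sub in plain Python to show how old_value is interpreted. The variable phone and its value are hypothetical stand-ins for {column_name}; in a custom script you enter only the expression itself. It contrasts an escaped literal match, a regular-expression match, and the common mistake of leaving a metacharacter such as . unescaped.

```python
import re

# Hypothetical cell value; in a custom script, {column_name} plays this role.
phone = "555.123.4567"

# Literal replacement: re.escape turns '.' into the pattern '\.'.
dashed = re.sub(re.escape("."), "-", phone)

# Regular-expression replacement: replace every digit with '#'.
masked = re.sub(r"\d", "#", phone)

# Pitfall: an unescaped '.' matches any character, so everything is replaced.
oops = re.sub(".", "-", phone)

print(dashed)  # 555-123-4567
print(masked)  # ###.###.####
print(oops)    # ------------
```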

Trimming spaces

To trim leading and trailing whitespace (spaces, tabs, and line breaks), use {column_name}.strip().

To replace each run of consecutive whitespace characters within a string with a single space, and also trim leading and trailing whitespace, enter the following Python expression into the transformation:

' '.join({column_name}.split())

To configure the Python expression, change the following parameters:

  • column_name - This is the name of the column that you want to process.
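To see how the two expressions in this section differ, the sketch below applies both to the same hypothetical value in plain Python, with value standing in for {column_name}. strip() only touches the ends of the string, while the split-and-join expression also collapses tabs, line breaks, and runs of spaces inside it.

```python
# Hypothetical cell value with padding, a tab, and a trailing newline.
value = "  Acme \t  Corp\n "

# strip(): removes whitespace at both ends only; inner whitespace is kept.
trimmed = value.strip()

# split() with no arguments splits on any run of whitespace and drops
# empty pieces, so joining with ' ' both trims and collapses.
collapsed = " ".join(value.split())

print(repr(trimmed))    # 'Acme \t  Corp'
print(repr(collapsed))  # 'Acme Corp'
```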

Removing new lines

To replace the line breaks in a text column with spaces, use the convertx instruction with the following Python expression:

' '.join({column_name}.splitlines())

To configure the Python expression, change the following parameters:

  • column_name - This is the name of the column that you want to process.
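splitlines() recognizes several line-break conventions, including Windows-style \r\n, which a plain replace('\n', ' ') handles only partly. The sketch below compares the two approaches on a hypothetical value in plain Python; value stands in for {column_name}.

```python
# Hypothetical cell value mixing Windows (\r\n) and Unix (\n) line breaks.
value = "Line one\r\nLine two\nLine three\n"

# splitlines() treats \r\n as one break and ignores the final trailing break.
flattened = " ".join(value.splitlines())

# A naive replace leaves the \r behind and turns the trailing \n into a space.
naive = value.replace("\n", " ")

print(repr(flattened))  # 'Line one Line two Line three'
print(repr(naive))      # 'Line one\r Line two Line three '
```

A blank line produces an empty string in the splitlines() result, so a value such as "a\n\nb" becomes "a  b", with two spaces.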

Calculating word count

To count the words in a string, enter the following Python expression into the transformation. It counts whitespace-separated tokens, so punctuation attached to a word does not change the count:

len({column_name}.split())

To configure the Python expression, change the following parameters:

  • column_name - This is the name of the column that you want to process.
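Because split() with no arguments breaks on whitespace, the expression counts tokens rather than dictionary words. The sketch below shows some edge cases in plain Python using hypothetical sample values. word_count is a hypothetical helper that wraps the expression, and the re.findall variant is an alternative definition of a word for comparison, not part of the documented expression.

```python
import re


def word_count(text: str) -> int:
    """Hypothetical helper wrapping the documented expression len(text.split())."""
    return len(text.split())


samples = [
    "The quick brown fox - it jumped!",  # the standalone '-' counts as a token
    "   ",                               # whitespace only
    "",                                  # empty string
]
for text in samples:
    print(repr(text), word_count(text))

# Alternative (not the documented expression): count runs of word characters,
# which ignores standalone punctuation such as '-'.
alt = len(re.findall(r"\w+", samples[0]))
print(alt)
```

With these samples, the documented expression returns 7, 0, and 0, while the \w+ variant returns 6 for the first sample because it skips the hyphen.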