Using Python expressions to process text columns#
This guide explains how to transform text columns in a data extract.
Introduction#
Use Python expressions in custom scripts to transform your data extract. For a list of all available custom script transformations, see Available custom script instructions.
Python expressions in custom scripts must follow certain rules. These rules are covered in more detail in Using Python expressions in custom scripts.
Managing capitalization#
To manage capitalization of your text columns, use the following converters with convert or convertall custom scripts:
- lower
All characters are converted to lowercase.
- upper
All characters are converted to uppercase.
- capitalize
Only the first character of the whole string is capitalized and the rest is lowercase.
- title
The first character of each word in the string is capitalized and the rest is lowercase.
- swapcase
All uppercase characters are converted to lowercase and vice versa.
For more information, see the Python documentation.
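The following sketch illustrates the effect of each converter, assuming that the converters behave like the Python string methods of the same names. The variable value is a hypothetical sample standing in for a single cell of a text column.

```python
# Hypothetical sample value standing in for one cell of a text column.
value = "hello WORLD from python"

# Assumption: each converter mirrors the str method of the same name.
lowered = value.lower()            # "hello world from python"
uppered = value.upper()            # "HELLO WORLD FROM PYTHON"
capitalized = value.capitalize()   # "Hello world from python"
titled = value.title()             # "Hello World From Python"
swapped = value.swapcase()         # "HELLO world FROM PYTHON"

for name, result in [
    ("lower", lowered),
    ("upper", uppered),
    ("capitalize", capitalized),
    ("title", titled),
    ("swapcase", swapped),
]:
    print(f"{name}: {result}")
```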
Replacing characters#
To replace a single character or a substring, enter the following Python expression into the transformation:
re.sub('old_value', 'new_value', {column_name})
To configure the Python expression, change the following parameters:
column_name- This is the name of the column that you want to process.
old_value- This is the value that you want to replace. You can use a regular expression to match the string to be replaced. For more information, see Using RegEx in Python expressions.
new_value- This is the target value of the substitution.
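As a rough illustration of how re.sub behaves in plain Python, the sketch below uses hypothetical sample values in place of {column_name}. Because old_value is treated as a regular expression, characters such as . have a special meaning. If you want to match them literally, one option is re.escape.

```python
import re

# Hypothetical sample values standing in for cells of a text column.
order = "Order #123-456"
version = "v1.2.3"

# Replace a literal substring.
dashes_replaced = re.sub('-', '/', order)              # "Order #123/456"

# Replace every run of digits using a regular expression.
digits_masked = re.sub(r'\d+', 'N', order)             # "Order #N-N"

# An unescaped '.' matches any character, so every character is replaced.
unescaped = re.sub('.', '_', version)                  # "______"

# re.escape makes the pattern match a literal dot only.
escaped = re.sub(re.escape('.'), '_', version)         # "v1_2_3"

print(dashes_replaced, digits_masked, unescaped, escaped, sep="\n")
```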
Trimming spaces#
To trim leading and trailing spaces, use {column_name}.strip().
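To replace multiple spaces within a string with a single space, enter the following Python expression into the transformation. Because split() without arguments splits on any run of whitespace, the expression also removes leading and trailing whitespace and converts tabs and line breaks to single spaces: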
' '.join(({column_name}).split())
To configure the Python expression, change the following parameters:
column_name- This is the name of the column that you want to process.
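The difference between the two expressions can be seen in the following sketch, where value is a hypothetical sample standing in for {column_name}.

```python
# Hypothetical sample value standing in for one cell of a text column.
value = "  too   many    spaces  "

# strip() removes only leading and trailing whitespace.
trimmed = value.strip()                 # "too   many    spaces"

# split() + join also collapses inner runs of whitespace.
collapsed = ' '.join(value.split())     # "too many spaces"

print(repr(trimmed))
print(repr(collapsed))
```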
Removing new lines#
To remove the new lines from a text column and join the resulting lines with single spaces, use the convertx instruction with the following Python expression:
' '.join({column_name}.splitlines())
To configure the Python expression, change the following parameters:
column_name- This is the name of the column that you want to process.
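The sketch below shows how the expression behaves on hypothetical sample values. Note that an empty line in the input produces two consecutive spaces in the output. If that is undesirable, the split() and join approach from the section on trimming spaces is one possible alternative.

```python
# Hypothetical sample values standing in for cells of a text column.
value = "first line\nsecond line\r\nthird line"
with_blank_line = "a\n\nb"

# splitlines() recognizes \n and \r\n (among other line boundaries).
joined = ' '.join(value.splitlines())
# "first line second line third line"

# An empty line becomes an empty list item, which yields a double space.
joined_blank = ' '.join(with_blank_line.splitlines())   # "a  b"

# Splitting on any whitespace collapses the blank line as well.
collapsed_blank = ' '.join(with_blank_line.split())     # "a b"

print(repr(joined), repr(joined_blank), repr(collapsed_blank), sep="\n")
```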
Calculating word count#
To calculate the word count for a string, where a word is any run of characters separated by whitespace, enter the following Python expression into the transformation:
len({column_name}.split())
To configure the Python expression, change the following parameters:
column_name- This is the name of the column that you want to process.
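A short sketch of the word count expression on hypothetical sample values follows. Punctuation attached to a word and hyphenated terms count as part of a single word, and an empty string yields zero.

```python
# Hypothetical sample values standing in for cells of a text column.
sentence = "The quick  brown fox"
hyphenated = "A well-known fact."
empty = ""

# Runs of whitespace count as one separator.
sentence_count = len(sentence.split())       # 4

# "well-known" and "fact." are each one whitespace-delimited token.
hyphenated_count = len(hyphenated.split())   # 3

# An empty string splits into an empty list.
empty_count = len(empty.split())             # 0

print(sentence_count, hyphenated_count, empty_count)
```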