Monitoring the quality of your data

This guide explains how to use Adverity to monitor and maintain the quality of your data.

This feature is in Beta. It is available only to Adverity customers who are participating in Beta testing.

Introduction

What are monitors and what can they do?

A data monitor is an automated data quality check that is performed each time you fetch data. With data monitors, you can find anomalies in your data more easily.

What types of monitors are there in Adverity?

There are two types of data monitors in Adverity:

  • Universal monitors

    Universal monitors detect data anomalies and flag potential issues across all data sources. They perform general data quality checks, such as finding duplicate rows in your data.

    Universal monitors raise a warning when they detect an anomaly.

    For more information, see Using universal monitors.

  • Custom monitors

    Custom monitors allow you to define custom rules that your data must meet. Custom monitors can be used to ensure that the data you fetch is valid and matches your specific requirements. You can tell Adverity to raise a datastream issue (error or warning) if a custom monitor detects an anomaly.

    For example, you can use a custom monitor to define the date format you want your data to use and tell Adverity to display a datastream error if the incoming data does not match it.

    For more information, see Using custom monitors.
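As a rough illustration of the date format example above, the following Python sketch checks values against a regular expression, similar in spirit to a regex-based rule in a custom monitor. The pattern, the function name, and the sample values are assumptions made for this example. This is not Adverity's implementation, and the regex dialect used by Adverity may differ.

```python
import re

# Hypothetical pattern for dates written as YYYY-MM-DD (for example, 2024-01-31).
# It checks the structure only, not whether the date exists in the calendar.
DATE_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")


def find_invalid_dates(values):
    """Return the values that do not fully match DATE_PATTERN."""
    return [value for value in values if DATE_PATTERN.fullmatch(value) is None]


# Hypothetical sample values from a date field.
sample_values = ["2024-01-31", "31/01/2024", "2024-13-01"]
invalid = find_invalid_dates(sample_values)
print(invalid)
```

In this sketch, the second value uses a different format and the third has an impossible month, so both are reported as not matching.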

How do I use monitors?

The way data monitors are used depends on the monitor type.

  • To use universal monitors, enable them on the Administration page of the root workspace. The enabled universal monitors will be applied to all datastreams on your Adverity instance.

  • To use custom monitors, create a custom monitor for a specific datastream. After the monitor is assigned to a datastream, every time you fetch data using the datastream, the collected data is checked against the rules defined in the monitor.

Using universal monitors

Currently, you can use universal data monitors to detect duplicate rows in your data.

Finding duplicate data in a datastream

Duplicate data in your data extracts can cause inaccurate data analysis. When duplicate data checks are enabled, Adverity automatically detects duplicate data and alerts you so that you can remove it before it causes problems.

If Adverity detects one or more rows containing identical data in a data extract that you have collected, a warning will appear in the Enrich & Monitor section of the task in the datastream overview, as shown in the image below.

Click this warning to see which rows in the data extract contain identical data. You can then click the name of the data extract to see the entire data extract. If necessary, remove the duplicate data from your data extract before loading the data into Adverity Data Storage or external destinations.

An example of a data extract in which Adverity has detected duplicate data.
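To make the idea of a duplicate check concrete, the following Python sketch counts identical rows in a small data extract. It is only an illustration: the function name and the sample rows are hypothetical, and the sketch assumes that two rows are duplicates when every value in them is identical. How Adverity performs the check internally is not described in this guide.

```python
from collections import Counter


def find_duplicate_rows(rows):
    """Return each row that occurs more than once, with its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: count for row, count in counts.items() if count > 1}


# Hypothetical data extract with the columns date, campaign, and clicks.
extract = [
    ("2024-01-01", "Campaign A", 120),
    ("2024-01-01", "Campaign B", 95),
    ("2024-01-01", "Campaign A", 120),
]

duplicates = find_duplicate_rows(extract)
print(duplicates)
```

Here the first and third rows are identical, so that row is reported with a count of 2.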

Enabling duplicate data checks

To switch duplicate data warnings on or off, follow these steps:

  1. Select the root workspace and then, in the platform navigation menu, click Administration.

  2. Select or deselect the Enable checking for duplicate rows in extracts checkbox.

    This setting is available to users with Administrator permissions in the root workspace of your organization, and will be applied to all child workspaces of the root workspace.

Using custom monitors

To create a custom data quality check, create a custom monitor for a datastream.

Limitations

Custom monitors cannot be created for fields with the following data types:

  • duration

  • formula

  • json
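If you keep your own list of fields and their data types, a small script can show which fields are affected by this limitation. The following Python sketch is an assumption-based illustration: the schema dictionary, its field names, and the data type names other than duration, formula, and json are hypothetical.

```python
# Data types for which custom monitors cannot be created (from the list above).
UNSUPPORTED_TYPES = {"duration", "formula", "json"}


def monitorable_fields(schema):
    """Return the names of fields whose data type allows a custom monitor."""
    return [
        name
        for name, data_type in schema.items()
        if data_type.lower() not in UNSUPPORTED_TYPES
    ]


# Hypothetical schema that maps field names to data types.
schema = {
    "date": "date",
    "clicks": "long",
    "watch_time": "duration",
    "payload": "json",
}

fields = monitorable_fields(schema)
print(fields)
```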

Creating a new custom monitor

To create a custom monitor, follow these steps:

  1. Select the workspace you work with in Adverity and then, in the platform navigation menu, click Datastreams.

  2. Open the chosen datastream by clicking on its name.

  3. In the Monitor section of the datastream overview, click + Add monitor.

  4. Click Create monitor.

  5. In the Monitoring rules section, define the rules that your data must meet using the following fields.

    Use the Data preview section below to review a data extract fetched from your datastream.

    You can create combinations of rules using + Or and + And operators.

    Field

    Select a field from your data extract for which you want to define a rule.

    When editing a custom monitor assigned to multiple datastreams, only the fields available in all of the assigned datastreams are listed in this drop-down.

    Operator

    Select a condition that the values of the selected field must meet.

    To define the rule using a regular expression, select the has structure (regex) option and then enter your regular expression in the Value field.

    Value

    Enter the value that the selected field is compared against using the chosen operator.

  6. Select the datastream issue to raise if the collected data does not meet the rules defined in the previous step. These datastream issues are displayed in the datastream overview. For more information, see Viewing datastream issues.

    Trigger an error

    Select this option to raise an error and stop processing the data.

    Trigger a warning

    Select this option to raise a warning and continue processing the data.

  7. In the Monitor name section, enter the name for your custom monitor.

  8. Click Apply.

The custom monitor is now created and assigned to the datastream. Every time you fetch data using the datastream, the collected data is checked against the rules defined in the monitor.
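The following Python sketch models how a monitor built from field, operator, and value rules might be evaluated, and how an error differs from a warning. It is a simplified illustration, not Adverity's implementation. The operator names other than has structure (regex), the grouping of rules (And inside a group, Or between groups), and all function names are assumptions made for this example.

```python
import operator
import re

# Hypothetical operator names; the operators available in Adverity may differ.
OPERATORS = {
    "equals": operator.eq,
    "greater than": operator.gt,
    "has structure (regex)": lambda value, pattern: (
        re.fullmatch(pattern, str(value)) is not None
    ),
}


def rule_holds(row, rule):
    """Check one (field, operator, value) rule against a row."""
    field, op_name, expected = rule
    return OPERATORS[op_name](row[field], expected)


def row_is_valid(row, rule_groups):
    """Assumed semantics: And inside a group, Or between groups."""
    return any(all(rule_holds(row, rule) for rule in group) for group in rule_groups)


def run_monitor(rows, rule_groups, issue_type):
    """Return (issue, rows_to_process).

    An error stops processing, so no rows are passed on.
    A warning is raised, but all rows are still processed.
    """
    violations = [row for row in rows if not row_is_valid(row, rule_groups)]
    if not violations:
        return None, rows
    if issue_type == "error":
        return ("error", violations), []
    return ("warning", violations), rows


# Hypothetical extract and a single rule group with two rules joined by And.
rows = [
    {"date": "2024-01-31", "clicks": 10},
    {"date": "31/01/2024", "clicks": 0},
]
rule_groups = [
    [
        ("date", "has structure (regex)", r"\d{4}-\d{2}-\d{2}"),
        ("clicks", "greater than", 0),
    ]
]

warning, processed_on_warning = run_monitor(rows, rule_groups, "warning")
error, processed_on_error = run_monitor(rows, rule_groups, "error")
print(warning)
print(error)
```

In this sketch, the second row violates the rules. With a warning, both rows are still passed on for processing. With an error, processing stops and no rows are passed on.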

Managing custom monitors

You can perform the following actions with the custom monitors assigned to the datastream:

Edit an assigned monitor

To edit an assigned monitor, follow these steps:

  1. In the Monitors subsection of the datastream overview, hover over the monitor you want to edit.

  2. Click Edit monitor.

  3. Make your changes to the monitor.

  4. Click Apply.

Unassign a monitor

To unassign a monitor from the datastream, follow these steps:

  1. In the Monitors subsection of the datastream overview, hover over the monitor you want to unassign.

  2. Click Remove monitor.

Datastream warnings and errors

Adverity's datastream issues inform you about potential problems with your data and datastreams. These problems include the following issues:

  • Issues that prevent your datastream from fetching data from your chosen data source, such as incorrect datastream configuration settings

  • Issues that occur while applying an enrichment to your data extract, such as incorrect configuration of an enrichment

  • Issues detected by data monitors, such as duplicate data rows in your data extracts

Viewing datastream issues

You can stay up to date with the datastream warnings and errors in your Adverity workspace in the following ways:

Viewing details of the data quality issues in your data extract

To view the data that does not meet your monitoring rules, follow these steps:

  1. In the task overview, find the task with the Duplicates detected warning or the Custom rules error or warning, and then click Show data extracts.

  2. Click the hyperlinked element to view the data extract.

  3. In the selected data quality warning box above the data extract, click Show more information as shown in the image below. Adverity will display the rows violating a specific rule.