Monitoring the quality of your data

This guide explains how to use Adverity to monitor and maintain the quality of your data.

This feature is in Beta. It is only available to Adverity customers who are participating in the Beta test.

Introduction

What are monitors and what can they do?

A data monitor is an automated data quality check that is performed each time you fetch data. With data monitors, you can find anomalies in your data more easily.

For an overview of your data monitors, see (Beta) Introduction to the Data Quality page.

What types of monitors are there in Adverity?

There are two types of data monitors in Adverity:

  • Universal monitors

    Universal monitors detect data anomalies and flag potential issues across all data sources. They perform general data quality checks, such as finding duplicate rows in your data.

    By default, universal monitors raise a warning when an anomaly is detected. You can edit a universal monitor to trigger an error for specific datastreams.

    For more information, see Using universal monitors.

  • Custom monitors

    Custom monitors allow you to define custom rules that your data must meet. Custom monitors can be used to ensure that the data you fetch is valid and matches your specific requirements. You can tell Adverity to raise a datastream issue (error or warning) if a custom monitor detects an anomaly.

    For example, you can use a custom monitor to ensure that the ID column is not empty and tell Adverity to display a datastream error if the incoming data does not match the rule.

    For more information, see Using custom monitors.

How do I use monitors?

The way data monitors are used depends on the monitor type.

  • To use universal monitors, enable them on the Administration page of the root workspace. Enabled universal monitors are applied to all datastreams on your Adverity instance. You can also change the configuration of a universal monitor at the datastream level.

  • To use custom monitors, create a custom monitor. After the monitor is assigned to a datastream, the collected data is checked against the rules defined in the monitor every time you fetch data using that datastream.

Using universal monitors

Currently, you can use universal data monitors to detect duplicate rows in your data.

Detecting duplicate data in a datastream

Duplicate data in your data extracts can cause inaccurate data analysis. When duplicate data checks are enabled, Adverity automatically detects duplicate data and alerts you so that you can remove it before it causes problems.

If Adverity detects one or more rows containing identical data in a data extract that you have collected, a Data Quality warning will appear in the Enrich & Monitor section of the task in the datastream overview.

To view the details of the detected duplicates, see Viewing details of the data quality issues in your data extract. If necessary, remove the duplicate data from your data extract before loading the data into Adverity Data Storage or external destinations.
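The check described above can be pictured with a short Python sketch. This is only an illustration of the idea of finding rows with identical data and counting how often they occur. It is not Adverity's implementation, and the function name `find_duplicate_rows` and the sample fields are hypothetical.

```python
from collections import Counter


def find_duplicate_rows(rows):
    """Return each row that occurs more than once, with its occurrence count.

    Hypothetical helper for illustration only; not part of Adverity.
    """
    # Rows are dicts, which are not hashable, so each row is converted
    # to a tuple of (field, value) pairs sorted by field name.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [(dict(key), n) for key, n in counts.items() if n > 1]


# Sample data extract with made-up field names.
extract = [
    {"date": "2024-01-01", "campaign": "A", "clicks": 10},
    {"date": "2024-01-01", "campaign": "A", "clicks": 10},
    {"date": "2024-01-02", "campaign": "B", "clicks": 7},
]

duplicates = find_duplicate_rows(extract)
for row, count in duplicates:
    print(f"{count}x {row}")
```

In this sketch, two rows count as duplicates only if every field has the same value in both.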

Enabling duplicate data checks

To switch duplicate data warnings on or off for all datastreams, follow these steps:

  1. Select the root workspace and then, in the platform navigation menu, click Administration.

  2. Select or deselect the Enable checking for duplicate rows in extracts checkbox.

    This setting is available to users with Administrator permissions in the root workspace of your organization, and it applies to all child workspaces of the root workspace.

Managing the data uniqueness monitor for a specific datastream

You can perform the following actions with the data uniqueness monitor at the datastream level:

Edit uniqueness monitor

By default, the uniqueness monitor triggers a warning when duplicates are detected. To change this setting for a specific datastream so that it triggers an error instead, follow these steps:

  1. In the Monitors subsection of the datastream overview, find the uniqueness monitor.

  2. Click Edit uniqueness monitor.

  3. Select the issue type that should be triggered if the monitor detects duplicates from the following options:

    Trigger an error

    Select this option to raise an error and stop processing the data.

    Trigger a warning

    Select this option to raise a warning and continue processing the data.

  4. Click Apply.
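The difference between the two issue types can be sketched in Python as follows. This is a simplified model under stated assumptions, not Adverity's internal logic: `IssueType`, `DataQualityError`, and `handle_duplicates` are hypothetical names introduced only for this example.

```python
from enum import Enum


class IssueType(Enum):
    """The two issue types a monitor can trigger (hypothetical model)."""
    WARNING = "warning"
    ERROR = "error"


class DataQualityError(Exception):
    """Raised in this sketch when a monitor is set to trigger an error."""


def handle_duplicates(duplicate_count, issue_type):
    """Return a list of warnings, or raise if the monitor triggers an error."""
    if duplicate_count == 0:
        return []
    message = f"{duplicate_count} duplicate row(s) detected"
    if issue_type is IssueType.ERROR:
        # An error stops processing of the data.
        raise DataQualityError(message)
    # A warning is recorded and processing continues.
    return [message]


warnings = handle_duplicates(2, IssueType.WARNING)
print(warnings)
```

With `IssueType.ERROR`, the same call raises `DataQualityError` instead of returning, which models a task that stops rather than passing the data on.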

Deactivate uniqueness monitor

To deactivate the uniqueness monitor for a datastream, follow these steps:

  1. In the Monitors subsection of the datastream overview, find the uniqueness monitor.

  2. Disable the toggle next to the uniqueness monitor.

Using custom monitors

To create a custom data quality check, create a custom monitor. Custom monitors can be assigned to multiple datastreams.

Limitations

Custom monitors cannot be created for fields with the following data types:

  • duration

  • formula

  • json
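As a rough illustration of this limitation, the following Python sketch filters a schema down to the fields that could be used in a custom monitor. The schema layout, the function name `monitorable_fields`, and the type names other than duration, formula, and json are assumptions made for the example.

```python
# Data types for which custom monitors cannot be created (from the list above).
UNSUPPORTED_TYPES = {"duration", "formula", "json"}


def monitorable_fields(schema):
    """Return the names of fields whose data type supports custom monitors.

    `schema` maps field names to data type names; this structure is a
    hypothetical stand-in, not an Adverity API object.
    """
    return [
        name
        for name, data_type in schema.items()
        if data_type.lower() not in UNSUPPORTED_TYPES
    ]


# Sample schema; "string" and "long" are illustrative type names.
schema = {
    "id": "string",
    "clicks": "long",
    "watch_time": "duration",
    "payload": "json",
}

print(monitorable_fields(schema))
```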

Creating a new custom monitor

To create a custom monitor, follow these steps:

  1. Select the workspace you work with in Adverity and then, in the platform navigation menu, click Datastreams.

  2. Open the chosen datastream by clicking on its name.

  3. In the Monitor section of the datastream overview, click + Add monitor.

  4. Click Create monitor.

  5. In the Monitoring rules section, define the rules that your data must meet using the following fields.

    Use the Data preview section below to review a data extract fetched from your datastream.

    You can create combinations of rules using the + Or and + And operators.

    Field

    Select a field from your data extract for which you want to define a rule.

    When editing a custom monitor assigned to multiple datastreams, only the fields available in all of the assigned datastreams are listed in this drop-down.

    Operator

    Select a condition that the values of the selected field must meet.

    To define the rule using a regular expression, select the has structure (regex) option and then enter your regular expression in the Value field.

    Value

    Enter the value that the selected field is compared against using the chosen operator.

  6. Select a datastream issue to be raised if the collected data does not meet the rules defined in the previous step. These datastream issues are displayed in the datastream overview. For more information, see Viewing datastream issues.

    Trigger an error

    Select this option to raise an error and stop processing the data.

    Trigger a warning

    Select this option to raise a warning and continue processing the data.

  7. In the Monitor name section, enter the name for your custom monitor.

  8. Click Apply.

The custom monitor is now created and assigned to the datastream. Every time you fetch data using the datastream, the collected data is checked against the rules defined in the monitor.
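To make the Field, Operator, and Value settings more concrete, here is a minimal Python sketch of how such rules could be evaluated against the rows of a data extract. It is an illustration only: the helper names (`not_empty`, `has_structure`, `all_of`, `any_of`) are hypothetical, and the sketch assumes that a regular expression must match the whole value, which may differ from how Adverity applies the has structure (regex) operator.

```python
import re


def not_empty(field):
    """Rule: the field must be present and not an empty string."""
    return lambda row: row.get(field) not in (None, "")


def has_structure(field, pattern):
    """Rule: the field value must fully match the regular expression.

    Full-match semantics are an assumption of this sketch.
    """
    compiled = re.compile(pattern)
    return lambda row: compiled.fullmatch(str(row.get(field, ""))) is not None


def all_of(*rules):
    """Combine rules so that every rule must hold (like + And)."""
    return lambda row: all(rule(row) for rule in rules)


def any_of(*rules):
    """Combine rules so that at least one rule must hold (like + Or)."""
    return lambda row: any(rule(row) for rule in rules)


# The ID column must not be empty AND the date must look like YYYY-MM-DD.
monitor = all_of(
    not_empty("id"),
    has_structure("date", r"\d{4}-\d{2}-\d{2}"),
)

rows = [
    {"id": "a1", "date": "2024-01-01"},
    {"id": "", "date": "2024-01-02"},
    {"id": "a3", "date": "01/03/2024"},
]

violations = [row for row in rows if not monitor(row)]
print(violations)
```

Here the second row fails because its ID is empty, and the third fails because its date does not match the pattern. Replacing `all_of` with `any_of` would accept any row that satisfies at least one of the two rules.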

Managing assigned custom monitors

You can perform the following actions with the custom monitors assigned to the datastream:

Edit an assigned monitor

To edit an assigned monitor, follow these steps:

  1. In the Monitors subsection of the datastream overview, hover over the monitor you want to edit.

  2. Click Edit monitor.

  3. Make the changes to the monitor.

  4. Click Apply.

To edit the list of datastreams to which the custom monitor is assigned, open the editing view from the Data Quality page. For more information, see Editing a custom monitor.
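When a custom monitor is assigned to several datastreams, only the fields available in all of them can be used in its rules. The following Python sketch shows that idea as a set intersection. The function name `common_fields` and the datastream and field names are hypothetical.

```python
def common_fields(datastream_fields):
    """Return the fields that exist in every assigned datastream, sorted by name.

    `datastream_fields` maps a datastream name to its list of field names;
    this structure is an assumption made for illustration.
    """
    field_sets = [set(fields) for fields in datastream_fields.values()]
    if not field_sets:
        return []
    return sorted(set.intersection(*field_sets))


assigned = {
    "facebook_ads": ["id", "date", "clicks"],
    "google_ads": ["id", "date", "spend"],
}

print(common_fields(assigned))
```

In this example, a rule on `clicks` or `spend` would not be possible while both datastreams are assigned, because each of those fields exists in only one of them.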

Unassign a monitor

To unassign a monitor from a datastream, follow these steps:

  1. In the Monitors subsection of the datastream overview, hover over the monitor you want to unassign.

  2. Click Remove monitor.

Deleting a custom monitor

To delete a custom monitor, follow these steps:

  1. Select the workspace you work with in Adverity and then, in the platform navigation menu, click Data Quality.

  2. Find the custom monitor that you want to delete.

  3. Click Select an action in the monitor's row.

  4. Click Delete monitor.

  5. In the confirmation dialog, click Delete monitor.

Datastream warnings and errors

Adverity's datastream issues inform you about potential problems with your data and datastreams. These problems include the following issues:

  • Issues that prevent your datastream from fetching data from your chosen data source, such as incorrect datastream configuration settings

  • Issues that occur while applying an enrichment to your data extract, such as incorrect configuration of an enrichment

  • Issues detected by data monitors, such as duplicate data rows in your data extracts

Viewing datastream issues

There are a number of ways to stay up-to-date with all the datastream warnings and errors in your Adverity workspace. You can find datastream issues in the following ways:

Viewing details of the data quality issues in your data extract

To view the data that does not meet your monitoring rules, follow these steps:

  1. Open the task overview for the task with the Data Quality error or warning.

  2. Above the data extract, Data Quality errors and warnings are shown for specific monitors.

    In the box for the data quality issue you want to inspect, click View details. The details window shows the following information:

    Duplicate data checks

    Adverity displays the duplicate rows found in your data extract and the number of times they occur.

    You can download the duplicate rows at the bottom of the window.

    Custom monitors

    Adverity displays the rows in your data extract that violate each of the defined rules and the number of times the violation occurs.
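The kind of per-rule summary described for custom monitors can be sketched in Python as follows. This is a hedged illustration, not the format Adverity uses: the function `violations_by_rule`, the rule names, and the sample fields are all hypothetical.

```python
def violations_by_rule(rows, rules):
    """For each named rule, collect the rows that violate it and count them.

    `rules` maps a rule name to a function that returns True when a row
    meets the rule. Hypothetical structure for illustration only.
    """
    report = {}
    for name, is_valid in rules.items():
        failing = [row for row in rows if not is_valid(row)]
        report[name] = {"count": len(failing), "rows": failing}
    return report


rules = {
    "id is not empty": lambda row: row.get("id") not in (None, ""),
    "clicks is not negative": lambda row: row.get("clicks", 0) >= 0,
}

rows = [
    {"id": "a1", "clicks": 3},
    {"id": "", "clicks": 5},
    {"id": "a3", "clicks": -1},
]

report = violations_by_rule(rows, rules)
for name, result in report.items():
    print(f"{name}: {result['count']} violation(s)")
```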