Collecting data from File

This guide explains how to collect data from File. To learn how to collect data from a different data source, go back to the Available data sources in Adverity overview.

Use the File connector to collect data from files located on a file server.

Limitations

Collecting data from File comes with the following limitations:

  • When manually uploading a file, the file size limit is 20 MB. If the file you want to upload is larger than 20 MB, we recommend compressing it into a ZIP archive using a high compression ratio and uploading the ZIP archive instead.
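
The following Python sketch shows one way to compress a file at the highest DEFLATE level before uploading it. The function names and the interpretation of 20 MB as 20 × 1024 × 1024 bytes are assumptions made for illustration, not part of Adverity.

```python
import zipfile
from pathlib import Path

# Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes.
UPLOAD_LIMIT_BYTES = 20 * 1024 * 1024


def compress_for_upload(source: Path, archive: Path) -> int:
    """Write `source` into a ZIP archive using the highest DEFLATE level.

    Returns the size of the resulting archive in bytes.
    """
    with zipfile.ZipFile(
        archive, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as zf:
        # Store only the file name, not the full local path.
        zf.write(source, arcname=source.name)
    return archive.stat().st_size


def fits_upload_limit(path: Path) -> bool:
    """Check whether a file is small enough for a manual upload."""
    return path.stat().st_size <= UPLOAD_LIMIT_BYTES
```

Text-based formats such as CSV usually compress well. Files that are already compressed may not shrink enough to fit the limit.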

Creating a datastream to collect data from File

The basics of creating a datastream to collect data from any data source are explained in our guide to Creating a datastream. This guide contains information about the specific steps to create a datastream to fetch data from File.

Configuration: Choose the data you want to collect from File

The File connector has many optional fields to configure. This section covers only the mandatory fields to configure for the connector. For a complete description of the optional fields to configure, see Advanced File tips.

To choose what data to collect and customize the File datastream configuration, follow these steps:

  1. (Optional) Rename your datastream.

  2. In Source URL, enter the URL of the file server and the sub-folder where the data files are found.

  3. Click Test to check the authorization.

  4. In File pattern, enter the name of the file to collect data from. To collect data from more than one file, specify a regular expression that matches the filenames. For more information on how to configure the File pattern field, see Advanced File tips.

For information on configuring other File fields, see Advanced File tips.

Troubleshooting File

I see an error message when fetching data from File

  • An error message beginning with 'utf-8' codec can't decode or 'ascii' codec can't decode can appear for a number of reasons:

    • The encoding for your File datastream is defined incorrectly.

    • The datastream is set to parse an incorrect file type.

    • The source file contains characters with multiple encoding types.

    • The source file contains invalid or corrupted characters.

    To resolve this issue, depending on the cause, correct the encoding or parse settings of your datastream, ensure that the source file only contains characters with one encoding type, or remove any invalid or corrupted characters in the source file.

    If this error persists after following these steps, please contact the Adverity Support team.

  • The error message "Adverity cannot determine the character encoding" appears when Adverity is not sure whether the automatically detected encoding is correct.

    To resolve this issue, we recommend checking the automatically detected encoding, and changing the setting if it is incorrect. If it is correct, you do not need to take any further action.
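
The 'utf-8' codec can't decode message has the same wording as the error Python raises when bytes are decoded with the wrong encoding. The following minimal sketch reproduces it outside Adverity, assuming a source file that was written in Latin-1 but is read as UTF-8. The helper name try_decode is hypothetical.

```python
from typing import Optional, Tuple


def try_decode(raw: bytes, encoding: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# Bytes as they would be stored in a Latin-1 encoded source file.
raw = "Café München".encode("latin-1")

# Reading the bytes as UTF-8 fails ...
text, error = try_decode(raw, "utf-8")

# ... while reading them with the encoding they were written in succeeds.
fixed, no_error = try_decode(raw, "latin-1")
```

Here, error begins with 'utf-8' codec can't decode, and fixed contains the original text. This corresponds to the first cause listed above: the encoding selected for the datastream does not match the encoding of the file.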

Advanced File tips

Configuring data collection from a file server

In the Settings tab of your datastream overview, you can configure a number of additional settings:

Warnings enabled

Select this checkbox to display a warning in the datastream Overview page when no file is found on the file server or the file server URL is invalid.

The File Matching Options section contains the following fields:

File pattern

Specify a regular expression that matches the files from which to collect data. By default, File pattern is pre-populated with .* which collects data from every file.

Other examples of regular expressions for file patterns include:

  • .*\.xlsx collects every file that ends with .xlsx.

  • .*\.csv collects every file that ends with .csv.

  • .*Display.*\.csv collects every file that contains the (case-sensitive) phrase Display anywhere in the file name and also ends with .csv.
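
To check a file pattern before saving it, you can test it locally against a list of sample filenames. The following Python sketch assumes that the pattern must match the whole filename, which is consistent with the examples above but not confirmed for Adverity. The function name and the sample filenames are made up.

```python
import re
from typing import List


def matching_files(pattern: str, filenames: List[str]) -> List[str]:
    """Return the filenames that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]


# Hypothetical filenames on the file server.
files = ["Display_report.csv", "display_report.csv", "Search.xlsx", "notes.txt"]

csv_files = matching_files(r".*\.csv", files)
display_files = matching_files(r".*Display.*\.csv", files)
all_files = matching_files(r".*", files)
```

In this sketch, csv_files contains both CSV files, display_files contains only Display_report.csv because the match is case-sensitive, and all_files contains every filename.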

Zip match

If you collect data from ZIP or GZIP files, specify a regular expression that matches the file names within the ZIP or GZIP containers. Leave this field empty to collect all files within the ZIP or GZIP container.

Archive password

If the ZIP or GZIP files require password access, provide the password. By default, Adverity uses the previously entered password, if available.

Filename date match, Filename date pattern

Use these fields to fetch source files with a date in their filename that is within the fetch date range.

To use these fields the following conditions need to be satisfied:

  • The File pattern field includes the date placeholders for the date you want to match. For example, .*\_%Y-%m-%d\_.*\.csv.

  • In the Sortorder field, the Date match option is selected.

To match dates in the filenames, follow these steps:

  1. In Filename date match, enter a regular expression that matches the date. For example, \d{4}-\d{2}-\d{2} or \d{8}.

  2. In Filename date pattern, enter the date format used in the filenames using placeholders. For example, %Y-%m-%d or %m%d%Y.

For more information, see Fetching files using the File connector based on dates in the filenames.
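
The two fields work together: the regular expression locates the date inside the filename, and the date pattern describes how to read it. The following Python sketch illustrates the idea with the examples above. It assumes that the placeholders behave like Python's strptime format codes, and the function name is hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional


def date_from_filename(
    filename: str, date_match: str, date_pattern: str
) -> Optional[date]:
    """Find the date in a filename and parse it, or return None if absent."""
    found = re.search(date_match, filename)
    if found is None:
        return None
    return datetime.strptime(found.group(0), date_pattern).date()


# Filename date match \d{4}-\d{2}-\d{2} with Filename date pattern %Y-%m-%d
iso_style = date_from_filename(
    "report_2021-04-16_final.csv", r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"
)

# Filename date match \d{8} with Filename date pattern %m%d%Y
compact_style = date_from_filename("report_04162021.csv", r"\d{8}", "%m%d%Y")
```

Both calls return 16 April 2021. If the regular expression and the date pattern do not describe the same format, parsing fails, so keep the two fields in sync.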

Keep filename

Select this checkbox to name data extracts using the same name as the corresponding source files. If you select this checkbox and the source filename stays the same between fetches, data in the corresponding data extract is overwritten.

The File Parsing section contains the following fields:

Parse

Select the data format used in the source files. The type of data format selected may cause additional fields to appear.

Source encoding

Select the character encoding used in the source files. By default, Auto-detect is selected, and Adverity detects the encoding of the source files automatically.

Delimiter

If CSV is selected in Parse, specify the character used to separate values in the CSV source files.

Quote Char

If CSV is selected in Parse, specify the character used to quote values with special characters in the CSV source files.

Quoting

If CSV is selected in Parse, specify when a quoting character should be added to the field values in the data extract. Choose from one of the following options:

  • Select all to add quote characters to everything in the data extract, regardless of the field type.

  • (Default) Select minimal to add quote characters only when required. For example, a quote character will be added to a field that contains either the Quote Char or Delimiter.

  • Select none to ensure no quote characters are added to the data extract.

  • Select nonnumeric to add quote characters to everything, except integer and float values.
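
The four options have the same names as the quoting modes of Python's csv module. The following sketch uses that module to show the difference on a single row. Whether Adverity uses this module internally is an assumption, and the escape character used for the none mode is chosen only for illustration.

```python
import csv
import io
from typing import List, Union

# Mapping of the option names to the csv module's quoting constants.
QUOTING_MODES = {
    "all": csv.QUOTE_ALL,
    "minimal": csv.QUOTE_MINIMAL,
    "none": csv.QUOTE_NONE,
    "nonnumeric": csv.QUOTE_NONNUMERIC,
}


def render_row(row: List[Union[str, int, float]], mode: str) -> str:
    """Write one row as CSV text using the given quoting mode."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        quoting=QUOTING_MODES[mode],
        escapechar="\\",  # needed by "none" to escape the delimiter
        lineterminator="\n",
    )
    writer.writerow(row)
    return buffer.getvalue()


row = ["Display, EU", "plain", 42]

quoted_all = render_row(row, "all")                # "Display, EU","plain","42"
quoted_minimal = render_row(row, "minimal")        # "Display, EU",plain,42
quoted_nonnumeric = render_row(row, "nonnumeric")  # "Display, EU","plain",42
quoted_none = render_row(row, "none")              # no quote characters at all
```

With minimal, only the first value is quoted because it contains the delimiter. With nonnumeric, both text values are quoted and the integer is not.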

Sheet

If Excel is selected in Parse, specify the name of the sheet within the Excel source files to import to Adverity.

Column offset

If Excel is selected in Parse, specify the number of columns that you do not want to import to Adverity from each Excel source file. For example, if the first column contains information that you do not want to import, specify 1 in this field.

Row offset

Specify the number of rows that you do not want to import to Adverity from each source file. For example, if the first row contains header information that you do not want to import, specify 1 in this field. This field is available for all parse types.

Skip initial space

If CSV is selected in Parse, select this checkbox to ignore any whitespace that follows the selected delimiter character.
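
To see how Delimiter, Quote Char, Skip initial space, and Row offset interact, the following Python sketch parses a small sample with the standard csv module. The parameter names mirror the fields above, but the function itself and the sample data are hypothetical.

```python
import csv
import io
from typing import List


def parse_csv(
    text: str,
    delimiter: str = ",",
    quote_char: str = '"',
    skip_initial_space: bool = False,
    row_offset: int = 0,
) -> List[List[str]]:
    """Parse CSV text and drop the first `row_offset` rows."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quote_char,
        skipinitialspace=skip_initial_space,
    )
    return list(reader)[row_offset:]


# A semicolon-separated file with a title row and a space after each delimiter.
sample = "Exported 2021-04-16\ncampaign; clicks\n'Display; EU'; 10\n"

rows = parse_csv(
    sample,
    delimiter=";",
    quote_char="'",
    skip_initial_space=True,
    row_offset=1,
)
# rows == [["campaign", "clicks"], ["Display; EU", "10"]]
```

Without skip_initial_space, the second value in each row would keep its leading space, for example " clicks" instead of "clicks".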

The File Processing section contains the following fields:

Process all

By default, Adverity only processes the most recently uploaded files. Select this checkbox to process all files that match the criteria you specify.

Recursive

By default, Adverity only searches for source files in the folder that you specify. Select this checkbox to search in all the subfolders of the specified folder.

Concatenate files

Select this checkbox to combine data from all source files into a single data extract.

The File connector concatenates the full content of the files. Use this option to concatenate files without headers, such as log files. To correctly combine files with headers, use the Bundle datastream. For more information, see Combining data extracts using Bundle.
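
The following minimal Python sketch shows why plain concatenation does not suit files with headers. It models the described behavior with a simple string join, which is an assumption about the result and not Adverity's implementation.

```python
from typing import List


def concatenate_contents(contents: List[str]) -> str:
    """Join the full content of each file, one after the other."""
    return "".join(contents)


# Two hypothetical source files that each start with a header row.
first = "date,clicks\n2021-04-15,10\n"
second = "date,clicks\n2021-04-16,12\n"

combined = concatenate_contents([first, second])
header_rows = combined.splitlines().count("date,clicks")
```

Here, header_rows is 2: the header of the second file ends up in the middle of the combined data as if it were a regular row.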

Delete source

Select this checkbox to delete the source files after Adverity has imported their data.

Move to

Specify the full path to the folder in the file server into which the source files will be moved after Adverity has imported the data.

Move to hierarchy

Select this checkbox to move the source files to the folder specified in the Move to field after Adverity has imported their data, and to use a folder structure that mirrors the original folder structure. This option is only effective if you also select the Recursive checkbox and specify a folder in the Move to field.
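
As an illustration of a mirrored folder structure, the following Python sketch computes where a file would end up with and without Move to hierarchy. The paths and the function name are made up, and placing files directly in the target folder when the checkbox is cleared is an assumption.

```python
from pathlib import PurePosixPath


def move_target(
    source_file: str, source_root: str, move_to: str, keep_hierarchy: bool
) -> str:
    """Compute the destination path of a source file after import."""
    source = PurePosixPath(source_file)
    target_root = PurePosixPath(move_to)
    if keep_hierarchy:
        # Rebuild the subfolders below the source root inside the target.
        return str(target_root / source.relative_to(source_root))
    # Assumption: without the hierarchy, files land directly in the target.
    return str(target_root / source.name)


mirrored = move_target(
    "/data/in/2021/april/report.csv", "/data/in", "/data/done", keep_hierarchy=True
)
flat = move_target(
    "/data/in/2021/april/report.csv", "/data/in", "/data/done", keep_hierarchy=False
)
# mirrored == "/data/done/2021/april/report.csv"
# flat == "/data/done/report.csv"
```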

Sortorder

Specify the order in which Adverity processes the source files. Select one of the following:

  • Select Filename to process source files in alphabetical order based on their filenames.

  • Select Modification Time to process source files in chronological order based on the file's last modification time.

  • Select Date Match to process source files in chronological order based on the date contained in their filenames. To specify how the date is contained in the filenames, see the fields Filename date match and Filename date pattern.

Reverse sortorder

Select this checkbox to reverse the order in which Adverity processes the source files that you specify in the Sortorder field.
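
The following Python sketch compares the three Sortorder options and the effect of Reverse sortorder on the same set of files. The SourceFile type, the sample files, and the default date settings are hypothetical and only serve the illustration.

```python
import re
from datetime import datetime
from typing import List, NamedTuple


class SourceFile(NamedTuple):
    name: str
    modified: datetime


def sort_files(
    files: List[SourceFile],
    sortorder: str,
    reverse: bool = False,
    date_match: str = r"\d{4}-\d{2}-\d{2}",
    date_pattern: str = "%Y-%m-%d",
) -> List[str]:
    """Return the filenames in the order in which they would be processed."""
    if sortorder == "Filename":
        key = lambda f: f.name
    elif sortorder == "Modification Time":
        key = lambda f: f.modified
    elif sortorder == "Date Match":
        # Sort by the date found in the filename, not by the file timestamp.
        key = lambda f: datetime.strptime(
            re.search(date_match, f.name).group(0), date_pattern
        )
    else:
        raise ValueError(f"Unknown sortorder: {sortorder}")
    return [f.name for f in sorted(files, key=key, reverse=reverse)]


files = [
    SourceFile("b_2021-04-14.csv", datetime(2021, 4, 20)),
    SourceFile("a_2021-04-16.csv", datetime(2021, 4, 18)),
    SourceFile("c_2021-04-15.csv", datetime(2021, 4, 19)),
]

by_name = sort_files(files, "Filename")
by_mtime = sort_files(files, "Modification Time")
by_date = sort_files(files, "Date Match")
by_date_reversed = sort_files(files, "Date Match", reverse=True)
```

In this sample, each option produces a different order. For example, by_date starts with b_2021-04-14.csv, while by_name starts with a_2021-04-16.csv.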

Ignore file time

By default, Adverity uses the file timestamp to import data. Select this checkbox to use the date contained in the filenames that you specify with the fields Filename date match and Filename date pattern.

Fetching files using the File connector based on dates in the filenames

To fetch data from File for a specific date range based on the filenames, configure the datastream in the following way:

  1. Before you complete the procedure, make sure the names of the source files contain dates. For example, filename-2021-04-16.

  2. Go to the Datastreams page.

  3. Open the File datastream by clicking on its name.

  4. In the top navigation panel, click Settings.

  5. In File pattern, enter a regular expression that matches the filenames, including the date pattern that you want to use for matching the fetch date.

    Use the following placeholders to specify the date contained in the filenames:

    Placeholder    Description
    %Y             year
    %m             month
    %d             day
    %H             hour
    %M             minute
    %S             second

    For example, use the expression adverity-hourly-%Y-%m-%d-%H-%M to match files with the format adverity-hourly-2021-04-16-15-40.

  6. In Filename date match, enter a regular expression that matches the date you want to use. For example, \d{4}-\d{2}-\d{2}-\d{2}-\d{2}.

  7. In Filename date pattern, enter the date format used in the filename. For example, %Y-%m-%d-%H-%M.

  8. Select the Process all checkbox in the File Processing section.

  9. In Sortorder, select Date match.

  10. Select the Ignore file time checkbox.

  11. Fetch the files, specifying the date range. For more information, see Manual and scheduled fetches.
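
The configuration above can be pictured as a filter: only files whose filename date falls within the fetch date range are selected, and they are processed in date order. The following Python sketch models this for the hourly example. It assumes that the end date of the range is inclusive, and the function name and sample filenames are hypothetical.

```python
import re
from datetime import date, datetime
from typing import List


def files_in_range(
    filenames: List[str],
    date_match: str,
    date_pattern: str,
    start: date,
    end: date,
) -> List[str]:
    """Select files dated within [start, end], oldest first."""
    selected = []
    for name in filenames:
        found = re.search(date_match, name)
        if found is None:
            continue  # no date in the filename, so the file is skipped
        stamp = datetime.strptime(found.group(0), date_pattern)
        if start <= stamp.date() <= end:
            selected.append((stamp, name))
    return [name for _, name in sorted(selected)]


names = [
    "adverity-hourly-2021-04-16-15-40",
    "adverity-hourly-2021-04-15-09-00",
    "adverity-hourly-2021-04-18-00-10",
    "readme.txt",
]

fetched = files_in_range(
    names,
    date_match=r"\d{4}-\d{2}-\d{2}-\d{2}-\d{2}",
    date_pattern="%Y-%m-%d-%H-%M",
    start=date(2021, 4, 15),
    end=date(2021, 4, 16),
)
# fetched == ["adverity-hourly-2021-04-15-09-00",
#             "adverity-hourly-2021-04-16-15-40"]
```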

Uploading files into Adverity

To upload a file into Adverity using the File datastream, follow these steps:

  1. Go to the Datastreams page.

  2. Open the File datastream by clicking on its name.

  3. In the top right corner of the page, click Upload.

  4. Click Choose File, and select the file to upload.

  5. (Optional) Select the Keep data in raw state checkbox to achieve the following goals:

    • Keep the data in its original form and do not apply any enrichments assigned to this datastream. For more information on enriching your data, see Enriching data in Adverity.

    • Fetch the data without loading it into the destination specified for this datastream. For more information on loading data into a destination, see Loading data into destinations.

  6. Click Upload.

As a result, the file is uploaded into Adverity and Adverity creates a data extract that includes the uploaded data.

Fetching files from File manually

To browse files in the File server and fetch them manually, follow these steps:

  1. Go to the Datastreams page.

  2. Open the File datastream by clicking on its name.

  3. In the top right corner of the page, click More.

  4. Click Browser.

  5. Select the files to fetch. If the file is in a folder, click on the folder name to open the folder and view the files.

  6. Click Fetch selected files.

The fetch collects data from File, which can take some time. When the fetch starts, the Overview page of the datastream is displayed. To preview the collected data, follow these steps:

  1. In the All tasks tab, find the task at the top of the list, and click Show extracts.

  2. Click the top hyperlink.

  3. The data extract is displayed in a table containing the data that you have fetched.