Data Discovery

Dashboard

This dashboard provides an overview of the selected data sources. Users can filter by data source and choose a time range of 30, 60, or 90 days, or a custom range.

It displays the following metrics:

  • Total discovery count

  • Average processing time

  • Total schemas

  • Total tables

  • Total columns

  • Sensitive columns

  • Pending review

Sensitive Data Distribution: This graph visualizes the distribution of sensitive data across the selected data sources. Users can switch between linear and logarithmic scales to better analyze the data.
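As a rough illustration of why the logarithmic scale helps, the sketch below compares linear and log10 values for three data sources. The source names and counts are made up for the example and do not come from the product.

```python
import math

# Hypothetical sensitive-column counts per data source (illustrative values only).
counts = {"crm": 12, "billing": 480, "warehouse": 52000}

# On a linear axis the largest source dwarfs the others;
# log10 compresses the range so all three remain readable.
log_heights = {source: math.log10(n) for source, n in counts.items()}

for source, n in counts.items():
    print(f"{source:10s} linear={n:>6d}  log10={log_heights[source]:.2f}")
```

When one source has far more sensitive data than the rest, the linear view shows the smaller sources as nearly flat bars, while the logarithmic view keeps them distinguishable.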

Classification Status: This pie chart shows the proportion of data classified as Safe, Sensitive, Suspect, or Review, providing a quick overview of the current classification status.

Discoveries

This area allows users to monitor and manage all ongoing data discovery jobs across multiple data sources from a single interface. Track progress, review results, and maintain full control over every discovery process in your environment.

Available Actions

The following actions can be performed on discovery jobs:

  • Edit: Modify the selected discovery job.

  • Delete: Remove the selected discovery job.

  • Run/Pause: Start or pause the discovery process.

  • Review: Open the detailed scan results, which include the scan name, data source, database type, start and end dates, and total duration. The results are organized into the Schemas, Tables, Column Filter, and Reports tabs.

  • View Log: Display the audit logs for the discovery job. Each log entry includes the date, log details, user, table, and related columns.

Schemas Tab

The Schemas tab lists all schemas included in the discovery. For each schema, it shows the number of tables and how many of them were scanned. Clicking a schema name redirects to the Tables tab for further details.

Tables Tab

The Tables tab provides detailed information about the discovered tables. For each table, it shows the status, the schema the table belongs to, the row count, and the classification status.

Clicking a table name opens its columns and column details. These details can be exported to Excel for further analysis.

Column Filter Tab

The Column Filter tab provides an overview of column-level classification results. It displays a classification status pie chart along with lists of non-reviewed columns, reviewed columns, and columns identified as suspect or sensitive.

Reports Tab

The Reports tab provides audit logs of the discovery process. Each log entry includes the date, log details, user, table, and related columns.

Creating a New Discovery

To create a new discovery job, click the + New Discovery button and follow these steps:

1. Source Configuration

  • Select a Data Source.

  • Enter a Display Name.

2. Rule Configuration

  • From the Laws & Standards menu, select the required standards and move them to the Selected area using the Add Selected Items button.

  • From the Classification menu, select the required classifications and move them to the Selected area using the Add Selected Items button.

3. Filter Configuration

Define filters for Schema, Table, and Column using include and exclude patterns.

Include: Add schemas/tables/columns using regex patterns. Separate multiple patterns with commas.

Exclude: Exclude schemas/tables/columns using regex patterns. Separate multiple patterns with commas.

Examples:

  • .*SNOWFLAKE.* → contains SNOWFLAKE
  • ^SNOWFLAKE.* → starts with SNOWFLAKE
  • ^SNOWFLAKE$ → exactly SNOWFLAKE
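The include and exclude behavior can be approximated with a short Python sketch. This is not the product's implementation: the function names are hypothetical, and it assumes that patterns are matched with search semantics, that an empty include list selects everything, and that exclude takes precedence over include.

```python
import re

def parse_patterns(text):
    """Split a comma-separated pattern string into compiled regexes."""
    return [re.compile(p.strip()) for p in text.split(",") if p.strip()]

def is_selected(name, include="", exclude=""):
    """Return True if name matches an include pattern and no exclude pattern.

    Assumptions (not confirmed by the product documentation):
    an empty include list selects everything, and exclude wins over include.
    """
    includes = parse_patterns(include)
    excludes = parse_patterns(exclude)
    included = not includes or any(p.search(name) for p in includes)
    excluded = any(p.search(name) for p in excludes)
    return included and not excluded

schemas = ["SNOWFLAKE", "SNOWFLAKE_SAMPLE_DATA", "MY_SNOWFLAKE_COPY", "SALES"]
selected = [
    s for s in schemas
    if is_selected(s, include="^SNOWFLAKE.*, ^SALES$", exclude="SAMPLE")
]
print(selected)  # ['SNOWFLAKE', 'SALES']
```

Because commas separate patterns, a pattern that itself contains a comma, such as a quantifier like {1,3}, would be split in this sketch. It may be worth testing such patterns in the product before relying on them.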

4. Performance & Sampling

  • Enter minimum/maximum counts.
  • Set the Threshold: the percentage used in classification detection to decide whether a classification is suggested as a candidate.
  • Choose whether to enable the Exclude Null Values option.
  • Enter values for:
    • Number of Threads
    • Timeout
  • Choose whether to enable the Debug Log option.
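One way to picture how the Threshold and Exclude Null Values settings might interact is the sketch below. It assumes the threshold is compared against the percentage of sampled values that match a classification rule. The product's actual calculation is not documented here, and the function names and the email check are hypothetical.

```python
def match_percentage(values, matcher, exclude_nulls=True):
    """Percentage of sampled values for which matcher(value) is true."""
    if exclude_nulls:
        # Nulls are dropped from the denominator as well as the numerator.
        values = [v for v in values if v is not None]
    if not values:
        return 0.0
    hits = sum(1 for v in values if v is not None and matcher(v))
    return 100.0 * hits / len(values)

def is_candidate(values, matcher, threshold, exclude_nulls=True):
    """Suggest the classification when the match percentage reaches the threshold."""
    return match_percentage(values, matcher, exclude_nulls) >= threshold

def looks_like_email(value):
    # Deliberately naive check, for illustration only.
    return "@" in value

sample = ["a@example.com", None, "b@example.com", "n/a", None]

print(is_candidate(sample, looks_like_email, threshold=50))                       # True
print(is_candidate(sample, looks_like_email, threshold=50, exclude_nulls=False))  # False
```

Under these assumptions, the same column is suggested as a candidate when nulls are excluded (2 of 3 values match) but not when they are counted (2 of 5 values match), which is why the option can noticeably change results on sparse columns.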

5. Schedule Inactivity Periods

To protect performance, define periods during which the job must not run.

  • Click Add for Blackout Period.
  • Select the day and time range.
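Conceptually, a blackout period is a day plus a time range during which the job is held back. The sketch below shows one way such a check could work. The data layout and function name are hypothetical, the end time is treated as exclusive, and ranges that cross midnight are not handled.

```python
from datetime import datetime, time

def in_blackout(moment, periods):
    """Return True if moment falls inside any blackout period.

    periods is a list of (weekday, start, end) tuples, where weekday
    follows datetime.weekday() (0 = Monday) and end is exclusive.
    """
    for weekday, start, end in periods:
        if moment.weekday() == weekday and start <= moment.time() < end:
            return True
    return False

# Hypothetical configuration: no discovery runs on Mondays between 09:00 and 18:00.
periods = [(0, time(9, 0), time(18, 0))]

print(in_blackout(datetime(2024, 1, 1, 10, 30), periods))  # True  (a Monday)
print(in_blackout(datetime(2024, 1, 2, 10, 30), periods))  # False (a Tuesday)
```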

After completing all steps, click Save to store the configuration or Deploy and Run to start the discovery immediately.