Data Discovery

Dashboard

This dashboard provides an overview of the selected data sources. Users can filter the data sources and choose a time range of 30, 60, or 90 days, or set a custom range.

It displays the following metrics:

Total discovery count
Average processing time
Total schemas
Total tables
Total columns
Sensitive columns
Pending review

Sensitive Data Distribution: This graph visualizes the distribution of sensitive data across the selected data sources. Users can switch between linear and logarithmic scales to better analyze the data.

Classification Status: This pie chart shows the proportion of data classified as Safe, Sensitive, Suspect, or Review, providing a quick overview of the current classification status.

Discoveries

This area allows users to monitor and manage all ongoing data discovery jobs across multiple data sources from a single interface. Track progress, review results, and maintain full control over every discovery process in your environment.

Available Actions

The following actions can be performed on discovery jobs:

Edit: Modify the selected discovery job.
Delete: Remove the selected discovery job.
Run/Pause: Start or pause the discovery process.
Review: Open the detailed scan results, which show the scan name, data source, database type, start and end dates, and total duration. The results view includes the Schemas, Tables, Column Filter, and Reports tabs.
View Log: Display the audit logs for the discovery job. Each log entry includes the date, log details, user, table, and related columns.

Schemas Tab

The Schemas tab lists all schemas included in the discovery. For each schema, it shows the number of tables and how many of them were scanned. Clicking a schema name redirects to the Tables tab for further details.

Tables Tab

The Tables tab provides detailed information about the discovered tables. For each table, it shows the status, the schema the table belongs to, the row count, and the classification status.

Clicking a table name opens its columns and column details. These details can be exported to Excel for further analysis.

Column Filter Tab

The Column Filter tab provides an overview of column-level classification results. It displays a classification status pie chart along with lists of non-reviewed columns, reviewed columns, and columns identified as suspect or sensitive.

Reports Tab

The Reports tab provides audit logs of the discovery process. Each log entry includes the date, log details, user, table, and related columns.

Creating a New Discovery

To create a new discovery job, click the + New Discovery button and follow these steps:

1. Source Configuration

Select a Data Source.
Enter a Display Name.

2. Rule Configuration

From the Laws & Standards menu, select the required standards and move them to the Selected area using the Add Selected Items button.
From the Classification menu, select the required classifications and move them to the Selected area using the Add Selected Items button.

3. Filter Configuration

Define filters for Schema, Table, and Column using include and exclude patterns.

Include: Regex patterns for the schemas, tables, or columns to include. Separate multiple patterns with commas.

Exclude: Regex patterns for the schemas, tables, or columns to exclude. Separate multiple patterns with commas.

Examples:

.*SNOWFLAKE.* → contains SNOWFLAKE
^SNOWFLAKE.* → starts with SNOWFLAKE
^SNOWFLAKE$ → exactly SNOWFLAKE
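
To illustrate how include and exclude patterns of this kind behave, here is a minimal Python sketch. It is not the product's implementation: the helper names `parse_patterns` and `is_selected` are hypothetical, and it assumes search-style, case-sensitive matching, that an empty include list selects everything, and that exclude takes precedence over include. The product may differ on any of these points.

```python
import re


def parse_patterns(raw: str) -> list[re.Pattern]:
    """Split a comma-separated pattern string and compile each non-empty part."""
    return [re.compile(part.strip()) for part in raw.split(",") if part.strip()]


def is_selected(name: str, include: str = "", exclude: str = "") -> bool:
    """Return True if the name passes the include patterns and hits no exclude pattern.

    Assumption: an empty include string means "include everything".
    """
    includes = parse_patterns(include)
    excludes = parse_patterns(exclude)
    included = not includes or any(p.search(name) for p in includes)
    excluded = any(p.search(name) for p in excludes)
    return included and not excluded


# Hypothetical schema names, used only for illustration.
schemas = ["SNOWFLAKE", "SNOWFLAKE_SAMPLE_DATA", "MY_SNOWFLAKE_COPY", "SALES"]

for pattern in (".*SNOWFLAKE.*", "^SNOWFLAKE.*", "^SNOWFLAKE$"):
    matches = [s for s in schemas if is_selected(s, include=pattern)]
    print(pattern, "->", matches)
```

Because the patterns are split on commas, a pattern that itself contains a comma (for example a quantifier such as `{1,3}`) would be broken apart by this naive split, so such patterns are best avoided or rewritten.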

4. Performance & Sampling

Enter the minimum and maximum counts.
Set the Threshold: the percentage used for classification detection, which determines whether a classification is suggested as a candidate.
Choose whether to enable the Exclude Null Values option.
Enter values for:
- Number of Threads
- Timeout
Choose whether to enable the Debug Log option.
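
The interaction between the Threshold and the Exclude Null Values option can be sketched as follows. This is an illustrative model only: the documentation does not state how the percentage is computed, so the sketch assumes it is the share of sampled values matching a detector, that excluding nulls removes them from the denominator, and that a value equal to the threshold counts as a candidate. The `EMAIL` detector and the function names are hypothetical.

```python
import re
from typing import Optional, Sequence

# Hypothetical detector, used only for illustration.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def match_percentage(
    values: Sequence[Optional[str]], pattern: re.Pattern, exclude_nulls: bool
) -> float:
    """Return the percentage of sampled values that match the pattern."""
    sample = [v for v in values if v is not None] if exclude_nulls else list(values)
    if not sample:
        return 0.0
    hits = sum(1 for v in sample if v is not None and pattern.match(v))
    return 100.0 * hits / len(sample)


def is_candidate(
    values: Sequence[Optional[str]],
    pattern: re.Pattern,
    threshold: float,
    exclude_nulls: bool = True,
) -> bool:
    """Assumption: a classification is suggested when the percentage reaches the threshold."""
    return match_percentage(values, pattern, exclude_nulls) >= threshold


sample = ["a@example.com", None, "b@example.com", None, "n/a"]
print(is_candidate(sample, EMAIL, threshold=50, exclude_nulls=True))   # 2 of 3 non-null values match
print(is_candidate(sample, EMAIL, threshold=50, exclude_nulls=False))  # 2 of 5 values match
```

Under these assumptions, a sparsely populated column can fall below the threshold purely because of its nulls, which is the kind of case the Exclude Null Values option appears intended to address.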

5. Schedule Inactivity Periods

To optimize performance, define periods during which the job should not run.

Click Add for Blackout Period.
Select the day and time range.
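
As a rough model of what a blackout period expresses, the following Python sketch checks whether a given moment falls inside any configured day and time range. The `BlackoutPeriod` class and `may_run` function are hypothetical names, not part of the product, and the sketch assumes each period covers a single weekday with an exclusive end time and does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Iterable


@dataclass(frozen=True)
class BlackoutPeriod:
    weekday: int  # 0 = Monday ... 6 = Sunday, matching datetime.weekday()
    start: time   # inclusive
    end: time     # exclusive; assumed to be later than start on the same day

    def contains(self, moment: datetime) -> bool:
        """Return True if the moment falls on this weekday within the time range."""
        return (
            moment.weekday() == self.weekday
            and self.start <= moment.time() < self.end
        )


def may_run(moment: datetime, periods: Iterable[BlackoutPeriod]) -> bool:
    """Return True if the moment is outside every blackout period."""
    return not any(p.contains(moment) for p in periods)


# Example: no discovery runs on Mondays between 09:00 and 18:00.
periods = [BlackoutPeriod(weekday=0, start=time(9, 0), end=time(18, 0))]

print(may_run(datetime(2024, 1, 1, 10, 30), periods))  # a Monday, inside the window
print(may_run(datetime(2024, 1, 1, 20, 0), periods))   # the same Monday, after the window
```

A window that spans midnight would need to be split into two periods in this model, one for each day.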

After completing all steps, click Save to store the configuration or Deploy and Run to start the discovery immediately.