
Data Quality Profiling Metrics

This page describes the profiling metrics shown in the Data Quality & Profile tab of the Explore menu, so that users understand each field displayed at the table level and the column level.

Table-Level Profiling

The table-level metrics provide an overall snapshot of the table's structure and health:

  • Row Count:
    The total number of rows in the table.
    Usage: Useful for monitoring data volume and detecting unexpected changes in the amount of incoming data.

  • Column Count:
    The number of columns in the table.
    Usage: Helps verify that the table's schema has the expected number of columns.

  • Profile Creation Date:
    The timestamp indicating when the profile was generated (e.g., "Created Date 7 Mar 2025, 2:30").
    Usage: Indicates the freshness of the profile data, which is critical for data quality assessment.
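The table-level checks above can be sketched in Python. This is a minimal illustration, not the tool's implementation: the `TableProfile` record, the helper functions, the 20% row-count change threshold, and the 24-hour freshness window are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableProfile:
    """Hypothetical record holding the table-level metrics of one profile run."""
    row_count: int
    column_count: int
    created_at: datetime  # Profile Creation Date, timezone-aware


def row_count_changed(previous: TableProfile, current: TableProfile,
                      max_change: float = 0.20) -> bool:
    """Flag a relative row-count change larger than max_change (0.20 = 20%)."""
    if previous.row_count == 0:
        # Any rows appearing in a previously empty table count as a change.
        return current.row_count != 0
    change = abs(current.row_count - previous.row_count) / previous.row_count
    return change > max_change


def schema_matches(profile: TableProfile, expected_columns: int) -> bool:
    """Compare the Column Count with the number of columns the schema should have."""
    return profile.column_count == expected_columns


def is_stale(profile: TableProfile, now: datetime,
             max_age: timedelta = timedelta(hours=24)) -> bool:
    """Treat a profile older than max_age as too old to rely on."""
    return now - profile.created_at > max_age


if __name__ == "__main__":
    yesterday = TableProfile(10_000, 12, datetime(2025, 3, 6, 2, 30, tzinfo=timezone.utc))
    today = TableProfile(6_500, 12, datetime(2025, 3, 7, 2, 30, tzinfo=timezone.utc))
    now = datetime(2025, 3, 7, 9, 0, tzinfo=timezone.utc)

    print(row_count_changed(yesterday, today))         # 35% drop -> True
    print(schema_matches(today, expected_columns=12))  # True
    print(is_stale(today, now))                        # 6.5 hours old -> False
```

Comparing each run with the previous one, rather than with a fixed expected row count, lets the check follow normal growth while still catching sudden drops or spikes.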


Column-Level Profiling

Each column in the table is assessed using the following metrics:

  • Check Name:
    The identifier for the data quality check applied to the column. It specifies the type or category of the check.

  • Data Type:
    The type of data stored in the column (for example, NUMBER).
    Usage: Ensures that the data conforms to expected formats and supports type-specific validations.

  • Null %:
    The percentage of null or missing values within the column.
    Usage: High null percentages might indicate incomplete data or potential issues with data collection.

  • Unique %:
    The percentage of unique values relative to the non-null entries.
    Usage: Provides insight into data diversity; a value close to 100% indicates that few values repeat.

  • Distinct %:
    The percentage of distinct values out of the total entries.
    Usage: Useful for identifying redundancy or data variety in the column.

  • Value Count:
    The total count of valid data values in the column.
    Usage: Serves as an indicator of data completeness and is often used in calculating other metrics.

  • # Of Tests:
    Indicates the number of data quality tests executed on the column.
    Usage: Helps track coverage of quality checks applied to the data.

  • Test Status:
    The overall result of the quality tests, typically broken down into:
      • Failed Tests: The number of tests the column did not pass.
      • Warning Tests: The number of tests that issued warnings, suggesting potential issues.
      • Passed Tests: The number of tests that the column successfully passed.
    Usage: Provides insight into the overall health of the data. A higher number of failed or warning tests may signal the need for data cleansing or further investigation.
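To make the column-level percentages concrete, the sketch below computes them for a single column held in a Python list, with `None` standing in for nulls. It assumes a common profiling convention that this page does not confirm: Value Count is the number of non-null values, Unique % counts values that occur exactly once, and Distinct % counts how many different values occur, both relative to the non-null values. The page describes Distinct % as relative to the total entries, so check your tool's exact definition before comparing numbers. The function name `profile_column` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def profile_column(values: Sequence[Optional[Hashable]]) -> dict:
    """Compute column-level profiling metrics; None is treated as a null."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    value_count = len(non_null)

    counts = Counter(non_null)
    distinct_count = len(counts)                               # how many different values appear
    unique_count = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

    def pct(part: int, whole: int) -> float:
        # Guard against empty or all-null columns.
        return 100.0 * part / whole if whole else 0.0

    return {
        "value_count": value_count,
        "null_pct": pct(total - value_count, total),
        "unique_pct": pct(unique_count, value_count),
        "distinct_pct": pct(distinct_count, value_count),
    }


if __name__ == "__main__":
    column = ["a", "a", "b", "c", None]
    print(profile_column(column))
    # {'value_count': 4, 'null_pct': 20.0, 'unique_pct': 50.0, 'distinct_pct': 75.0}
```

In this example the column has four non-null values, three of which are different, but only `b` and `c` appear once. That is why Distinct % (75%) is higher than Unique % (50%): a repeated value still counts as distinct, but not as unique.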

How to Use These Metrics

  • Overall Profiling:
    Start with the table-level metrics to get an overview of your dataset's volume and schema consistency.

  • Drill Down into Columns:
    Use the column-level metrics to identify specific issues or areas for improvement in your data. For example, focus on columns with a high null percentage or a low distinct percentage.

  • Quality Testing:
    Pay attention to the test status metrics. Columns with a significant number of failed or warning tests may require further data validation and cleansing efforts.
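The triage steps above can be expressed as a simple filter over per-column results. This is a sketch only: the `ColumnSummary` fields mirror the metrics described on this page, but the record itself, the `columns_to_investigate` helper, and the thresholds (more than 30% nulls, less than 1% distinct) are hypothetical and should be tuned to your data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnSummary:
    """Hypothetical per-column record built from the profile and the test status."""
    name: str
    null_pct: float
    distinct_pct: float
    failed_tests: int
    warning_tests: int
    passed_tests: int  # kept for completeness; not used in the filter below


def columns_to_investigate(columns: List[ColumnSummary],
                           max_null_pct: float = 30.0,
                           min_distinct_pct: float = 1.0) -> List[str]:
    """Return names of columns that need attention, most severe first."""
    flagged = [
        c for c in columns
        if c.failed_tests > 0
        or c.warning_tests > 0
        or c.null_pct > max_null_pct
        or c.distinct_pct < min_distinct_pct
    ]
    # Rank by failed tests, then warning tests, both descending.
    flagged.sort(key=lambda c: (c.failed_tests, c.warning_tests), reverse=True)
    return [c.name for c in flagged]


if __name__ == "__main__":
    summaries = [
        ColumnSummary("id", 0.0, 100.0, 0, 0, 3),
        ColumnSummary("email", 42.0, 55.0, 0, 1, 2),
        ColumnSummary("status", 0.0, 0.2, 0, 0, 1),
        ColumnSummary("amount", 5.0, 80.0, 2, 0, 1),
    ]
    print(columns_to_investigate(summaries))  # ['amount', 'email', 'status']
```

Putting failed tests ahead of warnings follows the order of severity described under Test Status; columns flagged only by their null or distinct percentage come last.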