Skip to content

Data Quality Test Cases

Data Quality is designed to help you ensure the integrity, accuracy, completeness, consistency, and validity of your data. This page explains the various test cases available, their key parameters, and the conditions under which they succeed or fail.

TABLE TESTS

1. Table Column Names To Match Set

  • Dimension: Integrity
  • Description:
    Verifies that the column names in your table match the expected set of values. By default, ordering is not enforced.
  • Key Parameters:
    Column Names – A comma-separated string of expected column names.
    Ordered – (Optional) A boolean to enforce the order.
  • Success: When the actual table columns match the expected set (and order if required).
  • Failure: Any discrepancy in the set or order results in failure.

2. Table Column Count To Equal

  • Dimension: Integrity
  • Description:
    Checks that the number of columns in a table exactly equals the specified value.
  • Key Parameter:
    Count – The expected number of columns.
  • Success: When the table contains the exact number of columns.
  • Failure: A mismatch causes the test to fail.

3. Table Row Count To Equal

  • Dimension: Integrity
  • Description:
    Validates that the total number of rows in a table equals the expected count.
  • Key Parameter:
    Count – The expected number of rows.
  • Success: Correct row count results in success.
  • Failure: Any difference leads to failure.

4. Table Column Name To Exist

  • Dimension: Integrity
  • Description:
    Ensures that a specific column exists within the table.
  • Key Parameter:
    Column Name – The expected column name.
  • Success: The column is found.
  • Failure: The column is missing, causing the test to fail.

5. Custom SQL Query Test

  • Dimension: SQL
  • Description:
    Runs a custom SQL expression. The test succeeds if the results meet the threshold criteria (e.g., returns 0 rows).
  • Key Parameters:
    SQL Expression – The query to be executed.
    Strategy – Determines if the query is executed as returning rows (ROWS) or a count (COUNT).
    Threshold – The maximum allowed result value (default is 0).
  • Success: The query results are within allowed limits.
  • Failure: Exceeding the threshold leads to failure.

6. Table Row Count To Be Between

  • Dimension: Integrity
  • Description:
    Checks if the table row count falls between specified minimum and maximum values.
  • Key Parameters:
    Min Value – Lower bound (optional).
    Max Value – Upper bound (optional).
  • Success: When the row count is within range.
  • Failure: Outside the bounds results in failure.

7. Table Row Inserted Count To Be Between

  • Dimension: Integrity
  • Description:
    Validates that the number of rows inserted during a specified period is within the expected range.
  • Key Parameters:
    Min Row Count – Lower bound of inserted rows.
    Max Row Count – Upper bound of inserted rows.
    Column Name – Timestamp column used for filtering.
    Range Type – e.g., HOUR, DAY, MONTH, YEAR.
    Range Interval – The interval to check.
  • Success: When inserted count lies within range.
  • Failure: Outside the allowed range causes failure.

8. Table Column Count To Be Between

  • Dimension: Integrity
  • Description:
    Validates that the number of columns is between a defined minimum and maximum.
  • Key Parameters:
    Min Column Value – Lower bound.
    Max Column Value – Upper bound.
  • Success: Column count within the range results in success.
  • Failure: Outside the range causes failure.

COLUMN TESTS

1. Column Value Median To Be Between

  • Dimension: Accuracy
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT" ]
  • Description:
    Ensures the median of the column values lies within specified boundaries.
  • Key Parameters:
    Min Value For Median – Lower bound (optional).
    Max Value For Median – Upper bound (optional).
  • Success: Calculated median is within range.
  • Failure: If outside the range.

2. Column Value Min To Be Between

  • Dimension: Accuracy
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT" ]
  • Description:
    Validates the column’s minimum value is within the defined range.
  • Key Parameters:
    Min Value For Min – Lower bound (optional).
    Max Value For Min – Upper bound (optional).
  • Success: Minimum value within range.
  • Failure: Out of range leads to failure.

3. Column Values To Match Regex

  • Dimension: Validity
  • Supported Data Types: [ "BYTES", "STRING", "MEDIUMTEXT", "TEXT", "CHAR", "VARCHAR" ]
  • Description:
    Checks that all values match the specified regular expression pattern.
  • Key Parameter:
    RegEx Pattern – The pattern each value should satisfy.
  • Success: Every value matches the pattern.
  • Failure: Any mismatch results in failure.

4. Column Values To Not Match Regex

  • Dimension: Validity
  • Supported Data Types: [ "BYTES", "STRING", "MEDIUMTEXT", "TEXT", "CHAR", "VARCHAR" ]
  • Description:
    Checks that none of the values match the specified regular expression pattern.
  • Key Parameter:
    RegEx Pattern – The pattern that values should not match.
  • Success: When no value in the column matches the pattern.
  • Failure: If any value in the column matches the pattern.

5. Column Values To Be Not Null

  • Dimension: Completeness
  • Supported Data Types: [ "NUMBER", "TINYINT", "SMALLINT", "INT", "BIGINT", "BYTEINT", "BYTES", "FLOAT", "DOUBLE", "DECIMAL", "NUMERIC", "TIMESTAMP", "TIMESTAMPZ", "TIME", "DATE", "DATETIME", "INTERVAL", "STRING", "MEDIUMTEXT", "TEXT", "CHAR", "VARCHAR", "BOOLEAN", "BINARY", "VARBINARY", "ARRAY", "BLOB", "LONGBLOB", "MEDIUMBLOB", "MAP", "STRUCT", "UNION", "SET", "GEOGRAPHY", "ENUM", "JSON", "UUID", "VARIANT", "GEOMETRY", "POINT", "POLYGON" ]
  • Description:
    Ensures that no values in the column are NULL.
  • Success: All values are non-null.
  • Failure: Presence of null values causes failure.

6. Column Values To Be In Set

  • Dimension: Validity
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT", "BYTES", "STRING", "MEDIUMTEXT", "TEXT", "CHAR", "VARCHAR" ]
  • Description:
    Validates that every column value belongs to a predefined set of allowed values.
  • Key Parameters:
    Allowed Values – A list of permitted values.
    Match Enum – (Optional) Validates each value independently.
  • When true, every column value must exactly match one of the allowed values.
  • When false, the test only requires that at least one column value matches one of the allowed values.
  • Success: Every value is within the allowed set.
  • Failure: Any value outside the set results in failure.

If matchEnum is true: The test iterates through all column values.

  • Success: Only if every value is contained in the Allowed Values list ("ALL MATCHED").
  • Failure: If any column value is not in the Allowed Values list ("NOT ALL MATCHED").

If matchEnum is false: The test scans for the presence of any value that is in the Allowed Values list.

  • Success: If at least one column value matches one of the allowed values ("SOME MATCHED").
  • Failure: If none of the column values match any of the Allowed Values ("NO MATCH").

7. Column Values To Be Not In Set

  • Dimension: Validity
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT", "BYTES", "STRING", "MEDIUMTEXT", "TEXT", "CHAR", "VARCHAR" ]
  • Description:
    Ensures that none of the column values are among a forbidden set.
  • Key Parameter:
    Forbidden Values – A list of disallowed values.
  • Success: No forbidden value is present.
  • Failure: Presence of any forbidden value causes failure.

8. Column Values To Be Between

  • Dimension: Accuracy
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT", "TIMESTAMP", "TIMESTAMPZ", "DATETIME", "DATE" ]
  • Description:
    Ensures that each value in the column lies within a specified minimum and maximum.
  • Key Parameters:
    Min Value – Lower bound (optional).
    Max Value – Upper bound (optional).
  • Success: All values are within the defined range.
  • Failure: Any value outside the range results in failure.

9. Column Value Std Dev To Be Between

  • Dimension: Accuracy
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT" ]
  • Description:
    Validates the standard deviation of the column values is within set limits.
  • Key Parameters:
    Min Std Dev – Expected lower bound.
    Max Std Dev – Expected upper bound.
  • Success: The standard deviation is within bounds.
  • Failure: If it falls outside the limits.

10. Column Value Mean To Be Between

  • Dimension: Accuracy
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT" ]
  • Description:
    Checks that the average (mean) of the column values is within a specified range.
  • Key Parameters:
    Min Mean – Expected lower bound.
    Max Mean – Expected upper bound.
  • Success: Mean is within the range.
  • Failure: Otherwise, the test fails.

11. Column Value Lengths To Be Between

  • Dimension: Accuracy
  • Supported Data Types: • BYTES • STRING • MEDIUMTEXT • TEXT • CHAR • VARCHAR • ARRAY
  • Description:
    Validates that the length of each column value lies between minimum and maximum thresholds.
  • Key Parameters:
    Min Length – Minimum allowed length.
    Max Length – Maximum allowed length.
  • Success: All values’ lengths are within range.
  • Failure: Any value with a length outside the specified limits causes failure.

12. Column Value Max To Be Between

  • Dimension: Accuracy
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT" ]
  • Description:
    Ensures the maximum value present in the column is within specified limits.
  • Key Parameters:
    Min Value For Max – Lower bound.
    Max Value For Max – Upper bound.
  • Success: Maximum value lies within the expected range.
  • Failure: If it is below or above the permitted values.

13. Column Values To Be Unique

  • Dimension: Uniqueness
  • Supported Data Types: [ "NUMBER", "TINYINT", "SMALLINT", "INT", "BIGINT", "BYTEINT", "BYTES", "FLOAT", "DOUBLE", "DECIMAL", "NUMERIC", "TIMESTAMP", "TIMESTAMPZ", "TIME", "DATE", "DATETIME", "INTERVAL", "STRING", "MEDIUMTEXT", "TEXT", "CHAR", "VARCHAR", "BOOLEAN", "BINARY", "VARBINARY", "BLOB", "LONGBLOB", "MEDIUMBLOB", "MAP", "STRUCT", "UNION", "SET", "GEOGRAPHY", "ENUM", "UUID", "VARIANT", "GEOMETRY", "POINT", "POLYGON" ]
  • Description:
    Checks that all values in the column are unique with no duplicates.
  • Success: When every value is distinct.
  • Failure: Duplicate values result in failure.

14. Column Values Missing Count

  • Dimension: Completeness
  • Supported Data Types: [ "NUMBER", "TINYINT", "SMALLINT", "INT", "BIGINT", "BYTEINT", "BYTES", "FLOAT", "DOUBLE", "DECIMAL", "NUMERIC", "TIMESTAMP", "TIMESTAMPZ", "TIME", "DATE", "DATETIME", "INTERVAL", "STRING", "MEDIUMTEXT", "TEXT", "CHAR", "VARCHAR", "BOOLEAN", "BINARY", "VARBINARY", "ARRAY", "BLOB", "LONGBLOB", "MEDIUMBLOB", "MAP", "STRUCT", "UNION", "SET", "GEOGRAPHY", "ENUM", "JSON", "UUID", "VARIANT", "GEOMETRY", "POINT", "POLYGON" ]
  • Description:
    Validates that the total count of missing values (NULL or specified missing strings) matches the expected count.
  • Key Parameters:
    Missing Count Value – Expected total missing count.
    Missing Value Match – (Optional) Additional missing value strings.
  • Success: The missing count equals the expected value.
  • Failure: A discrepancy leads to test failure.

15. Column Values Sum To Be Between

  • Dimension: Accuracy
  • Supported Data Types: [ "NUMBER", "INT", "FLOAT", "DOUBLE", "DECIMAL", "TINYINT", "SMALLINT", "BIGINT", "BYTEINT" ]
  • Description:
    Checks that the sum of the column's values falls within a specified range.
  • Key Parameters:
    Min Sum – Lower bound for the column sum.
    Max Sum – Upper bound for the column sum.
  • Success: Sum is within the range.
  • Failure: If the sum falls outside the permissible limits.

Final Notes

  • Each test in Data Quality is designed to clearly indicate a pass/success when data meets the expected criteria and a failure when it does not.
  • Utilize these test cases to continuously monitor and ensure your data’s health throughout its lifecycle.