Validate Schema
Validate that a table contains specific columns.
Last updated
Validate that a table contains specific columns.
Last updated
The Validate Schema tool provides a way for workflows to verify that source tables are a particular shape (or schema) before passing the data along to later steps. This can be especially helpful in cases where data is being pulled dynamically from an external source and has the potential to change shape in the future.
The Validate Schema tool accepts two input sources:
Reference Schema: The table that will serve as the starting point for defining how future tables should be shaped.
For Validation: The table that is being compared against the expected schema.
Unlike most Cascade tools, Validate Schema does not include the normal prompt options. Instead, the tool allows users to select or unselect columns related to the prompts below.
Selection
Description
Columns to Confirm
Default is all columns selected. Unselect columns to exclude them from the validation checks in the tool
Allow Nulls
Default is all columns selected. Unselect if columns in the Validation table should not allow for null values.
The Validate Schema tool configuration is a bit different in that it does not contain the usual right navigation bar with prompts to fill out.
In the tool modal, columns from the reference schema source can be selected to be part of the expected schema. By default, every column from the reference schema source that also appears in the source for validation will be set as the default set of columns that should be validated in the future. Columns can also be marked as allowing or disallowing null values.
The most common use case of Validate Schema involves a template table and a table to validate. The template table will have the desired set of columns and formats needed for the downstream cleaning and transformation tools in the workflow canvas.
In the example shown below, the "Transaction Template" table is connected to the Validate Schema tool as the Reference Schema and the "Transactions" table is connected as the table for Validation.
When we initially connect both tables to the Validate Schema tool, you'll see that there are errors because the validation has failed. This will happen when the columns in the table for validation do not match the schema and format of the reference table.
In order to clean up the table for validation, we can add an Edit Columns tool step to reformat and clean up our data. As seen below, once we make the necessary reformatting changes in Edit Columns, we can connect the updated table for validation to the Validate Schema, and the tool passes without any errors.
If the table for validation fails to match the criteria specified in the expected schema, the Validate Schema tool will produce an error in your workflow. Otherwise, the table for validation will be passed along as the output.