Cascade Library
Deduplicate

Remove rows with duplicate values in one or more columns


Last updated 2 years ago


Deduplicate removes rows containing duplicate values in the selected columns of an input table. When it finds duplicates, it removes the entire duplicate row, not just the repeated value.

Deduplicate has two settings:

  • Fields
    • Select the field(s) in your table to search for duplicates. You can choose a single field, or several fields to remove rows whose combination of values across those fields is duplicated.

  • Keep (optional)
    • Choose between two options:
      • First (default): when duplicates are found, keep the row of the first duplicate in each set of duplicates.
      • Last: when duplicates are found, keep the row of the last duplicate in each set of duplicates.
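The Fields and Keep settings can be sketched in plain Python. This is a hypothetical illustration, not Cascade's implementation: it assumes a table is a list of dicts, and the `deduplicate` function and its `keep` parameter are names invented here. Each row's key is the tuple of its values in the selected fields, and exactly one row per key survives.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def deduplicate(rows: List[Row], fields: Sequence[str], keep: str = "first") -> List[Row]:
    """Keep one row per distinct combination of values in `fields`.

    keep="first" retains the earliest row of each duplicate set;
    keep="last" retains the latest. Surviving rows stay in input order.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Map each key (tuple of the selected field values) to the index of the row to keep.
    chosen: Dict[tuple, int] = {}
    for i, row in enumerate(rows):
        key = tuple(row[f] for f in fields)
        if keep == "last" or key not in chosen:
            chosen[key] = i
    keep_idx = set(chosen.values())
    return [row for i, row in enumerate(rows) if i in keep_idx]


rows = [
    {"id": 1, "team": "A"},
    {"id": 2, "team": "B"},
    {"id": 1, "team": "C"},
]
print(deduplicate(rows, ["id"]))               # keeps id 1 (team A) and id 2
print(deduplicate(rows, ["id"], keep="last"))  # keeps id 2 and id 1 (team C)
print(len(deduplicate(rows, ["id", "team"])))  # 3: no (id, team) pair repeats
```

Selecting both `id` and `team` keeps every row here, because duplicates are judged on the combination of values across the selected fields.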

Configuration

Deduplicate removes duplicate values across one or more fields in the connected table. The tool auto-runs: as soon as it is connected to an input table, it runs and removes duplicates across all fields in the table, that is, any rows that are identical in every column. To configure it, open the tool and make your Fields and Keep selections.
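Before any fields are selected, deduplicating across all fields amounts to dropping rows that are identical in every column. A minimal sketch, assuming each row is a tuple of column values; `drop_identical_rows` is a hypothetical name, and keeping the first occurrence mirrors the tool's default Keep setting.

```python
from typing import Hashable, List, Tuple

Row = Tuple[Hashable, ...]


def drop_identical_rows(rows: List[Row]) -> List[Row]:
    """Drop rows that repeat an earlier row in every column, keeping the first copy."""
    seen = set()
    result: List[Row] = []
    for row in rows:
        if row not in seen:  # the whole row is the comparison key
            seen.add(row)
            result.append(row)
    return result


table = [("p1", "BOS"), ("p1", "BOS"), ("p1", "NYY")]
print(drop_identical_rows(table))  # [('p1', 'BOS'), ('p1', 'NYY')]
```

The third row survives because it differs in one column; deduplicating on the first column alone would remove it.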

The Fields prompt lets you choose from a dropdown list of all the columns in your table. Select one or more columns to run the Deduplicate tool on.

While you configure the Deduplicate tool, two row counts between the Fields and Keep options update dynamically as you change your selections.

Finally, make your Keep selection: choose whether to keep the row of the first instance of each set of duplicates or the row of the last.
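If the two numbers are the row counts before and after removing duplicates, both can be computed without performing the deduplication. A hedged sketch: `row_counts` is a hypothetical helper, and it assumes the tool keeps exactly one row per distinct combination of the selected field values, so the Keep choice changes which rows survive but not how many.

```python
from typing import Any, Dict, List, Sequence, Tuple


def row_counts(rows: List[Dict[str, Any]], fields: Sequence[str]) -> Tuple[int, int]:
    """Return (row count before, row count after) deduplicating on `fields`."""
    # One row survives per distinct key, so the "after" count is the number of distinct keys.
    distinct_keys = {tuple(row[f] for f in fields) for row in rows}
    return len(rows), len(distinct_keys)


rows = [{"id": 1, "team": "A"}, {"id": 2, "team": "B"}, {"id": 1, "team": "C"}]
print(row_counts(rows, ["id"]))          # (3, 2)
print(row_counts(rows, ["id", "team"]))  # (3, 3)
```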

Example

Suppose we have a dataset of baseball players that needs some cleanup. It combines data from several sources and, as a result, contains duplicate values. To show how the tool works, we've color-coded the duplicate values.

After connecting the table above to the Deduplicate tool, we configure it by selecting playerID in the Fields prompt and "First" in the Keep prompt.

The tool produces the table below as its output. As the colors and data of the remaining rows show, the Deduplicate tool removed every row with a repeated playerID value and kept only the first row for each playerID.
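The same configuration (Fields = playerID, Keep = First) can be reproduced on a small hypothetical table. The rows below are invented stand-ins, since the original dataset is not reproduced here, and the `source` column plays the role of the color coding by recording which input each row came from.

```python
# Hypothetical rows standing in for the baseball table.
players = [
    {"playerID": "p001", "team": "BOS", "source": "A"},
    {"playerID": "p002", "team": "NYY", "source": "A"},
    {"playerID": "p001", "team": "BOS", "source": "B"},
    {"playerID": "p003", "team": "CHC", "source": "B"},
    {"playerID": "p002", "team": "NYA", "source": "C"},
]

# Keep = First: setdefault stores only the first row seen for each playerID,
# and dicts preserve insertion order, so survivors stay in input order.
first_by_id = {}
for row in players:
    first_by_id.setdefault(row["playerID"], row)
deduplicated = list(first_by_id.values())

print([(r["playerID"], r["source"]) for r in deduplicated])
# [('p001', 'A'), ('p002', 'A'), ('p003', 'B')]
```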

Outputs

The Deduplicate tool outputs two tables: one with the duplicate rows removed, and one containing the duplicate rows.
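The two outputs can be sketched as a single pass that routes each row to one of two lists. This assumes Keep is set to First and that the second table holds the rows that were removed; the source does not say whether it also includes the retained copy of each duplicate, so treat that detail as an assumption. `split_duplicates` is a hypothetical name.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def split_duplicates(rows: List[Row], fields: Sequence[str]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (kept, removed), keeping the first row of each duplicate set."""
    seen = set()
    kept: List[Row] = []
    removed: List[Row] = []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            removed.append(row)  # a later copy of a key already kept
        else:
            seen.add(key)
            kept.append(row)
    return kept, removed


rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
kept, removed = split_duplicates(rows, ["id"])
print(kept)     # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b'}]
print(removed)  # [{'id': 1, 'v': 'c'}]
```

Under this assumption the two tables partition the input, so their row counts add up to the original count.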

Note: The row counts shown between the Fields and Keep options represent the table size before and after removing duplicates.