# Deduplicate

**Deduplicate** removes rows with duplicate values in the selected columns of an input table. When a duplicate is found, the tool removes the entire row, not just the duplicated value.

| Selection           | Description                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Fields**          | Select the field(s) in your table to search for duplicates. Choose a single field, or choose multiple fields to remove rows that repeat the same combination of values across those fields. |
| **Keep** (optional) | <p>Choose between two options:</p><ul><li><p>First <em>(default)</em></p><ul><li>When duplicates are found in your table, this option keeps the <strong>first</strong> row in each set of duplicates.</li></ul></li><li><p>Last</p><ul><li>When duplicates are found in your table, this option keeps the <strong>last</strong> row in each set of duplicates.</li></ul></li></ul> |
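
The *Fields* and *Keep* semantics described in the table can be sketched in plain Python. This is only an illustration, not the tool's implementation: the `deduplicate` function, its parameters, and the sample rows are all hypothetical.

```python
def deduplicate(rows, fields, keep="first"):
    """Keep one row per distinct combination of values in `fields`.

    Hypothetical helper for illustration; not part of the Deduplicate tool.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    kept = {}  # combination of field values -> index of the row to keep
    for index, row in enumerate(rows):
        key = tuple(row[field] for field in fields)
        # "first": record only the first index seen; "last": always overwrite.
        if keep == "last" or key not in kept:
            kept[key] = index
    # Return the surviving rows in their original order.
    return [rows[i] for i in sorted(kept.values())]


# Made-up sample data.
rows = [
    {"id": 1, "team": "A", "note": "x"},
    {"id": 1, "team": "B", "note": "y"},
    {"id": 1, "team": "A", "note": "z"},
    {"id": 2, "team": "A", "note": "w"},
]

by_id_first = deduplicate(rows, ["id"])                # one row per id, first kept
by_id_last = deduplicate(rows, ["id"], keep="last")    # one row per id, last kept
by_id_and_team = deduplicate(rows, ["id", "team"])     # one row per (id, team) pair
```

Selecting more fields makes the match stricter: with both `id` and `team` selected, a row counts as a duplicate only when both values repeat together.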

### Configuration

Deduplicate allows users to easily remove duplicate values across one or multiple fields in the connected table. The Deduplicate tool is set to auto-run: as soon as it is connected to an input table, it runs and removes duplicates across all fields in the table (that is, any identical full rows of data). To configure it, open the tool and make your *Fields* and *Keep* selections.
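
As a rough sketch of the auto-run behavior, removing duplicates across all fields amounts to dropping any row that is identical to an earlier row. The `drop_identical_rows` function and the sample data below are hypothetical and only illustrate the idea; the first occurrence is kept, matching the default *Keep* option.

```python
def drop_identical_rows(rows):
    """Drop rows that exactly match an earlier row in every field.

    Hypothetical helper for illustration; not part of the Deduplicate tool.
    """
    seen = set()
    result = []
    for row in rows:
        # The full set of (field, value) pairs identifies the row.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Made-up sample data: the third row is a full copy of the first.
rows = [
    {"id": 1, "team": "A"},
    {"id": 1, "team": "B"},
    {"id": 1, "team": "A"},
]

unique_rows = drop_identical_rows(rows)
```

The second row survives because it differs in `team`; only the exact copy is dropped.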

<figure><img src="/files/jIg7XrKQcX7DQChLvYZZ" alt=""><figcaption></figcaption></figure>

As seen above, the *Fields* prompt lets you choose from a dropdown list of all the columns in your table. Select one or more to run the Deduplicate tool on.

When configuring your Deduplicate tool, you will see two row counts between the *Fields* and *Keep* options. They update dynamically as you change your selections.

<figure><img src="/files/XnXPU1xHAMF84p1L8VfG" alt=""><figcaption><p>Row counts represent the table size before and after removing duplicates</p></figcaption></figure>
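
One way to think about the two row counts: the "before" count is the size of the input table, and the "after" count is the number of distinct value combinations in the selected fields. The sketch below assumes that interpretation; `row_counts` and the sample rows are hypothetical.

```python
def row_counts(rows, fields):
    """Return (rows before, rows after) for a deduplication on `fields`.

    Hypothetical helper for illustration; not part of the Deduplicate tool.
    """
    before = len(rows)
    # One row survives per distinct combination of the selected fields.
    after = len({tuple(row[field] for field in fields) for row in rows})
    return before, after


# Made-up sample data.
rows = [
    {"id": 1, "team": "A"},
    {"id": 1, "team": "B"},
    {"id": 2, "team": "A"},
    {"id": 2, "team": "A"},
]

counts_by_id = row_counts(rows, ["id"])
counts_by_id_and_team = row_counts(rows, ["id", "team"])
```

The "after" count does not depend on the *Keep* selection, since First and Last each keep exactly one row per set of duplicates.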

Finally, make your *Keep* selection: decide whether to keep the row of the first instance of each duplicate or the last.

### Example

Let's say we have a dataset of baseball players that needs some cleanup. The dataset combines data from various sources and, as a result, has multiple duplicate values. To better show how the tool works, we've color-coded the duplicate values.

{% embed url="<https://datawrapper.dwcdn.net/6N2lX/2/>" %}
{% endembed %}

After connecting the above table to the Deduplicate tool, we'll configure it by selecting `playerID` in the *Fields* prompt and "First" in the *Keep* prompt.

<div align="left"><figure><img src="/files/Oirm8rYvmTN4kMIzu2r3" alt=""><figcaption></figcaption></figure></div>

<div align="left"><figure><img src="/files/jENSvYSKP6OcaWYjC569" alt=""><figcaption></figcaption></figure></div>

As a result, we'll get the table below as our output. As the colors and data of the remaining rows show, the Deduplicate tool kept the first row for each duplicated `playerID` value and removed the rest.

{% embed url="<https://datawrapper.dwcdn.net/c0zP9/1/>" %}

### Outputs

The Deduplicate tool outputs two tables: one with the duplicate rows removed and one containing the duplicate rows.
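
The two outputs can be pictured with a short sketch. It assumes the second table holds the rows that were dropped from the first, which this page does not spell out. The `split_duplicates` function and the `playerID` values are made up for illustration.

```python
def split_duplicates(rows, fields, keep="first"):
    """Split rows into (deduplicated rows, removed duplicate rows).

    Hypothetical helper for illustration; not part of the Deduplicate tool.
    Assumes the second output contains only the rows that were dropped.
    """
    keep_index = {}  # combination of field values -> index of the row to keep
    for index, row in enumerate(rows):
        key = tuple(row[field] for field in fields)
        if keep == "last" or key not in keep_index:
            keep_index[key] = index
    kept_positions = set(keep_index.values())
    deduplicated = [r for i, r in enumerate(rows) if i in kept_positions]
    removed = [r for i, r in enumerate(rows) if i not in kept_positions]
    return deduplicated, removed


# Made-up sample data.
rows = [
    {"playerID": "p001", "source": "feed_a"},
    {"playerID": "p002", "source": "feed_a"},
    {"playerID": "p001", "source": "feed_b"},
]

deduplicated, removed = split_duplicates(rows, ["playerID"], keep="first")
```

Under this assumption, every input row lands in exactly one of the two tables, so their row counts add up to the size of the input.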

