SQL Deduplication Generator — Remove Duplicate Rows

Generate SQL Server queries to find and remove duplicate rows using ROW_NUMBER(). Choose which row to keep (first or most recent), preview before deleting, and pick the right pattern for your table size — CTE delete for standard use, INSERT INTO new table for large datasets.

Deduplication Configuration

Table Name required

Partition By Columns columns that define a duplicate

Order By Column determines which row to KEEP

Keep Row

Output Mode

WHERE Filter optional — scope the deduplication

Generated T-SQL

SQL Server / Azure SQL

📚 Guide & Tutorial: How to Deduplicate Rows in SQL: Complete Guide with Examples — step-by-step walkthrough with examples.

Examples

Example: Preview duplicates before deleting — customer email dedup

Find all duplicate customer rows by email. Preview which rows would be removed, keeping the most recent.

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM dbo.Customers
)
SELECT *
FROM Ranked
WHERE rn > 1;

Always run this SELECT first to confirm you are targeting the right rows before running the DELETE.

Example: CTE DELETE — remove duplicates keeping most recent row

Delete duplicate customer rows by email, keeping the row with the most recent CreatedDate.

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM dbo.Customers
)
DELETE FROM Ranked
WHERE rn > 1;

Example: Fast dedup for large tables — INSERT INTO new table

For tables with millions of rows, deleting row-by-row is slow. This pattern inserts distinct rows into a new table.

-- Step 1: Insert distinct rows into staging
SELECT *
INTO dbo.Customers_Deduped
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM dbo.Customers
) AS x
WHERE rn = 1;

-- Step 2: Verify count
SELECT COUNT(*) FROM dbo.Customers_Deduped;

-- Step 3 (only after verification): swap tables
EXEC sp_rename 'dbo.Customers', 'dbo.Customers_Backup';
EXEC sp_rename 'dbo.Customers_Deduped', 'dbo.Customers';

Always verify the row count in the staging table before swapping. Keep the backup table until you are confident.

What This Tool Does

The SQL Deduplication Generator produces T-SQL to find and remove duplicate rows in a SQL Server table. It uses the ROW_NUMBER() window function to rank rows within each duplicate group, then selects or deletes all rows ranked higher than 1.

The tool generates four output modes: a safe SELECT preview that shows which rows would be affected without making any changes; a CTE + DELETE pattern that is the standard production-safe approach; a direct DELETE; and an INSERT INTO new table pattern for high-performance deduplication of large tables.

How to Use It

Enter your Table Name including schema
Enter the Partition By Columns — the columns that together define what makes a row a duplicate (e.g. Email, or CustomerID + OrderDate)
Enter the Order By Column — the column that determines which row to keep (usually a date or ID)
Choose whether to keep the first (oldest) or last (most recent) row
Choose the Output Mode — always run the SELECT preview first
Add an optional WHERE filter to scope the operation to a subset of rows
Click Generate SQL — run the preview first, then the delete in a transaction

Choosing the Right Pattern

Pattern	Speed	Safety	Best For
SELECT preview	Fast	Read-only	Always run first to verify
CTE + DELETE	Standard	High — wrappable in transaction	Standard production deduplication
Direct DELETE	Standard	Medium	Simple cases
INSERT INTO new table	Fastest for large data	High — original preserved	Tables with millions of rows

Why ROW_NUMBER() Is the Right Approach

Before ROW_NUMBER(), deduplication required self-joins or subqueries that are harder to read and easier to get wrong. ROW_NUMBER() makes the intent explicit: partition by the key columns, order by the tiebreaker column, keep row 1, delete the rest.

The key advantage over GROUP BY deduplication is row identity: you keep a specific real row from the table rather than constructing a synthetic row from MIN/MAX aggregates. This matters when you need to preserve all columns, not just the grouped columns.

Privacy & Security

All SQL generation runs in your browser. Your table names, column names, and any filter values are never transmitted to any server.

Important: always run deduplication in a transaction and verify the affected row count before committing. Back up production tables before running bulk delete operations.

Common Problems & Fixes

Problem	Cause	Fix
Cannot DELETE from a CTE with ROW_NUMBER	SQL Server actually supports this — common misconception	It works correctly in SQL Server. The CTE with DELETE is valid T-SQL.
Deletes too many rows	Partition By columns too broad — too many rows considered 'duplicates'	Run the SELECT preview first. Narrow the PARTITION BY to only the columns that define uniqueness.
Deletes too few rows	Partition By columns too specific — some duplicates not matched	Check whether all duplicate-defining columns are included in PARTITION BY
Performance is slow on large table	Row-by-row delete is slow for millions of rows	Use the INSERT INTO new table pattern instead
Cannot modify the source table — permissions	DELETE requires DELETE permission on the table	Request permission or have the table owner run the generated query

💡 Deduplication is often a prerequisite before other data operations. After deduplicating, use the PIVOT Generator to reshape the clean data for reporting, or the Date Spine Generator to ensure complete date coverage in your result set. For formatting the generated DELETE or CTE query, the SQL Formatter cleans up indentation and keyword casing.

Deduplication in the SQL Workflow

Deduplicate as a preparation step before analysis:

Join to a date spine after deduplicating for accurate time series reports
PIVOT the deduplicated data for summary reporting
Check for blocking queries if deduplication runs slowly due to table locks
Format the deduplication query before deploying to production

Frequently Asked Questions

What is the safest way to deduplicate a SQL Server table?▼

The safest approach is: (1) run the SELECT version of the query to verify which rows would be deleted, (2) run the DELETE in a transaction and verify the count before committing, (3) back up the table first if it is production data.

What is the difference between ROW_NUMBER() and GROUP BY deduplication?▼

ROW_NUMBER() assigns a rank to each row within a partition, letting you keep one specific row (e.g. the most recent by date). GROUP BY deduplication selects the MIN or MAX of each column, which loses row identity. ROW_NUMBER() is more flexible and handles most real-world cases.

How do I keep the most recent duplicate instead of the first?▼

In the ORDER BY clause of ROW_NUMBER(), use a date or timestamp column in DESC order. The most recent row gets rank 1 and is kept; earlier duplicates are deleted.

Can I deduplicate without a primary key?▼

Yes. Use ROW_NUMBER() with a PARTITION BY over the columns that define uniqueness. If no natural ordering column exists, use ORDER BY (SELECT NULL) — this keeps an arbitrary row for each duplicate set.

Is there a faster way to deduplicate large tables?▼

For large tables: insert distinct rows into a new table, truncate the original, and re-insert. This avoids row-by-row deletion and is significantly faster for millions of rows. The generator includes this pattern as an option.