SQL Deduplication Generator — Remove Duplicate Rows

Generate SQL Server queries to find and remove duplicate rows using ROW_NUMBER(). Choose which row to keep (first or most recent), preview before deleting, and pick the right pattern for your table size — CTE delete for standard use, INSERT INTO new table for large datasets.

SQL Server / Azure SQL

Examples

Example: Preview duplicates before deleting — customer email dedup

Find all duplicate customer rows by email. Preview which rows would be removed, keeping the most recent.

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM dbo.Customers
)
SELECT *
FROM Ranked
WHERE rn > 1;

Always run this SELECT first to confirm you are targeting the right rows before running the DELETE.
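Before reading individual rows, it can help to see how many duplicate groups there are and how many rows a DELETE would remove. The sketch below uses a hypothetical #Customers temp table (the CustomerId, Email, and CreatedDate columns and the sample rows are assumptions for illustration), so it can run without touching a real table. It summarizes groups with GROUP BY ... HAVING and then counts the rows the preview's rn > 1 filter selects.

```sql
-- Hypothetical sample table; point the queries at your real table instead.
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
CREATE TABLE #Customers (
    CustomerId  INT           NOT NULL PRIMARY KEY,
    Email       NVARCHAR(256) NULL,
    CreatedDate DATETIME2     NOT NULL
);
INSERT INTO #Customers (CustomerId, Email, CreatedDate) VALUES
    (1, N'ann@example.com', '2024-01-05'),
    (2, N'ann@example.com', '2024-03-10'),
    (3, N'bob@example.com', '2024-02-01'),
    (4, N'cat@example.com', '2024-01-20'),
    (5, N'cat@example.com', '2024-02-15'),
    (6, N'cat@example.com', '2024-04-01');

-- One row per duplicate group: how many copies exist and how many would be removed.
SELECT [Email],
       COUNT(*)     AS CopiesFound,
       COUNT(*) - 1 AS RowsToRemove
FROM #Customers
GROUP BY [Email]
HAVING COUNT(*) > 1
ORDER BY [Email];

-- Total rows the DELETE would remove: the same rn > 1 filter as the preview.
DECLARE @RowsToRemove INT;

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM #Customers
)
SELECT @RowsToRemove = COUNT(*)
FROM Ranked
WHERE rn > 1;

SELECT @RowsToRemove AS TotalRowsToRemove;
```

On this sample data the summary lists ann@example.com (one row to remove) and cat@example.com (two rows to remove), and the total is 3, matching the number of rows the preview query returns.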

Example: CTE DELETE — remove duplicates keeping most recent row

Delete duplicate customer rows by email, keeping the row with the most recent CreatedDate.

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM dbo.Customers
)
DELETE FROM Ranked
WHERE rn > 1;
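If two rows for the same email share the same CreatedDate, ROW_NUMBER() breaks the tie arbitrarily, so the row that survives can differ between runs. Adding a unique column as a second ORDER BY key makes the choice deterministic. This sketch assumes a unique CustomerId column and a hypothetical #Customers temp table with made-up rows.

```sql
-- Hypothetical sample table with a unique CustomerId column.
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
CREATE TABLE #Customers (
    CustomerId  INT           NOT NULL PRIMARY KEY,
    Email       NVARCHAR(256) NOT NULL,
    CreatedDate DATETIME2     NOT NULL
);
INSERT INTO #Customers (CustomerId, Email, CreatedDate) VALUES
    (1, N'ann@example.com', '2024-03-10'),  -- same CreatedDate as row 2
    (2, N'ann@example.com', '2024-03-10'),
    (3, N'ann@example.com', '2024-01-05'),
    (4, N'bob@example.com', '2024-02-01');

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC,
                        [CustomerId] DESC  -- tiebreaker: highest id wins a date tie
           ) AS rn
    FROM #Customers
)
DELETE FROM Ranked
WHERE rn > 1;

-- Remaining rows: CustomerId 2 (ann@example.com) and 4 (bob@example.com).
SELECT * FROM #Customers ORDER BY [CustomerId];
```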

Example: Fast dedup for large tables — INSERT INTO new table

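For tables with millions of rows, a large DELETE is slow: every deleted row is fully logged and every index on the table has to be maintained. This pattern instead copies one row per group into a new table with SELECT ... INTO, which can be minimally logged under the simple or bulk-logged recovery model, and then swaps the tables.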

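-- Step 1: Copy one row per email into a new table
SELECT *
INTO dbo.Customers_Deduped
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM dbo.Customers
) AS x
WHERE rn = 1;

-- SELECT * also copied the helper rn column; drop it so the schema matches the original
ALTER TABLE dbo.Customers_Deduped DROP COLUMN rn;

-- Step 2: Verify count (the two numbers should match)
SELECT (SELECT COUNT(*) FROM dbo.Customers_Deduped) AS DedupedRows,
       (SELECT COUNT(*) FROM (SELECT DISTINCT [Email] FROM dbo.Customers) AS g) AS ExpectedRows;

-- Step 3 (only after verification): swap tables
-- sp_rename takes the new name WITHOUT a schema prefix
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.Customers', 'Customers_Backup';
EXEC sp_rename 'dbo.Customers_Deduped', 'Customers';
COMMIT TRANSACTION;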

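Always verify the row count in the staging table before swapping, and keep the backup table until you are confident the deduplicated data is correct. SELECT ... INTO copies columns and data but not indexes, constraints, or triggers, so recreate those on the new table.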

What This Tool Does

The SQL Deduplication Generator produces T-SQL to find and remove duplicate rows in a SQL Server table. It uses the ROW_NUMBER() window function to number the rows within each duplicate group, then selects or deletes every row whose number is greater than 1, so exactly one row per group survives.

The tool generates four output modes: a safe SELECT preview that shows which rows would be affected without making any changes; a CTE + DELETE pattern that is the standard production-safe approach; a direct DELETE; and an INSERT INTO new table pattern for high-performance deduplication of large tables.
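This page does not show what the direct DELETE mode emits. One common direct form, shown here as an assumption rather than the generator's exact output, deletes straight from the base table by key: a ROW_NUMBER() subquery finds the key of every row numbered after the first. It needs a unique column (the CustomerId column and the #Customers temp table below are hypothetical).

```sql
-- Hypothetical sample table with a unique CustomerId column.
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
CREATE TABLE #Customers (
    CustomerId  INT           NOT NULL PRIMARY KEY,
    Email       NVARCHAR(256) NOT NULL,
    CreatedDate DATETIME2     NOT NULL
);
INSERT INTO #Customers (CustomerId, Email, CreatedDate) VALUES
    (1, N'ann@example.com', '2024-01-05'),
    (2, N'ann@example.com', '2024-03-10'),
    (3, N'bob@example.com', '2024-02-01'),
    (4, N'cat@example.com', '2024-01-20'),
    (5, N'cat@example.com', '2024-02-15'),
    (6, N'cat@example.com', '2024-04-01');

-- Collect the ids of all rows ranked after the first in their group,
-- then delete those ids directly from the table.
DELETE FROM #Customers
WHERE [CustomerId] IN (
    SELECT x.[CustomerId]
    FROM (
        SELECT [CustomerId],
               ROW_NUMBER() OVER (
                   PARTITION BY [Email]
                   ORDER BY [CreatedDate] DESC
               ) AS rn
        FROM #Customers
    ) AS x
    WHERE x.rn > 1
);

-- Remaining rows: CustomerId 2, 3 and 6 (the most recent row per email).
SELECT * FROM #Customers ORDER BY [CustomerId];
```

Unlike the CTE form, this variant depends on a unique key; on a table without one, the CTE DELETE is the simpler choice.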

How to Use It

Choosing the Right Pattern

Pattern | Speed | Safety | Best For
--- | --- | --- | ---
SELECT preview | Fast | Read-only | Always run first to verify
CTE + DELETE | Standard | High: can be wrapped in a transaction | Standard production deduplication
Direct DELETE | Standard | Medium | Simple cases
INSERT INTO new table | Fastest for large data | High: original preserved | Tables with millions of rows

Why ROW_NUMBER() Is the Right Approach

Before ROW_NUMBER(), deduplication required self-joins or subqueries that are harder to read and easier to get wrong. ROW_NUMBER() makes the intent explicit: partition by the key columns, order by the tiebreaker column, keep row 1, delete the rest.

The key advantage over GROUP BY deduplication is row identity: you keep a specific real row from the table rather than constructing a synthetic row from MIN/MAX aggregates. This matters when you need to preserve all columns, not just the grouped columns.
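To see the difference in practice, the sketch below runs both approaches on a hypothetical #Customers temp table (the FullName column and all sample values are assumptions). Because GROUP BY aggregates each column independently, MAX(FullName) and MAX(CreatedDate) can come from different source rows; ROW_NUMBER() keeps one row exactly as stored.

```sql
-- Hypothetical sample table: two rows for ann@example.com with different names.
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
IF OBJECT_ID('tempdb..#ByGroup') IS NOT NULL DROP TABLE #ByGroup;
IF OBJECT_ID('tempdb..#ByRowNumber') IS NOT NULL DROP TABLE #ByRowNumber;
CREATE TABLE #Customers (
    CustomerId  INT           NOT NULL PRIMARY KEY,
    Email       NVARCHAR(256) NOT NULL,
    FullName    NVARCHAR(100) NOT NULL,
    CreatedDate DATETIME2     NOT NULL
);
INSERT INTO #Customers (CustomerId, Email, FullName, CreatedDate) VALUES
    (1, N'ann@example.com', N'Ann Smith', '2024-01-05'),
    (2, N'ann@example.com', N'Ann Jones', '2024-03-10'),
    (3, N'bob@example.com', N'Bob Lee',   '2024-02-01');

-- GROUP BY: each column is aggregated on its own.
SELECT [Email],
       MAX([FullName])    AS FullName,
       MAX([CreatedDate]) AS CreatedDate
INTO #ByGroup
FROM #Customers
GROUP BY [Email];

-- ROW_NUMBER: keep one complete, real row per email.
SELECT [CustomerId], [Email], [FullName], [CreatedDate]
INTO #ByRowNumber
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM #Customers
) AS x
WHERE x.rn = 1;

SELECT * FROM #ByGroup;      -- ann: Ann Smith + 2024-03-10, a combination no source row has
SELECT * FROM #ByRowNumber;  -- ann: CustomerId 2, Ann Jones + 2024-03-10, an actual row
```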

Privacy & Security

All SQL generation runs in your browser. Your table names, column names, and any filter values are never transmitted to any server.

Important: always run deduplication in a transaction and verify the affected row count before committing. Back up production tables before running bulk delete operations.
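The transaction advice above can be sketched as a script that counts the rows the preview would return, runs the DELETE inside a transaction, and commits only if @@ROWCOUNT matches that count. The #Customers temp table and its columns are hypothetical; in practice you may prefer to inspect the count yourself before issuing COMMIT rather than automating the decision.

```sql
-- Hypothetical sample table (same shape as the examples above).
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
CREATE TABLE #Customers (
    CustomerId  INT           NOT NULL PRIMARY KEY,
    Email       NVARCHAR(256) NOT NULL,
    CreatedDate DATETIME2     NOT NULL
);
INSERT INTO #Customers (CustomerId, Email, CreatedDate) VALUES
    (1, N'ann@example.com', '2024-01-05'),
    (2, N'ann@example.com', '2024-03-10'),
    (3, N'bob@example.com', '2024-02-01'),
    (4, N'cat@example.com', '2024-01-20'),
    (5, N'cat@example.com', '2024-02-15'),
    (6, N'cat@example.com', '2024-04-01');

SET XACT_ABORT ON;  -- a run-time error rolls back the whole transaction

DECLARE @Expected INT, @Deleted INT;

-- Expected count, from the same ranking the DELETE uses.
WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY [Email] ORDER BY [CreatedDate] DESC) AS rn
    FROM #Customers
)
SELECT @Expected = COUNT(*) FROM Ranked WHERE rn > 1;

BEGIN TRANSACTION;

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY [Email] ORDER BY [CreatedDate] DESC) AS rn
    FROM #Customers
)
DELETE FROM Ranked
WHERE rn > 1;

SET @Deleted = @@ROWCOUNT;  -- read immediately after the DELETE

IF @Deleted = @Expected
    COMMIT TRANSACTION;
ELSE
BEGIN
    ROLLBACK TRANSACTION;
    THROW 50000, 'Deleted row count does not match the preview; rolled back.', 1;
END;
```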

Common Problems & Fixes

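Problem | Cause | Fix
--- | --- | ---
"Cannot DELETE from a CTE with ROW_NUMBER" | Common misconception: SQL Server supports this | Deleting through the CTE is valid T-SQL as long as the CTE reads from a single base table.
Deletes too many rows | PARTITION BY lists too few columns, so rows that differ in a meaningful column are grouped as duplicates; or the key contains NULLs, which PARTITION BY places in one group | Run the SELECT preview first. Include every column that defines uniqueness in PARTITION BY, and filter out rows with NULL keys if they should not count as duplicates.
Deletes too few rows | PARTITION BY includes a column that differs between true duplicates (e.g. a surrogate ID or timestamp), or key values differ only in formatting such as leading spaces | Remove non-identifying columns from PARTITION BY; where needed, partition by a normalized expression such as LTRIM(RTRIM([Email])).
Performance is slow on a large table | Deleting millions of rows is fully logged and must maintain every index | Use the INSERT INTO new table pattern instead.
Cannot modify the source table (permissions) | DELETE requires DELETE permission on the table | Request permission or have the table owner run the generated query.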
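Two of the causes above, NULL keys and formatting differences, can be handled inside the CTE itself. This is a sketch on a hypothetical #Customers temp table: rows with a NULL Email are filtered out so they are never treated as duplicates of each other, and the partition key is trimmed so ' ann@example.com' and 'ann@example.com' fall into the same group.

```sql
-- Hypothetical sample table: two customers without an email, one email with a leading space.
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
CREATE TABLE #Customers (
    CustomerId  INT           NOT NULL PRIMARY KEY,
    Email       NVARCHAR(256) NULL,
    CreatedDate DATETIME2     NOT NULL
);
INSERT INTO #Customers (CustomerId, Email, CreatedDate) VALUES
    (1, NULL,                '2024-01-01'),
    (2, NULL,                '2024-02-01'),
    (3, N'ann@example.com',  '2024-01-05'),
    (4, N' ann@example.com', '2024-03-10'),
    (5, N'bob@example.com',  '2024-02-01');

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY LTRIM(RTRIM([Email]))  -- normalize before comparing
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM #Customers
    WHERE [Email] IS NOT NULL  -- rows without an email are never deduplicated
)
DELETE FROM Ranked
WHERE rn > 1;

-- Remaining rows: 1 and 2 (NULL emails, untouched), 4 (newest ann), 5 (bob).
SELECT * FROM #Customers ORDER BY [CustomerId];
```

Without the WHERE filter, rows 1 and 2 would share one partition and one of them would be deleted.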

💡 Deduplication is often a prerequisite before other data operations. After deduplicating, use the PIVOT Generator to reshape the clean data for reporting, or the Date Spine Generator to ensure complete date coverage in your result set. For formatting the generated DELETE or CTE query, the SQL Formatter cleans up indentation and keyword casing.

Deduplication in the SQL Workflow

Deduplicate as a preparation step before analysis.

Frequently Asked Questions

What is the safest way to deduplicate a SQL Server table?
The safest approach is: (1) run the SELECT version of the query to verify which rows would be deleted, (2) run the DELETE in a transaction and verify the count before committing, (3) back up the table first if it is production data.
What is the difference between ROW_NUMBER() and GROUP BY deduplication?
ROW_NUMBER() assigns a rank to each row within a partition, letting you keep one specific row (e.g. the most recent by date). GROUP BY deduplication selects the MIN or MAX of each column, which loses row identity. ROW_NUMBER() is more flexible and handles most real-world cases.
How do I keep the most recent duplicate instead of the first?
In the ORDER BY clause of ROW_NUMBER(), use a date or timestamp column in DESC order. The most recent row gets rank 1 and is kept; earlier duplicates are deleted.
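The sketch below computes both orderings side by side on a hypothetical #Customers temp table, so you can see which row each direction would keep before choosing one: ASC gives row number 1 to the oldest row, DESC to the newest.

```sql
-- Hypothetical sample table: three rows for one email.
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
IF OBJECT_ID('tempdb..#Ranked') IS NOT NULL DROP TABLE #Ranked;
CREATE TABLE #Customers (
    CustomerId  INT           NOT NULL PRIMARY KEY,
    Email       NVARCHAR(256) NOT NULL,
    CreatedDate DATETIME2     NOT NULL
);
INSERT INTO #Customers (CustomerId, Email, CreatedDate) VALUES
    (1, N'ann@example.com', '2024-01-05'),
    (2, N'ann@example.com', '2024-02-14'),
    (3, N'ann@example.com', '2024-03-10');

SELECT [CustomerId], [Email], [CreatedDate],
       ROW_NUMBER() OVER (PARTITION BY [Email] ORDER BY [CreatedDate] ASC)  AS rn_keep_first,
       ROW_NUMBER() OVER (PARTITION BY [Email] ORDER BY [CreatedDate] DESC) AS rn_keep_latest
INTO #Ranked
FROM #Customers;

-- ASC:  CustomerId 1 (oldest) has rn_keep_first = 1
-- DESC: CustomerId 3 (newest) has rn_keep_latest = 1
SELECT * FROM #Ranked ORDER BY [CreatedDate];
```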
Can I deduplicate without a primary key?
Yes. Use ROW_NUMBER() with a PARTITION BY over the columns that define uniqueness. If no natural ordering column exists, use ORDER BY (SELECT NULL) — this keeps an arbitrary row for each duplicate set.
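For whole-row duplicates in a table with no key, PARTITION BY can list every column and ORDER BY (SELECT NULL) supplies the required ordering without implying one. Since the copies are identical, it does not matter which one survives. The #Events heap below and its columns are hypothetical.

```sql
-- Hypothetical heap (no primary key) containing exact duplicate rows.
IF OBJECT_ID('tempdb..#Events') IS NOT NULL DROP TABLE #Events;
CREATE TABLE #Events (
    EventCode VARCHAR(10) NOT NULL,
    EventDate DATE        NOT NULL
);
INSERT INTO #Events (EventCode, EventDate) VALUES
    ('LOGIN',  '2024-05-01'),
    ('LOGIN',  '2024-05-01'),  -- exact duplicate
    ('LOGIN',  '2024-05-01'),  -- exact duplicate
    ('LOGOUT', '2024-05-01');

WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [EventCode], [EventDate]  -- every column: whole-row duplicates
               ORDER BY (SELECT NULL)                 -- no ordering column; any copy may be kept
           ) AS rn
    FROM #Events
)
DELETE FROM Ranked
WHERE rn > 1;

-- Remaining rows: one LOGIN and one LOGOUT.
SELECT * FROM #Events;
```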
Is there a faster way to deduplicate large tables?
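For large tables, copy the rows you want to keep into a new table, verify the count, and then either swap the tables with sp_rename or truncate the original and re-insert the kept rows. Because this replaces one large, fully logged DELETE with a bulk insert, it is typically much faster for millions of rows. The generator includes the new-table pattern as an option.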
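The truncate-and-re-insert variant can be sketched as follows, using hypothetical #Customers and #Keep temp tables. Two facts worth knowing before adapting it: TRUNCATE TABLE fails on a table referenced by a foreign key, and a table with an IDENTITY column would need SET IDENTITY_INSERT ON to reload its original ids (the sample avoids IDENTITY for that reason).

```sql
-- Hypothetical sample table (no IDENTITY column, to keep the reload simple).
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
IF OBJECT_ID('tempdb..#Keep') IS NOT NULL DROP TABLE #Keep;
CREATE TABLE #Customers (
    CustomerId  INT           NOT NULL PRIMARY KEY,
    Email       NVARCHAR(256) NOT NULL,
    CreatedDate DATETIME2     NOT NULL
);
INSERT INTO #Customers (CustomerId, Email, CreatedDate) VALUES
    (1, N'ann@example.com', '2024-01-05'),
    (2, N'ann@example.com', '2024-03-10'),
    (3, N'bob@example.com', '2024-02-01'),
    (4, N'cat@example.com', '2024-01-20'),
    (5, N'cat@example.com', '2024-02-15'),
    (6, N'cat@example.com', '2024-04-01');

-- Step 1: copy one row per email (explicit column list, so rn is not copied).
SELECT [CustomerId], [Email], [CreatedDate]
INTO #Keep
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Email]
               ORDER BY [CreatedDate] DESC
           ) AS rn
    FROM #Customers
) AS x
WHERE x.rn = 1;

-- Step 2: verify one kept row per distinct email before touching the original.
IF (SELECT COUNT(*) FROM #Keep)
   <> (SELECT COUNT(*) FROM (SELECT DISTINCT [Email] FROM #Customers) AS g)
    THROW 50000, 'Kept row count does not match distinct key count.', 1;

-- Step 3: empty and reload the original in one transaction
-- (TRUNCATE inside a transaction can be rolled back in SQL Server).
SET XACT_ABORT ON;
BEGIN TRANSACTION;

TRUNCATE TABLE #Customers;

INSERT INTO #Customers ([CustomerId], [Email], [CreatedDate])
SELECT [CustomerId], [Email], [CreatedDate]
FROM #Keep;

COMMIT TRANSACTION;
```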
