SQL Deduplication Generator — Remove Duplicate Rows
Generate SQL Server queries to find and remove duplicate rows using ROW_NUMBER(). Choose which row to keep (first or most recent), preview before deleting, and pick the right pattern for your table size — CTE delete for standard use, INSERT INTO new table for large datasets.
Deduplication Configuration
Generated T-SQL
Examples
Example: Preview duplicates before deleting — customer email dedup
Find all duplicate customer rows by email. Preview which rows would be removed, keeping the most recent.
WITH Ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY [Email]
ORDER BY [CreatedDate] DESC
) AS rn
FROM dbo.Customers
)
SELECT *
FROM Ranked
WHERE rn > 1;Always run this SELECT first to confirm you are targeting the right rows before running the DELETE.
Example: CTE DELETE — remove duplicates keeping most recent row
Delete duplicate customer rows by email, keeping the row with the most recent CreatedDate.
WITH Ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY [Email]
ORDER BY [CreatedDate] DESC
) AS rn
FROM dbo.Customers
)
DELETE FROM Ranked
WHERE rn > 1;Example: Fast dedup for large tables — INSERT INTO new table
For tables with millions of rows, deleting row-by-row is slow. This pattern inserts distinct rows into a new table.
-- Step 1: Insert distinct rows into staging
SELECT *
INTO dbo.Customers_Deduped
FROM (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY [Email]
ORDER BY [CreatedDate] DESC
) AS rn
FROM dbo.Customers
) AS x
WHERE rn = 1;
-- Step 2: Verify count
SELECT COUNT(*) FROM dbo.Customers_Deduped;
-- Step 3 (only after verification): swap tables
EXEC sp_rename 'dbo.Customers', 'dbo.Customers_Backup';
EXEC sp_rename 'dbo.Customers_Deduped', 'dbo.Customers';Always verify the row count in the staging table before swapping. Keep the backup table until you are confident.
What This Tool Does
The SQL Deduplication Generator produces T-SQL to find and remove duplicate rows in a SQL Server table. It uses the ROW_NUMBER() window function to rank rows within each duplicate group, then selects or deletes all rows ranked higher than 1.
The tool generates four output modes: a safe SELECT preview that shows which rows would be affected without making any changes; a CTE + DELETE pattern that is the standard production-safe approach; a direct DELETE; and an INSERT INTO new table pattern for high-performance deduplication of large tables.
How to Use It
- Enter your Table Name including schema
- Enter the Partition By Columns — the columns that together define what makes a row a duplicate (e.g. Email, or CustomerID + OrderDate)
- Enter the Order By Column — the column that determines which row to keep (usually a date or ID)
- Choose whether to keep the first (oldest) or last (most recent) row
- Choose the Output Mode — always run the SELECT preview first
- Add an optional WHERE filter to scope the operation to a subset of rows
- Click Generate SQL — run the preview first, then the delete in a transaction
Choosing the Right Pattern
| Pattern | Speed | Safety | Best For |
|---|---|---|---|
| SELECT preview | Fast | Read-only | Always run first to verify |
| CTE + DELETE | Standard | High — wrappable in transaction | Standard production deduplication |
| Direct DELETE | Standard | Medium | Simple cases |
| INSERT INTO new table | Fastest for large data | High — original preserved | Tables with millions of rows |
Why ROW_NUMBER() Is the Right Approach
Before ROW_NUMBER(), deduplication required self-joins or subqueries that are harder to read and easier to get wrong. ROW_NUMBER() makes the intent explicit: partition by the key columns, order by the tiebreaker column, keep row 1, delete the rest.
The key advantage over GROUP BY deduplication is row identity: you keep a specific real row from the table rather than constructing a synthetic row from MIN/MAX aggregates. This matters when you need to preserve all columns, not just the grouped columns.
Privacy & Security
All SQL generation runs in your browser. Your table names, column names, and any filter values are never transmitted to any server.
Important: always run deduplication in a transaction and verify the affected row count before committing. Back up production tables before running bulk delete operations.
Common Problems & Fixes
| Problem | Cause | Fix |
|---|---|---|
| Cannot DELETE from a CTE with ROW_NUMBER | SQL Server actually supports this — common misconception | It works correctly in SQL Server. The CTE with DELETE is valid T-SQL. |
| Deletes too many rows | Partition By columns too broad — too many rows considered 'duplicates' | Run the SELECT preview first. Narrow the PARTITION BY to only the columns that define uniqueness. |
| Deletes too few rows | Partition By columns too specific — some duplicates not matched | Check whether all duplicate-defining columns are included in PARTITION BY |
| Performance is slow on large table | Row-by-row delete is slow for millions of rows | Use the INSERT INTO new table pattern instead |
| Cannot modify the source table — permissions | DELETE requires DELETE permission on the table | Request permission or have the table owner run the generated query |
💡 Deduplication is often a prerequisite before other data operations. After deduplicating, use the PIVOT Generator to reshape the clean data for reporting, or the Date Spine Generator to ensure complete date coverage in your result set. For formatting the generated DELETE or CTE query, the SQL Formatter cleans up indentation and keyword casing.
Deduplication in the SQL Workflow
Deduplicate as a preparation step before analysis:
- Join to a date spine after deduplicating for accurate time series reports
- PIVOT the deduplicated data for summary reporting
- Check for blocking queries if deduplication runs slowly due to table locks
- Format the deduplication query before deploying to production
