Tuesday, June 3, 2008

SQL Query to remove duplicate records from a table?


Question: How to remove duplicate records from a table using SQL?

Answer: There may be quite a few possible ways of doing this, but the two most common ways are:-

Simpler and Faster approach - In this approach we simply create anothet table having only the distinct rows of the original table, drop the original table, and finally rename the new table to the original table. Voila! We're done. But, do remember that dropping a table will drop all the indexes and priviledges as well. So, you'll need to create them again.

CREATE TABLE NEW_TABLE AS SELECT DISTINCT * FROM ORIGINAL_TABLE;
DROP TABLE ORIGINAL_TABLE;
RENAME NEW_TABLE TO ORIGINAL_TABLE;
...create indexes/privileges on ORIGINAL_TABLE now...

The standard ROWID approach - it's the same approach where we simply compare the ROWID of the records having the same key values (duplicate records) and select only one of the duplicate rows - the one having either the min or the max ROWID. Don't worry, these ROWIDs are system generated and will never be duplicate, so you won't be having more than one min (or max).


DELETE FROM ORIGINAL_TABLE T1 WHERE ROWID > (SELECT MIN(ROWID) FROM ORIGINAL_TABLE T2WHERE T1.KEY = T2.KEY);

OR

DELETE FROM ORIGINAL_TABLE T1 WHERE ROWID < (SELECT MAX(ROWID) FROM ORIGINAL_TABLE T2WHERE T1.KEY = T2.KEY);


Here KEY represents the set of columns based on which we're deciding the duplicates.



Share/Save/Bookmark


2 comments:

Anonymous said...

delete from Student where rowid not in (select min(rowid) from Student group by name)

--The above sql deletes all duplicate records with same name in a Student table.

Anonymous said...

the following link provides more information on different methods of deleting duplicate records

http://www.besttechtools.com/SQLArticles.aspx?ID=DeleteDuplicate