sql - How to delete huge duplicate data except one record without identical column/Primary key -

i want delete duplicate rows in table when no primary key defined(non normalised db).

my problem - table has approx 540 million records. used cte delete records taking time more 8 hours. want optimize query. example: if have table1 data below,

id fname lname  1 aaa ccc 2 bbb ddd 1 aaa ccc 2 bbb ddd 1 aaa ccc 2 bbb ddd 3 bcb dgd

remove duplicate rows , keep data in table using single query.

id fname lname 1 aaa ccc 2 bbb ddd 3 bcb dgd

previously applied type of query-

;with tblcte(empid,ranking) ( select empid, ranking = dense_rank() on (partition empid order newid()) @tbl ) delete tblcte ranking > 1 select * @tbl order empid

but taking time.

i want solution answer these conditions:

no primary key or identical column
data more 540 million, query should take less time delete records.

try this

 tempid (  select *,   row_number() over(partition id, fname,lname order id) [num]  employee)  delete tempid [num] > 1  select * employee

find solution in fiddle http://sqlfiddle.com/#!6/394a9/1

WIKI

Search This Blog

sql - How to delete huge duplicate data except one record without identical column/Primary key -

Comments

Post a Comment