sql - How to delete huge duplicate data except one record without identical column/Primary key -


i want delete duplicate rows in table when no primary key defined(non normalised db).

my problem - table has approx 540 million records. used cte delete records taking time more 8 hours. want optimize query. example: if have table1 data below,

id fname lname  1 aaa ccc 2 bbb ddd 1 aaa ccc 2 bbb ddd 1 aaa ccc 2 bbb ddd 3 bcb dgd 

remove duplicate rows , keep data in table using single query.

id fname lname 1 aaa ccc 2 bbb ddd 3 bcb dgd 

previously applied type of query-

;with tblcte(empid,ranking) ( select empid, ranking = dense_rank() on (partition empid order newid()) @tbl ) delete tblcte ranking > 1 select * @tbl order empid 

but taking time.

i want solution answer these conditions:

  1. no primary key or identical column
  2. data more 540 million, query should take less time delete records.

try this

 tempid (  select *,   row_number() over(partition id, fname,lname order id) [num]  employee)  delete tempid [num] > 1  select * employee 

find solution in fiddle http://sqlfiddle.com/#!6/394a9/1


Comments