Hi everyone,
I have an application called De-Duper, which identifies duplicates in a table based on one field or combination of few fields. Once duplicates are identified, user can review them, combine, delete, edit, etc.
My colleague, who was testing this application, made a suggestion: this application should identify complete duplicates (e.g. almost all fields should be identical), then delete all duplicate records except one.
I have one solution in mind, but not sure, it's the best. Besides, it would not be Rushmore optimizable, so may take a long time.
Ok, here is what I have in mind:
Table has a RecNum Identifier field.
1) Create a list of all fields, which should be checked (comma-delimited) in a string. This part is already done.
2)
lcStr='select *, count(*) as cnt_dups from BldMstr group by '+m.lcFieldList+ ;
' having cnt_dups>1 into cursor curCompleteDups'
strtofile(m.lcStr, m.lcTempPrg)
compile (m.lcTempPrg)
do (m.lcTempPrg)
delete (m.lcTempPrg)
3) create similar string
m.lcDelStr='field1=curCompleteDups.field1 and field2=curCompleteDups.field2"...
lcStr='delete for '+m.lcDelStr+' and RecNum<>curCompleteDups.RecNum'
create a temp program and execute it.
Do you think, it's feassible? Or can you suggest a better approach?
Thanks in advance.
If it's not broken, fix it until it is.
My Blog