Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
How do I find duplicate records?
Message
From
19/10/2000 14:15:53
 
General information
Forum:
Visual FoxPro
Category:
Databases,Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
00431045
Message ID:
00431662
Views:
9
Hi!

This will work more quickly that with subquery, when you don't have indexes for optimisation.

Little note:
Use

date between mdate-10 and mdate+10

instead of

between(date, mdate-10, mdate+10)


in where clause. This because between() function will not be optimized, when between operator in where clause optimized.


>This discussion gives me this idea:
>
>1) select upper(...+...) as Situs from myTable group by 1 where between(date, mdate-10, mdate+10) having count(*)>1 into cursor curDups
>
> update myTable set flag='D' where upper(...) in (select Situs from curDups) and between(date, mdate-10, mdate+10)
>
> Seems to work (at least written here :)), though I have to check it.
>
> Thanks a lot for your support.
>
>>Hi Vlad,
>>
>>Once again: this update would not do what I want. Can you see? :)
>>1) You select correct duplicate addresses in your subquery.
>>Not let's imagine this situation: Address1 is the duplicate by my definition (it's duplicate and date within 7)
>>
>>Record #  Address  Date
>>1         Address1 date=mdate-1000
>>2         Address1 date=mdate-6
>>3         Address1 date=mdate-5
>>
>>Records 2 and 3 should be marked, while record 1 should not, right?
>>You update would mark all of them as duplicates.
>>
>>BTW, I do have index on address. See my other post with the whole program.
>>
>>>>You group by ID field! :) You will have then all records from the table :)
>>>
>>>Sure ;)
>>>Than you will not be able to use this query even when you will remove ID field from grouping because ID will be returned incorrectly from subquery. Need another approach:
>>>
>>>Update MyTable
>>> set MyTable.flag='D'
>>> where (MyTable.ccode+MyTable.town+MyTable.street+str(MyTable.strnum,4)+...) IN
>>>(Select (MT.ccode+MT.town+MT.street+str(MT.strnum,4)+...) as cAddress
>>> from MyTable MT
>>> where MT.MyDateField betwen m.ldDate - 10 AND m.ldDate + 10
>>> group by cAddress
>>> having count(*)>1)
>>>
>>>Will work quite slowly unless you have index by (ccode+town+street+str(strnum,4)+...)
>>>
>>>
>>>
>>>>
>>>>>Hi!
>>>>>
>>>>>>Nope, this would not work. Do you want me to explain why or can figure it by yourself? :)
>>>>>
>>>>> Well, expression should be (MT.ccode+MT.town+MT.street+str(MT.strnum,4)+...), right? I did not tried such king of queries, so will be glad to know exactly what is wrong ;)
>>>>>
>>>>>>
>>>>>>>Still the same idea ;))
>>>>>>>
>>>>>>>Update MyTable
>>>>>>> set flag='D'
>>>>>>> where MyTable.IDField IN
>>>>>>>(Select MT.IDField
>>>>>>> from MyTable MT
>>>>>>> where MT.MyDateField betwen m.ldDate - 10 AND m.ldDate + 10
>>>>>>> group by (ccode+town+street+str(strnum,4)+...), MT.IDField
>>>>>>> having count(MT.IDField)>1)
>>>>>>>
>>>>>>>I hope this will work, I'm not sure if we can use exression in GROUP BY list.
>>>>>>>
>>>>>>>
>>>>>>>>>Hi!
>>>>>>>>>
>>>>>>>>>The same idea. 'Where' will filter out records for count() function.
>>>>>>>>>
>>>>>>>>>Select IDField, count(IDField) as nCount
>>>>>>>>> from MyTable
>>>>>>>>> where MyDateField betwen m.ldDate - 10 AND m.ldDate + 10
>>>>>>>>> group by IDField
>>>>>>>>> having nCount>1
>>>>>>>>
>>>>>>>>Hi Vlad,
>>>>>>>>
>>>>>>>>Actually, I need to mark all duplicate records in myTable (set flag='D'), which have the same address (ccode+town+street+str(strnum,4)+...) and date within the range. How can I do this?
>>>>>>>>
>>>>>>>>My current solution is quite combersome, so I'm open for the suggestions.
>>>>>>>>
>>>>>>>>Thanks in advance.
>>>>>>>>>
>>>>>>>>>>>>I'm switching an index to a primary and am getting a duplicate index message. How can I find the duplicates amoung 1000's without
>>>>>>>>>>>>looking through them all?
>>>>>>>>>>>
>>>>>>>>>>>Select IDField, count(IDField) as nCount
>>>>>>>>>>> from MyTable
>>>>>>>>>>> group by IDField
>>>>>>>>>>> having nCount>1
>>>>>>>>>>
>>>>>>>>>>This question was asked so often, what it definitely should be in a FAQ.
>>>>>>>>>>
>>>>>>>>>>BTW, a little more complicated variation of this problem:
>>>>>>>>>>Find all duplicates, which have the same Id field and date within date specified +-10.
Vlad Grynchyshyn, Project Manager, MCP
vgryn@yahoo.com
ICQ #10709245
The professional level of programmer could be determined by level of stupidity of his/her bugs

It is not appropriate to say that question is "foolish". There could be only foolish answers. Everybody passed period of time when knows nothing about something.
Previous
Next
Reply
Map
View

Click here to load this message in the networking platform