Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
How do I find duplicate records?
Message
From
19/10/2000 16:22:28
 
General information
Forum:
Visual FoxPro
Category:
Databases,Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
00431045
Message ID:
00431791
Views:
11
Hi!

>Vlad,
>
>BTW, forgot to congratulate you for being MOTD :)

Again! Thanks ;)

>>>BTW, studing the code I found, that idea is much more complicated, than I thought. I don't have mdate, the records should be marked as duplicate, if they have dates within some range, IOW:
>>>
>>>
>>>#     Address   Date
>>>1     Ad1       Date()
>>>2     Ad1       Date()-5
>>>3     Ad1       Date()-20
>>>4     Ad2       Date()-25
>>>5     Ad2       Date()-30
>>>6     Ad2       Date()-1000
>>>
>>>
>>>Records 1 and 2 are duplicates.
>>>Records 4 and 5 are duplicates.
>>>
>>>See the new logic?
>>
>>Yep. But when you group by address and date in the first query, you eliminate all duplicates in the same date. Is it correct? I meant when you have 2 records with the same date and address, and these are only records with that address, these two records will not be marked as duplicated.
>
>No, see more carefully the code bellow. Notice, please, that I don't have having clause in the first quiery, so I basically will select all records (if I have records with the same address and date, the result returns only one record for this situation)
>


Well, I see you don't understand what I'm saying.
Now I looked more carefully and found even more errors ;)

Let suggest you have following data set:
#     Address   Date
1     Ad1       Date()
2     Ad2       Date()-5
3     Ad3       Date()-20
4     Ad4       Date()-25
5     Ad5       Date()-30
6     Ad6       Date()-1000
7     Ad1       Date()        &&& Note this record is duplicated
8     Ad2       Date()-6
9     Ad3       Date()-21
First query will return following:
#     Address   Date
1     Ad1       Date()       &&& Note this record is now only one
2     Ad2       Date()-5
8     Ad2       Date()-6
3     Ad3       Date()-20
9     Ad3       Date()-21
4     Ad4       Date()-25
5     Ad5       Date()-30
6     Ad6       Date()-1000
As you see, Ad1 is stored twise under the same date. But result don't indicate that, so records will not be updated properly.


Second query will return following:
# Address Date
1 Ad1 Date()
2 Ad2 Date()-5
8 Ad2 Date()-6
3 Ad3 Date()-20
9 Ad3 Date()-21
4 Ad4 Date()-25
5 Ad5 Date()-30
6 Ad6 Date()-1000


Yes, I'm sure ;) Your join condition should be following:

from myCursor inner join myCursor my on myCursor.situs=my.situs ;
and abs(myCursor.date - my.date) <= m.ndays ;
and myCursor.date <> my.date ;

But than you will not be able to solve problem from first query with addresses under the same date.

I propose following solution (see differenses in bold):



Yes, I'm sure ;) Your join condition should be following:

from myCursor inner join myCursor my on myCursor.situs=my.situs ;
and abs(myCursor.date - my.date) <= m.ndays ;
and myCursor.date <> my.date ;

But than you will not be able to solve problem from first query with addresses under the same date.

I propose following solution (see differenses in bold):
* Firstly select all records grouped by address and date field
     select  ;
          upper(ccode+town+street+str(stnum,4)+stnumext+unit) as situs, ;
             date, count(*) as SameDateDupl ;
          from (m.dbfname) ;
          group by 1,2 ;
          into cursor myCursor

** Prepare final cursor by self-join
     select myCursor.Situs+dtoc(myCursor.date) as SitusDate, myCursor.* ;
          from myCursor inner join myCursor my on myCursor.situs=my.situs ;
            and abs(myCursor.date - my.date) and myCursor.date  my.date ;
          into cursor (DDAL_QUERY) ;
     UNION ALL ;
     select myCursor.Situs+dtoc(myCursor.date) as SitusDate, myCursor.* ;
          from myCursor ;
          where SameDateDupl>1

     if used ('MyCursor')
          use in myCursor
     endif

* Then update the BldMstr table
     update (dbfname) set dedupeflag= DDFC_POSS ;
          where upper(ccode+town+street+str(stnum,4)+stnumext+unit)+dtoc(date) ;
          in (select SitusDate from (DDAL_QUERY))
Why you use 'order by'? It slows down query, as far as I know.
And for last updating I also recommend you to index DDAL_QUERY result and use IndexSeek() function instead of subquery in Update command.

Hope this helped.



>>>
>>><font color=green>* Firstly select all records grouped by address and date field</font>
>>>    <font color=blue> select</font>  ;
>>>        <font color=blue> upper</font>(ccode+town+street+<font color=blue>str</font>(stnum,4)+stnumext+unit)<font color=blue> as</font> situs,<font color=blue> date</font> ;
>>>         <font color=blue> from</font> (m.dbfname) ;
>>>         <font color=blue> group</font><font color=blue> by</font> 1,2 ;
>>>         <font color=blue> order</font><font color=blue> by</font> 1,2<font color=blue> into</font><font color=blue> cursor</font> myCursor
>>>
>>><font color=green>** Prepare final cursor by self-join</font>
>>>    <font color=blue> select</font> myCursor.Situs+<font color=blue>dtoc</font>(myCursor.<font color=blue>date</font>)<font color=blue> as</font> SitusDate, myCursor.* ;
>>>         <font color=blue> from</font> myCursor inner<font color=blue> join</font> myCursor my<font color=blue> on</font> myCursor.situs=my.situs ;
>>>          and<font color=blue> abs</font>(myCursor.<font color=blue>date</font> - my.<font color=blue>date</font>) <= m.ndays ;
>>>         <font color=blue> order</font><font color=blue> by</font> 1<font color=blue> into</font><font color=blue> cursor</font> (DDAL_QUERY)
>>>
>>>    <font color=blue> if</font><font color=blue> used</font> ('MyCursor')
>>>         <font color=blue> use</font><font color=blue> in</font> myCursor
>>>    <font color=blue> endif</font>
>>>
>>><font color=green>* Then update the BldMstr table</font>
>>>    <font color=blue> update</font> (dbfname)<font color=blue> set</font> dedupeflag= DDFC_POSS ;
>>>         <font color=blue> where</font><font color=blue> upper</font>(ccode+town+street+<font color=blue>str</font>(stnum,4)+stnumext+unit)+<font color=blue>dtoc</font>(<font color=blue>date</font>) ;
>>>         <font color=blue> in</font> (<font color=blue>select</font> SitusDate<font color=blue> from</font> (DDAL_QUERY))
>>>
>>>     m.result=<font color=blue>_tally</font> <font color=green>&& Number of updated records</font>
Vlad Grynchyshyn, Project Manager, MCP
vgryn@yahoo.com
ICQ #10709245
The professional level of programmer could be determined by level of stupidity of his/her bugs

It is not appropriate to say that question is "foolish". There could be only foolish answers. Everybody passed period of time when knows nothing about something.
Previous
Next
Reply
Map
View

Click here to load this message in the networking platform