Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
How do I find duplicate records?
Message
 
 
General information
Forum:
Visual FoxPro
Category:
Databases,Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
00431045
Message ID:
00431815
Views:
12
Vlad,

>Well, I see you don't understand what I'm saying.
>Now I looked more carefully and found even more errors ;)
>
>Let suggest you have following data set:
>

Yes, now I can see holes in my logic too. Your logic seems to be perfect (I haven't tried it yet). I'll try indexseek vs. current update statement also.

Thanks a million. Finally, we made it :) Or should I say: you made it? :)))

>
>#     Address   Date
>1     Ad1       Date()
>2     Ad2       Date()-5
>3     Ad3       Date()-20
>4     Ad4       Date()-25
>5     Ad5       Date()-30
>6     Ad6       Date()-1000
>7     Ad1       Date()        &&& Note this record is duplicated
>8     Ad2       Date()-6
>9     Ad3       Date()-21
>
>
>First query will return following:
>
>#     Address   Date
>1     Ad1       Date()       &&& Note this record is now only one
>2     Ad2       Date()-5
>8     Ad2       Date()-6
>3     Ad3       Date()-20
>9     Ad3       Date()-21
>4     Ad4       Date()-25
>5     Ad5       Date()-30
>6     Ad6       Date()-1000
>
>
>As you see, Ad1 is stored twise under the same date. But result don't indicate that, so records will not be updated properly.
>
>
>Second query will return following:
># Address Date
>1 Ad1 Date()
>2 Ad2 Date()-5
>8 Ad2 Date()-6
>3 Ad3 Date()-20
>9 Ad3 Date()-21
>4 Ad4 Date()-25
>5 Ad5 Date()-30
>6 Ad6 Date()-1000
>
>
>Yes, I'm sure ;) Your join condition should be following:
>
>from myCursor inner join myCursor my on myCursor.situs=my.situs ;
> and abs(myCursor.date - my.date) <= m.ndays ;
> and myCursor.date <> my.date ;
>
>But than you will not be able to solve problem from first query with addresses under the same date.
>
>I propose following solution (see differenses in bold):
>
>
>
>Yes, I'm sure ;) Your join condition should be following:
>
>from myCursor inner join myCursor my on myCursor.situs=my.situs ;
> and abs(myCursor.date - my.date) <= m.ndays ;
> and myCursor.date <> my.date ;
>
>But than you will not be able to solve problem from first query with addresses under the same date.
>
>I propose following solution (see differenses in bold):
>
>
>* Firstly select all records grouped by address and date field
>     select  ;
>          upper(ccode+town+street+str(stnum,4)+stnumext+unit) as situs, ;
>             date, count(*) as SameDateDupl ;
>          from (m.dbfname) ;
>          group by 1,2 ;
>          into cursor myCursor
>
>** Prepare final cursor by self-join
>     select myCursor.Situs+dtoc(myCursor.date) as SitusDate, myCursor.* ;
>          from myCursor inner join myCursor my on myCursor.situs=my.situs ;
>            and abs(myCursor.date - my.date)             and myCursor.date  my.date ;
>          into cursor (DDAL_QUERY) ;
>     UNION ALL ;
>     select myCursor.Situs+dtoc(myCursor.date) as SitusDate, myCursor.* ;
>          from myCursor ;
>          where SameDateDupl>1
>
>     if used ('MyCursor')
>          use in myCursor
>     endif
>
>* Then update the BldMstr table
>     update (dbfname) set dedupeflag= DDFC_POSS ;
>          where upper(ccode+town+street+str(stnum,4)+stnumext+unit)+dtoc(date) ;
>          in (select SitusDate from (DDAL_QUERY))
>
>
>
>Why you use 'order by'? It slows down query, as far as I know.
>And for last updating I also recommend you to index DDAL_QUERY result and use IndexSeek() function instead of subquery in Update command.
>
>Hope this helped.
>
>
>
>>>>
>>>><font color=green>* Firstly select all records grouped by address and date field</font>
>>>>    <font color=blue> select</font>  ;
>>>>        <font color=blue> upper</font>(ccode+town+street+<font color=blue>str</font>(stnum,4)+stnumext+unit)<font color=blue> as</font> situs,<font color=blue> date</font> ;
>>>>         <font color=blue> from</font> (m.dbfname) ;
>>>>         <font color=blue> group</font><font color=blue> by</font> 1,2 ;
>>>>         <font color=blue> order</font><font color=blue> by</font> 1,2<font color=blue> into</font><font color=blue> cursor</font> myCursor
>>>>
>>>><font color=green>** Prepare final cursor by self-join</font>
>>>>    <font color=blue> select</font> myCursor.Situs+<font color=blue>dtoc</font>(myCursor.<font color=blue>date</font>)<font color=blue> as</font> SitusDate, myCursor.* ;
>>>>         <font color=blue> from</font> myCursor inner<font color=blue> join</font> myCursor my<font color=blue> on</font> myCursor.situs=my.situs ;
>>>>          and<font color=blue> abs</font>(myCursor.<font color=blue>date</font> - my.<font color=blue>date</font>) <= m.ndays ;
>>>>         <font color=blue> order</font><font color=blue> by</font> 1<font color=blue> into</font><font color=blue> cursor</font> (DDAL_QUERY)
>>>>
>>>>    <font color=blue> if</font><font color=blue> used</font> ('MyCursor')
>>>>         <font color=blue> use</font><font color=blue> in</font> myCursor
>>>>    <font color=blue> endif</font>
>>>>
>>>><font color=green>* Then update the BldMstr table</font>
>>>>    <font color=blue> update</font> (dbfname)<font color=blue> set</font> dedupeflag= DDFC_POSS ;
>>>>         <font color=blue> where</font><font color=blue> upper</font>(ccode+town+street+<font color=blue>str</font>(stnum,4)+stnumext+unit)+<font color=blue>dtoc</font>(<font color=blue>date</font>) ;
>>>>         <font color=blue> in</font> (<font color=blue>select</font> SitusDate<font color=blue> from</font> (DDAL_QUERY))
>>>>
>>>>     m.result=<font color=blue>_tally</font> <font color=green>&& Number of updated records</font>
If it's not broken, fix it until it is.


My Blog
Previous
Reply
Map
View

Click here to load this message in the networking platform