Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
Finding Duplicate Records Across 21 tables
Message
From
02/11/1998 10:38:39
 
 
To
02/11/1998 10:25:42
General information
Forum:
Visual FoxPro
Category:
Coding, syntax & commands
Miscellaneous
Thread ID:
00153490
Message ID:
00153524
Views:
19
>Hiya Tim ---
>
>Are you checking for duplicates for each row in each table? If so, a simple change might speed things up considerably:
>
>For your "separate table", create an index on ABS(gross) and invoice (is invoice numeric, is so: TRANS(invoice,"9999999999")+TRANS(gross,"9999999.99") or something similar)....Before you check each row for duplicates, first SEEK in the "separate table" to see if a row matching that row's duplicate condition has already been logged. If so, you can skip checking. The amount of dupe checking table by table will decrease and performance will increase
>
>Note that this will only *if* you check all 21 tables for dupes for each specific row if the number of dupes is important to you.
>
>Another method would be to use a number of cascading UNION SELECTS, as Ed has suggested. But that may drag as VFP is limited in the number of UNIONS you can do in a single SQL statement and you have a lot of tables.
>
>Yet another is to use CREATE TABLE to create your "separate table" and then USE that table and APPEND FROM the other 21 tables, then create a unique index. Again, the question that needs to be addressed: Are you summarizing dupes or just need to know if a dupe exists?
>
>
>>I work for an auditing firm where we are looking for duplicate financial transactions that span approx 50M records over 21 separate tables. We are looking for matches based on, say, ABS(GROSS) and INVOICE across all 21 tables. Basically each record in each table has to be checked against the current table and each record in all the other tables. The results are complied into a separate table with a blank record between groupings. Currently this takes 72 hours. There's got to be a better way! Any suggestions?

We are currently using the first appoach of indexing (STR(GROSS,12,2)+INVOICE) and storing the first record of each in an array. We then start with the smallest value and check the array for a match. If a match is found, SELECTing the work area, SCATTER, SELECTing the resulting table, GATHER. If no match, increment the array variable for the current work area to the current record value, etc. Clear as mud? ;->
Duty is ours, the results are God's - John Quincy Adams
Previous
Next
Reply
Map
View

Click here to load this message in the networking platform