Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Finding Duplicate Records Across 21 tables
Message
De
02/11/1998 10:38:39
 
 
À
02/11/1998 10:25:42
Information générale
Forum:
Visual FoxPro
Catégorie:
Codage, syntaxe et commandes
Divers
Thread ID:
00153490
Message ID:
00153524
Vues:
18
>Hiya Tim ---
>
>Are you checking for duplicates for each row in each table? If so, a simple change might speed things up considerably:
>
>For your "separate table", create an index on ABS(gross) and invoice (is invoice numeric, is so: TRANS(invoice,"9999999999")+TRANS(gross,"9999999.99") or something similar)....Before you check each row for duplicates, first SEEK in the "separate table" to see if a row matching that row's duplicate condition has already been logged. If so, you can skip checking. The amount of dupe checking table by table will decrease and performance will increase
>
>Note that this will only *if* you check all 21 tables for dupes for each specific row if the number of dupes is important to you.
>
>Another method would be to use a number of cascading UNION SELECTS, as Ed has suggested. But that may drag as VFP is limited in the number of UNIONS you can do in a single SQL statement and you have a lot of tables.
>
>Yet another is to use CREATE TABLE to create your "separate table" and then USE that table and APPEND FROM the other 21 tables, then create a unique index. Again, the question that needs to be addressed: Are you summarizing dupes or just need to know if a dupe exists?
>
>
>>I work for an auditing firm where we are looking for duplicate financial transactions that span approx 50M records over 21 separate tables. We are looking for matches based on, say, ABS(GROSS) and INVOICE across all 21 tables. Basically each record in each table has to be checked against the current table and each record in all the other tables. The results are complied into a separate table with a blank record between groupings. Currently this takes 72 hours. There's got to be a better way! Any suggestions?

We are currently using the first appoach of indexing (STR(GROSS,12,2)+INVOICE) and storing the first record of each in an array. We then start with the smallest value and check the array for a match. If a match is found, SELECTing the work area, SCATTER, SELECTing the resulting table, GATHER. If no match, increment the array variable for the current work area to the current record value, etc. Clear as mud? ;->
Duty is ours, the results are God's - John Quincy Adams
Précédent
Suivant
Répondre
Fil
Voir

Click here to load this message in the networking platform