Finding Duplicate Records Across 21 tables

Plateforme Level Extreme

Abonnement

Profil corporatif

Produits & Services

Support

Légal

English

Finding Duplicate Records Across 21 tables

Message

02/11/1998 10:53:58

Edward Pikman
Independent
New York City, New York, États-Unis

02/11/1998 10:46:07

Tim Lawrence
PCC Airfoils, LLC
Broxton, Georgie, États-Unis

Information générale

Forum:

Visual FoxPro

Catégorie:

Codage, syntaxe et commandes

Titre:

Re: Finding Duplicate Records Across 21 tables

Divers

Thread ID:

00153490

Message ID:

00153536

Vues:

>>>>>>>I work for an auditing firm where we are looking for duplicate financial transactions that span approx 50M records over 21 separate tables. We are looking for matches based on, say, ABS(GROSS) and INVOICE across all 21 tables. Basically each record in each table has to be checked against the current table and each record in all the other tables. The results are complied into a separate table with a blank record between groupings. Currently this takes 72 hours. There's got to be a better way! Any suggestions?
>>>>>>
>>>>>>Would it be faster to UNION ALL tables together and then run GROUP+HAVING queries.
>>>>>
>>>>>Each table is ~1.5G in size-~30G in all. Wouldn't the UNION ALL put me over the 2G file limit?
>>>>
>>>>You said 50M, so I assumed the size. What are you using now?
>>>
>>>50M referred to number of records. Currently using 21 tables with ~2M records in each (~1.5G file size). Simply running a series of code and checking one record against another.
>>
>>I have feeling that you still may consider cascading UNIONs, but concatening only keys which should actually be checked as duplicates. Here, you have chance to stay within 2G, at least you may reduce number of processed tables to 2-3 (you can ordert by some keyvalue to eliminate necessity of afterward cross-checking). Then you run GROUP+HAVING against either 1 resulting table, or cascade it against 2-3 tables.
>
>I'm starting to have the same feeling that a person with a bloody nose might have swimming in the ocean with the sharks...
>
>Cascading UNIONs is an area I haven't gotten into...I've only used very basic SQL statements. Could I beg your indulgence for an example?
>
>Thanks

Select Invoiceno, amount From Table1 ;
Into Cursor tmp ;
UNION ALL Select Invoiceno, amount From Table2 ;
UNION ALL ....

I guess, there is limit on number of UNIONs in one statement, so you may run it in loop. Also, if you still get more then 2G, then you may collect two cascades:
1.
Select Invoiceno, amount From Table1 Where InvoiceNo>100000 ;
Into Cursor tmp1 ;
UNION ALL Select Invoiceno, amount From Table2 Where InvoiceNo>100000 ;
UNION ALL ....

2.
Select Invoiceno, amount From Table1 Where InvoiceNo<=100000 ;
Into Cursor tmp2 ;
UNION ALL Select Invoiceno, amount From Table2 Where InvoiceNo<=100000 ;
UNION ALL ....

Providing distinctive InvoiceNo for tmp1, tmp2, you exclude the necessity of crosschecking them afterward, i.e. you will run GROUP+HAVING against each table separately and move results to common recordset.

Edward Pikman
Independent Consultant

Répondre

Fil

Voir

Click here to load this message in the networking platform