Finding Duplicate Records Across 21 tables

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Finding Duplicate Records Across 21 tables

Message

From

02/11/1998 10:38:39

Tim Lawrence
PCC Airfoils, LLC
Broxton, Georgia, United States

02/11/1998 10:25:42

John Koziol
Ultimate Software
Miami, Florida, United States

General information

Forum:

Visual FoxPro

Category:

Coding, syntax & commands

Title:

Re: Finding Duplicate Records Across 21 tables

Miscellaneous

Thread ID:

00153490

Message ID:

00153524

Views:

>Hiya Tim ---
>
>Are you checking for duplicates for each row in each table? If so, a simple change might speed things up considerably:
>
>For your "separate table", create an index on ABS(gross) and invoice (is invoice numeric, is so: TRANS(invoice,"9999999999")+TRANS(gross,"9999999.99") or something similar)....Before you check each row for duplicates, first SEEK in the "separate table" to see if a row matching that row's duplicate condition has already been logged. If so, you can skip checking. The amount of dupe checking table by table will decrease and performance will increase
>
>Note that this will only *if* you check all 21 tables for dupes for each specific row if the number of dupes is important to you.
>
>Another method would be to use a number of cascading UNION SELECTS, as Ed has suggested. But that may drag as VFP is limited in the number of UNIONS you can do in a single SQL statement and you have a lot of tables.
>
>Yet another is to use CREATE TABLE to create your "separate table" and then USE that table and APPEND FROM the other 21 tables, then create a unique index. Again, the question that needs to be addressed: Are you summarizing dupes or just need to know if a dupe exists?
>
>
>>I work for an auditing firm where we are looking for duplicate financial transactions that span approx 50M records over 21 separate tables. We are looking for matches based on, say, ABS(GROSS) and INVOICE across all 21 tables. Basically each record in each table has to be checked against the current table and each record in all the other tables. The results are complied into a separate table with a blank record between groupings. Currently this takes 72 hours. There's got to be a better way! Any suggestions?

We are currently using the first appoach of indexing (STR(GROSS,12,2)+INVOICE) and storing the first record of each in an array. We then start with the smallest value and check the array for a match. If a match is found, SELECTing the work area, SCATTER, SELECTing the resulting table, GATHER. If no match, increment the array variable for the current work area to the current record value, etc. Clear as mud? ;->

Duty is ours, the results are God's - John Quincy Adams

Map

View

Click here to load this message in the networking platform