Removing double records, math problem

Plateforme Level Extreme

Abonnement

Profil corporatif

Produits & Services

Support

Légal

English

Removing double records, math problem

Message

15/06/2007 09:57:51

Lennert Blom
Hoevelaken, Pays-Bas

15/06/2007 09:08:47

Yuri Rubinov
Philadelphia, Pennsylvanie, États-Unis

Information générale

Forum:

Visual FoxPro

Catégorie:

Base de données, Tables, Vues, Index et syntaxe SQL

Titre:

Re: Removing double records, math problem

Versions des environnements

Visual FoxPro:

VFP 9 SP1

OS:

Windows XP SP2

Network:

Windows 2003 Server

Database:

Visual FoxPro

Divers

Thread ID:

01233421

Message ID:

01233466

Vues:

Edward also made it clear to me that I am didn't describe that aspect clear enough. Rreading my text, I understand that.

The resulting table should be a concatenation of the 4 tables, with all duplicates removed, but that should happen in an evenly distributed way, like the numerical examples displayed.

>Hi Lennert
>
>What I cannot understand from your description is what the final table(s)
>should contain. Should it contain distinct records only? Or only unique records from tables, and duplicated records should be removed? Or the duplicated records also should be there, but in proportion? Could you give the example of what you have and what you expect?
>
>
>>Dear all,
>>
>>I have a complex question. My first task is to make it sufficiently clear to all of you. After that, maybe you can help me come with a solution.
>>
>>First I'll describe in a simple way the problem, then I'll describe it in a more mathematical manner.
>>
>>Let's assume I have 4 tables, each with a different number of records in them, but always at least a few thousand. Each table contains persons: name and address, per table they are unique. The program has the task to make 1 output table by just concatenating the 4 tables, but with all doubles removed. Alas, the way they should be removed is not so easy. As much as each table contibutes to the total number of doubles, so much should the number of deleted records be per table. Example:
>>
>>Table #records #doubles perc
>> 1 10000 300 3%
>> 2 6000 900 15%
>> 3 4000 200 5%
>> 4 12000 720 6%
>>
>>Because some of the persons come not only in 2 but in 3 of even 4 tables, we have somewhere between 900 and 1060 groups of double records. Of course, of each group of double records, 1 record should remain. The # of deleted recs should be evenly distributed.
>>
>>Let's assume all records which are double are already marked with a group number. How can I design a deletion algorithm that all tables are evenly 'touched' by deletion? So the outcome should be, in case of exactly 1000 groups of doubles:
>>
>> (300+900+200+720)-1000 = 1120 deletions, distributed in the ratios 3 to 15 to 5 to 6, so:
>>
>>Table #records #doubles perc ratio_in_perc #deletions
>> 1 10000 300 3% 3/29= 10.34% 116
>> 2 6000 900 15% 15/29= 51.72% 579
>> 3 4000 200 5% 5/29= 17.24% 193
>> 4 12000 720 6% 6/29= 20.69% 232
>>
>>Now the mathematical description:
>>
>>There are T tables, each containing N1,N2,N3...Nt unique records. Each table contains, compared to the other T-1 tables, a certain number of double records: D1,D2,... Dt. These doubles form together G groups. So SUM(D1:Dt)-G records should be deleted evenly across the tables.
>>
>>Dear foxpro-gurus, set theory-hackers, math-wizards and all other folks: How?????
>>
>>I know, I know, it's friday, and you've been on a diet of SP's this week on the UT :-). But maybe you can sill squeeze out some synapse-activity.
>>
>>Lennert

Répondre

Fil

Voir

Click here to load this message in the networking platform