Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Removing double records, math problem
Message
De
15/06/2007 09:57:51
 
Information générale
Forum:
Visual FoxPro
Catégorie:
Base de données, Tables, Vues, Index et syntaxe SQL
Versions des environnements
Visual FoxPro:
VFP 9 SP1
OS:
Windows XP SP2
Network:
Windows 2003 Server
Database:
Visual FoxPro
Divers
Thread ID:
01233421
Message ID:
01233466
Vues:
10
Edward also made it clear to me that I am didn't describe that aspect clear enough. Rreading my text, I understand that.

The resulting table should be a concatenation of the 4 tables, with all duplicates removed, but that should happen in an evenly distributed way, like the numerical examples displayed.

>Hi Lennert
>
>What I cannot understand from your description is what the final table(s)
>should contain. Should it contain distinct records only? Or only unique records from tables, and duplicated records should be removed? Or the duplicated records also should be there, but in proportion? Could you give the example of what you have and what you expect?
>
>
>>Dear all,
>>
>>I have a complex question. My first task is to make it sufficiently clear to all of you. After that, maybe you can help me come with a solution.
>>
>>First I'll describe in a simple way the problem, then I'll describe it in a more mathematical manner.
>>
>>Let's assume I have 4 tables, each with a different number of records in them, but always at least a few thousand. Each table contains persons: name and address, per table they are unique. The program has the task to make 1 output table by just concatenating the 4 tables, but with all doubles removed. Alas, the way they should be removed is not so easy. As much as each table contibutes to the total number of doubles, so much should the number of deleted records be per table. Example:
>>
>>Table #records #doubles perc
>> 1 10000 300 3%
>> 2 6000 900 15%
>> 3 4000 200 5%
>> 4 12000 720 6%
>>
>>Because some of the persons come not only in 2 but in 3 of even 4 tables, we have somewhere between 900 and 1060 groups of double records. Of course, of each group of double records, 1 record should remain. The # of deleted recs should be evenly distributed.
>>
>>Let's assume all records which are double are already marked with a group number. How can I design a deletion algorithm that all tables are evenly 'touched' by deletion? So the outcome should be, in case of exactly 1000 groups of doubles:
>>
>> (300+900+200+720)-1000 = 1120 deletions, distributed in the ratios 3 to 15 to 5 to 6, so:
>>
>>Table #records #doubles perc ratio_in_perc #deletions
>> 1 10000 300 3% 3/29= 10.34% 116
>> 2 6000 900 15% 15/29= 51.72% 579
>> 3 4000 200 5% 5/29= 17.24% 193
>> 4 12000 720 6% 6/29= 20.69% 232
>>
>>Now the mathematical description:
>>
>>There are T tables, each containing N1,N2,N3...Nt unique records. Each table contains, compared to the other T-1 tables, a certain number of double records: D1,D2,... Dt. These doubles form together G groups. So SUM(D1:Dt)-G records should be deleted evenly across the tables.
>>
>>Dear foxpro-gurus, set theory-hackers, math-wizards and all other folks: How?????
>>
>>I know, I know, it's friday, and you've been on a diet of SP's this week on the UT :-). But maybe you can sill squeeze out some synapse-activity.
>>
>>Lennert
Précédent
Suivant
Répondre
Fil
Voir

Click here to load this message in the networking platform