Information générale
Catégorie:
Base de données, Tables, Vues, Index et syntaxe SQL
Titre:
Removing double records, math problem
Versions des environnements
Network:
Windows 2003 Server
Dear all,
I have a complex question. My first task is to make it sufficiently clear to all of you. After that, maybe you can help me come with a solution.
First I'll describe in a simple way the problem, then I'll describe it in a more mathematical manner.
Let's assume I have 4 tables, each with a different number of records in them, but always at least a few thousand. Each table contains persons: name and address, per table they are unique. The program has the task to make 1 output table by just concatenating the 4 tables, but with all doubles removed. Alas, the way they should be removed is not so easy. As much as each table contibutes to the total number of doubles, so much should the number of deleted records be per table. Example:
Table #records #doubles perc
1 10000 300 3%
2 6000 900 15%
3 4000 200 5%
4 12000 720 6%
Because some of the persons come not only in 2 but in 3 of even 4 tables, we have somewhere between 900 and 1060 groups of double records. Of course, of each group of double records, 1 record should remain. The # of deleted recs should be evenly distributed.
Let's assume all records which are double are already marked with a group number. How can I design a deletion algorithm that all tables are evenly 'touched' by deletion? So the outcome should be, in case of exactly 1000 groups of doubles:
(300+900+200+720)-1000 = 1120 deletions, distributed in the ratios 3 to 15 to 5 to 6, so:
Table #records #doubles perc ratio_in_perc #deletions
1 10000 300 3% 3/29= 10.34% 116
2 6000 900 15% 15/29= 51.72% 579
3 4000 200 5% 5/29= 17.24% 193
4 12000 720 6% 6/29= 20.69% 232
Now the mathematical description:
There are T tables, each containing N1,N2,N3...Nt unique records. Each table contains, compared to the other T-1 tables, a certain number of double records: D1,D2,... Dt. These doubles form together G groups. So SUM(D1:Dt)-G records should be deleted evenly across the tables.
Dear foxpro-gurus, set theory-hackers, math-wizards and all other folks: How?????
I know, I know, it's friday, and you've been on a diet of SP's this week on the UT :-). But maybe you can sill squeeze out some synapse-activity.
Lennert
Suivant
Répondre
Voir le fil de ce thread
Voir le fil de ce thread à partir de ce message seulement
Voir tous les messages de ce thread
Voir tous les messages de ce thread à partir de ce message seulement