Removing double records, math problem
Message
From: 15/06/2007 14:54:02
To: 15/06/2007 14:49:26
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexing and SQL syntax
Environment versions
Visual FoxPro:
VFP 9 SP1
OS:
Windows XP SP2
Network:
Windows 2003 Server
Database:
Visual FoxPro
Miscellaneous
Thread ID:
01233421
Message ID:
01233607
Views:
8
>>>>>>>If yes, you may have to calculate relative weight (or what's the right term? "удельный вес" in Russian) of each duplicate to the total number of records removed from each particular table...
>>>>>>>
>>>>>>>Anyway, let's see, if this is the exact problem you're trying to solve, because now I can see the complexity.
>>>>>>
>>>>>>Why complexity? You calculate the number of duplicates that need to be deleted, then you calculate the deletion number for each of the 4 tables, and then you scan each table, deleting and counting until you reach that number. Obviously, during deletion you should decrement the total duplicate counter so that the last record of each set stays intact.
>>>>>
>>>>>What if we also want to somehow preserve the proportions of the original table sizes?
>>>>>
>>>>Yes, one must calculate the total number of duplicates, then calculate how many 'duplicates' are in each table, and then get the proportion/weight: i.e. delete 120 records in table1, 282 in table2, etc.
>>>>
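The proportion/weight step described above is simple arithmetic. A minimal Python sketch (the counts below are hypothetical; the thread's actual data lives in VFP tables):

```python
# Hypothetical duplicate counts per table (the real data lives in VFP tables;
# this only illustrates the proportion/weight arithmetic).
dup_counts = {"table1": 300, "table2": 705, "table3": 150, "table4": 345}
total_dups = sum(dup_counts.values())   # 1500 duplicate records in all
to_delete = 1200                        # overall number of records to remove

# Each table's deletion quota is proportional to its share of the duplicates.
quotas = {t: round(to_delete * n / total_dups) for t, n in dup_counts.items()}
# quotas == {'table1': 240, 'table2': 564, 'table3': 120, 'table4': 276}
```

Rounding can make the quotas sum to slightly more or less than the target; any remainder can be added to (or taken from) the largest table's quota.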
>>>>>I want to first hear the exact problem definition with some samples, then we can think of a solution.
>>>>
>>>>Absolutely.
>>>
>>>What makes it so complex is this: there will be a large number of different sets of duplicates. E.g. there are 129 sets of duplicates in tables 1 and 2, 62 in tables 1 and 3, 112 in 1 and 4, etc., with all permutations of 1, 2, 3 and 4. Because of that, once you start deleting duplicates, you eventually reach a situation where there are no records left to delete from table X, so you have to backtrack: delete records originating from table X in sets where they were left undeleted, thereby 'recalling' records from other tables that were already deleted. This makes it very complex, especially since this is only an example with 4 tables, but I need an algorithm for T tables.
>>>
>>>To make it a little easier: there are no duplicates within one table.
>>>
>>>Hope this makes it a bit clearer.
>>
>>You take it too literally. 'Duplicates' in one table means that you check its records against the cursor that holds all duplicate records. If a record is there, it can be deleted.
>>The deletion itself can proceed in a very straightforward way: as soon as you have determined that 120 records should be deleted in Table1, you simply scan and delete the first 120 records that qualify. Next you go to Table2 and do the same, etc.
>
>...until you come to the end of the process, and because table 4 is last there are not enough records left in table 4 to delete, so you violate the rule that SUM(D1:Dt)-G records must be deleted. For tables 1, 2 and maybe 3 everything goes fine, but when you reach table 4 you can't do what you want: some records of table 4 that you want to delete are the last surviving member of a small set of duplicates, so you can't delete them. That's why you have to backtrack at that moment, but how...

That (no records left to delete in the last table) will only happen if you delete all duplicate candidates. However, you first determine the number of records that should be deleted, and then scan and delete until that number is reached. In other words, you don't delete all 'duplicates' in Table1; you leave some to be 'isolated' by the deletions in the other tables.
This approach is 'brute force' and may give some inaccuracies in any specific case, but in general its random nature should give quite appropriate results.
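That scan-and-delete pass, with the guard that keeps the last record of each duplicate set intact, can be sketched in Python (plain dicts and lists stand in for the VFP cursors; all names are hypothetical):

```python
# Brute-force sketch of the scan-and-delete pass, using plain Python
# structures in place of VFP cursors (all names are hypothetical).
# Each record carries a key identifying its duplicate set; a record may be
# deleted only while at least one other member of its set survives.

def delete_quota(table, quota, survivors):
    """Scan `table`, deleting up to `quota` records whose duplicate set
    still has more than one surviving member. Returns the records kept."""
    kept = []
    deleted = 0
    for rec in table:
        key = rec["dupkey"]
        if deleted < quota and survivors.get(key, 0) > 1:
            survivors[key] -= 1      # 'delete': one fewer survivor in this set
            deleted += 1
        else:
            kept.append(rec)         # keep the record
    return kept

# Example: two duplicate sets, each spread over two tables.
t1 = [{"dupkey": "A"}, {"dupkey": "B"}]
t2 = [{"dupkey": "A"}, {"dupkey": "B"}]
survivors = {"A": 2, "B": 2}         # living members per set, across all tables
t1 = delete_quota(t1, 1, survivors)  # delete 1 record from table1
t2 = delete_quota(t2, 1, survivors)  # delete 1 record from table2
# Each set ends with exactly one surviving record.
```

Because the survivor counter spans all tables, the guard generalizes to T tables: a record is only deletable while its duplicate set still has another living member somewhere, which is exactly why some 'duplicates' are left behind in earlier tables instead of being deleted there.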
Edward Pikman
Independent Consultant