Level Extreme Platform
Finding Duplicate Records
Message
General information
Forum:
Visual FoxPro
Category:
Other
Miscellaneous
Thread ID:
00061150
Message ID:
00061179
Views:
53
>>>>>What is the best method to identify duplicate records in a table?
>>>>>In this case there are approx. 1.5 million records and I need to look at
>>>>>each field.
>>>>>
>>>>>Your help will be appreciated.
>>>>
>>>>I don't know if this is the best way, but I programmatically check my tables for duplicate records (meaning all fields the same in more than one record) with SELECT DISTINCT * FROM MYTABLE into MYARRAY, and compare the RECCOUNT() of the original table to _TALLY. If _TALLY is less, then I ZAP the original table and APPEND FROM MYARRAY. HTH
>>>
>>>Could you really SELECT ~1.5 million records into an array?
>>
>>Well my above syntax needs more coffee ;)
>>SELECT DISTINCT * FROM handle2 INTO ARRAY checkdup
>>
>>handle2 has 1.3 million records with no duplicates. I ran this on my database server (dual 200MHz Pentium with 132 MB of memory) and got 1,307,962 records selected in 1,387.2 minutes (and 3 users screaming that the network had locked up :))
>>
>>Yes, this code is a resource hog... that's why I said I didn't know if it was the best way. I use this in a UDF I've set up as a maintenance process run directly on the server for my data warehouse, and I typically only run it overnight. With larger tables (2 million records and up) I sometimes get a SQL memory error with it.
>>
>>How would you handle this type of thing Edward?
>
>Firstly, I am really surprised that you could run this at all, because it's beyond VFP's capacity (the maximum number of array elements).
>How would I handle the problem? I cannot recall whether there is a limit on the number of grouping fields, but grouping might be the way. In general, though, the best approach is a normalized database with primary keys, which effectively prevents such crazy problems, especially assuming that this is high-volume data.
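The grouping idea suggested above can be sketched in VFP SQL: group on the columns that define a duplicate and keep only the groups with more than one row, sending the result to a cursor rather than an array so the array-size ceiling never comes into play. This is a sketch only; mytable, field1, and field2 are placeholder names, and grouping on every column of a very wide table may still hit SQL limits.

```foxpro
* Sketch only: report duplicate groups without building an array.
* "mytable", "field1", "field2" are placeholder names.
SELECT field1, field2, COUNT(*) AS dupcount ;
	FROM mytable ;
	GROUP BY field1, field2 ;
	HAVING COUNT(*) > 1 ;
	INTO CURSOR dupes
```

Along the same lines, a SELECT DISTINCT * INTO TABLE nodupes variant would keep the de-duplicated result on disk instead of in memory, which avoids the array limit when rebuilding the original table with ZAP and APPEND FROM.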

I didn't try VFP... unfortunately my server is still FPW 2.6. I guess the maximum array size is why I sometimes get the error. But now I have to wonder why I have not had bigger troubles with this function; I hope it isn't because FPW 2.6 silently chops off my array at the maximum size. Looks like I've got some checking to do (sigh)

In my situation, I have to do this kind of thing because data entry and transaction processing are handled for us by a service provider with a Unix system that does not give me data that can be easily normalized. I often check for duplicates in order to generate error reports so that users can go into the Unix system and fix their mistakes.
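Since the server here is still FPW 2.6, a SQL-free sketch of the same duplicate check is to index the table on the candidate key and flag consecutive rows that share a value. This is a sketch under stated assumptions: mytable and custid are placeholder names, custid is assumed to be a character field, and the actual error-report output is left as a comment.

```foxpro
* Sketch for FPW 2.6: flag consecutive rows with the same key.
* "mytable" and "custid" are placeholder names.
USE mytable
INDEX ON custid TAG custid
lcPrev = ""
SCAN
	IF custid == lcPrev
		* Duplicate found: add RECNO() and custid to the error report here.
	ENDIF
	lcPrev = custid
ENDSCAN
```

An ordered scan like this touches each record once and needs no array or SQL work area, which is why it tends to behave better on very large tables than a DISTINCT-into-memory approach.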
Roxanne M. Seibert
Independent Consultant, VFP MCP

Code Monkey Like Fritos