Distributing and using VERY large tables.
Message

To
14/08/2001 03:32:13
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
00539842
Message ID:
00543452
Views:
15
Mihai,

I started writing a program that would compress the various fields that are not part of any index, using the Huffman encoding algorithm. To make it most efficient I planned to calculate a unique encoding tree for each field, since each field would probably have a very different character distribution from the others. I was also going to store the tree (needed for decoding) in the first record rather than embed it in the compressed data, again to save space.
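The per-field idea above can be sketched roughly as follows. This is an illustrative reconstruction, not the original program, and the sample field data is made up:

```python
import heapq
from collections import Counter

def huffman_codes(field_text):
    """Build a Huffman code table for one field's character distribution.

    Each field would get its own table, since frequencies differ
    between fields (e.g. street names vs. city names).
    """
    freq = Counter(field_text)
    # Heap entries: (frequency, tiebreaker, {char: code-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single distinct character gets code "0".
        return {ch: "0" for ch in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Merge the two rarest subtrees, prepending one bit to each side.
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("MAIN STREET MAIN AVENUE")
# Frequent characters (like 'E') get codes no longer than rare ones (like 'V').
```

The resulting code table is what would be stored once, in the first record, so each compressed value carries no per-record overhead for the tree.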

I don’t know why I didn’t do the math up front, but I was looking at one byte that I thought I didn’t really need and calculated how much space it would take to keep it anyway. 1K = 1,024 bytes, 1 Meg = 1,048,576 bytes, and 1 byte × 35,000,000 records ≈ 33.4 Meg for just one byte. If I was trying to limit myself to 650 Meg to fit on a CD-ROM, then with 35 million records that only allowed me something like 19 bytes per record. I realized then that compressing individual fields was not going to get me to my goal.
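The arithmetic is easy to check in a few lines (a quick sanity check, not part of any program; note that 35,000,000 bytes is about 33.4 Meg at 1,048,576 bytes per Meg):

```python
MEG = 1024 * 1024            # 1 Meg = 1,048,576 bytes, as above
records = 35_000_000

megs_per_extra_byte = records / MEG       # cost of 1 byte per record
budget = 650 * MEG                        # a 650 Meg CD-ROM
bytes_per_record = budget / records       # total budget per record

print(f"{megs_per_extra_byte:.1f} Meg per extra byte")     # about 33.4
print(f"{bytes_per_record:.1f} bytes per record allowed")  # about 19.5
```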

The software Compaxion, which compresses the whole file and lets you read that file in compressed form, would probably have worked, but the makers of Compaxion will not license it in the U.S. I asked them why, but they won’t say.

While researching compression I looked into a variation of the Lempel-Ziv method called Lempel-Ziv-Welch (LZW) and ran into this:

"LZW compression and decompression are licensed under Unisys Corporation's 1984 U.S. Patent 4,558,302 and equivalent foreign patents. This kind of patent isn't legal in most countries of the world (including the UK) except the USA. Patents in the UK can't describe algorithms or mathematical methods. "
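For context, the core of the (then-patented) LZW compression loop looks roughly like this; a minimal illustrative sketch, not Compaxion's actual implementation:

```python
def lzw_compress(data: bytes) -> list[int]:
    """Minimal LZW compressor: emits a list of dictionary codes."""
    # Start with all 256 single-byte strings in the dictionary.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                       # keep extending the current match
        else:
            out.append(dictionary[w])    # emit the longest known match
            dictionary[wc] = next_code   # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

codes = lzw_compress(b"ABABABAB")   # 5 codes for 8 input bytes
```

The dictionary is rebuilt identically by the decompressor, so no tree or table needs to be stored with the data, which is one reason LZW was so attractive despite the patent.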

I have a sneaking suspicion that Compaxion used Lempel-Ziv-Welch and had to be pulled off the U.S. market when someone found out it was infringing the U.S. patent. The company that makes Compaxion is English, so it can keep selling the product in Europe, just not in the U.S. At least that is my theory. Why else would they drop what is probably their biggest market and not explain why?

Anyway, I haven’t found a substitute for Compaxion yet and am not even sure if Compaxion would have worked since I have never actually seen the product in action.

If anybody hears of anything or has any other suggestions, please let me know.

Thanks,

Ed


>Maybe the data in the other companies' tables is well organized. For example, strings like street names can be "compressed" to very short strings using some "smart" functions, like those used for encryption and compression in archives. In cases where there are many long character fields, compression can be very strong, but this method slows down work with the records. If there are 3 fields of length 30 and they are reduced to 10 each, then over 35 million records that means (30-10) bytes * 35,000,000 * 3 = approx. 2 GBytes (!) less hard-disk space occupied. Did I calculate correctly?!!!
>If you saw files like d00, d01, maybe the other companies break the data into many files. For example, every district has a different file, and the file names are stored in a separate file so the program knows where to search for the data.
>
>Mihai
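Mihai's estimate in the quoted message checks out (using 1 GB = 1024³ bytes):

```python
# 3 fields, each shrunk from 30 bytes to 10, over 35 million records.
saved = (30 - 10) * 35_000_000 * 3
print(saved)                 # 2,100,000,000 bytes
print(saved / 1024 ** 3)     # about 1.96 GB, i.e. "approx. 2 GBytes"
```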