Today we are going to talk about data compression. This is something we do frequently, something which is useful for many different tasks. The original idea for this article came from Martín Salías; one day we were talking about the subject of my articles, and he said, "you might write a compressor with Visual FoxPro". But the real impulse came from Pablo van Diest, when he told me, "I don't think this can be done in FOX". Thus originated the idea which we are going to see today.
To carry out these tasks from VFP, we are accustomed to use, one way or another, WinZIP, WinRAR, or other compressors available on the market.
WinZip example:
oZip = CreateObject( "WScript.Shell" ) oZip.Run("winzip -a -r -p -ex C:\Archivo.Zip c:\MyFile.xml", 0,.T.)
WinRar example:
RUN /N C:\Program Files\WinRAR\winrar a D:\Backup.rar D:\MyFile.xml
As a study challenge, I propose to construct our own data compressor, similar to the one used in WinZIP and WinRAR. This is quite interesting, and we will achieve an understanding of what, really, is a compressor. For this purpose, we will use the LZW, or Lempel-Zefv Welch, algorithm.
The LZW algorithm was developed by J. Ziv and A. Lempel (1977), and refined by T. Welch (1984), is patended by Unisys and, I believe, it is allowed to use the algorithm freely, although a license must be obtained for commercial applications. It became famous for its use in graphical format, GIF and TIFF, although it is capable of compressing any type of file.
The compression and decompression algorithms are really quite simple, which helped to make them widespread. The basis for compression is the reduction of redundant characters. For this purpose, what they do is save a table with strings and replace them by unique codes.
Compression
Cadena = FREAD() && String DO WHILE( NOT FEOF() ) Caracter = FREAD() IF( FindInList( Cadena + Caracter ) ) Cadena = Cadena + Caracter ELSE NewCodeForChain() AddTable( Cadena + Caracter ) WriteInFile() Cadena = Caracter ENDIF ENDDO
Now, let's suppose that we want to see how this algorithm works for compressing a text string. For this case study, we will take the string "ABBABABAC".
As a result of the data compression, we obtain the characters 65 66 66 256 259 67. For purposes of this small example, we can see that although we compressed the 9 characters "ABBABABAC" to six numerical characters, the size on disk for these six characters is 12 bytes, in other words, it is bigger than what we want to compress. However, for larger texts, the results are quite good.
Decompression
What is done for the decompression process is to build the table that was generated in the compression process as the compressed file is being traversed. The process will detect that the character read is compressed, because it is larger than 255, which is the maximum that can be represented in the ASCII code. In this case, what the process does is decompress it with the table that was built while traversing the file.
ViejoCodigo = FREAD() && Old code WriteInFile() DO WHILE( NOT FEOF() ) NuevoCodigo = FREAD() && New code lnString = DecodeString( NuevoCodigo ) DO WHILE( ArrayDecodeStack[ PosDecodeStack ] ) WriteInFile() PosDecodeStack = PosDecodeStack - 1 ENDDO AddTable( ViejoCodigo, NuevoCodigo ) ViejoCodigo = NuevoCodigo ENDDO PROCEURE DecodeString( lnCode ) IF( lnCode >= 255 ) DO WHILE( lnIndex > 255 ) ArrayDecodeStack[ PosDecodeStack ] = ArrayHashCode[ lnCode ] PosDecodeStack = PosDecodeStack + 1 lnCode = ArrayPrefCode[ lnCode ] ENDDO ELSE RETURN( lnCode ) ENDIF ENDPROC
What we see here is that the table was built as we traversed the file assigning indexes in the table, and when a character above 255 was found, it was decompressed with the table that was being build, thus obtaining, once again, the text "ABBABABAC".
Implementation
For the algorithm we have seen here, I implemented a class in VFP, with the very original name of "LZW". This class is in charge of compressing files and strings. Its use to compress files would be the following:
SET PROCEDURE TO "lzw.prg" oLZW = CREATEOBJECT("LZW") oLZW.SetTypeCompress( 12 ) oLZW.CompressFile( "MyFile.txt", "MyCompressedFile.txt" ) oLZW.DeCompressFile( "MyCompressedFile.txt", "MyFileNEW.txt" ) RELE oLZW
Compressing a text string
SET PROCEDURE TO "lzw.prg" oLZW = CREATEOBJECT("LZW") oLZW.SetTypeCompress( 12 ) lsComprimido = oLZW.CompressString( "ABBABABAC" ) lsDesComprimido = oLZW.DeCompressString( lsComprimido ) RELE oLZW
Conclusion
As we can see, the compression and decompression algorithm is really very simple, and it is not worthwhile to explain much more, since it is easy to understand. I will leave the code open, so that you can use it, or simply see, in more details, the implementation in FOX; but the algorithm is based on what we saw so far.
Bear in mind that depending on the type of file you compress, the result might be bigger than the original. This happens, for example, in the case of files that are already compressed when you try to compress them. Note, also, that this compression algorithm is not as powerful as WinZIP or WinRAR; perhaps another day we will see how to implement them in VFP.
The algorithm might be perfected in some places, since I implemented it in such a way that it be easy to read. Besides, for compressing very large files, it will take a considerable time.
Source code