Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
Articles
Search: 

Compressing information with VFP
Roberto C. Ianni, December 1, 2004
Today we are going to talk about data compression. This is something we do frequently, something which is useful for many different tasks.
Summary
Today we are going to talk about data compression. This is something we do frequently, something which is useful for many different tasks.
Description

Today we are going to talk about data compression. This is something we do frequently, something which is useful for many different tasks. The original idea for this article came from Martín Salías; one day we were talking about the subject of my articles, and he said, "you might write a compressor with Visual FoxPro". But the real impulse came from Pablo van Diest, when he told me, "I don't think this can be done in FOX". Thus originated the idea which we are going to see today.

To carry out these tasks from VFP, we are accustomed to use, one way or another, WinZIP, WinRAR, or other compressors available on the market.

WinZip example:

oZip = CreateObject( "WScript.Shell" )
oZip.Run("winzip -a -r -p -ex C:\Archivo.Zip  c:\MyFile.xml", 0,.T.)

WinRar example:

RUN /N C:\Program Files\WinRAR\winrar a D:\Backup.rar D:\MyFile.xml

As a study challenge, I propose to construct our own data compressor, similar to the one used in WinZIP and WinRAR. This is quite interesting, and we will achieve an understanding of what, really, is a compressor. For this purpose, we will use the LZW, or Lempel-Zefv Welch, algorithm.

The LZW algorithm was developed by J. Ziv and A. Lempel (1977), and refined by T. Welch (1984), is patended by Unisys and, I believe, it is allowed to use the algorithm freely, although a license must be obtained for commercial applications. It became famous for its use in graphical format, GIF and TIFF, although it is capable of compressing any type of file.

The compression and decompression algorithms are really quite simple, which helped to make them widespread. The basis for compression is the reduction of redundant characters. For this purpose, what they do is save a table with strings and replace them by unique codes.

Compression

   Cadena = FREAD() && String
   DO WHILE( NOT FEOF() )
      Caracter = FREAD()
      IF( FindInList( Cadena + Caracter ) )
         Cadena = Cadena + Caracter
      ELSE
         NewCodeForChain()
         AddTable( Cadena + Caracter )

         WriteInFile()

         Cadena = Caracter
      ENDIF
   ENDDO

Now, let's suppose that we want to see how this algorithm works for compressing a text string. For this case study, we will take the string "ABBABABAC".

StringCharacterString + CharacterIndexCode
ABAB112165
BBBB112266
BABA110666
ABAB1121---
ABAABA1296256
ABAB1121---
ABAABA1296---
ABACABAC1331259
C   67

As a result of the data compression, we obtain the characters 65 66 66 256 259 67. For purposes of this small example, we can see that although we compressed the 9 characters "ABBABABAC" to six numerical characters, the size on disk for these six characters is 12 bytes, in other words, it is bigger than what we want to compress. However, for larger texts, the results are quite good.

Decompression

What is done for the decompression process is to build the table that was generated in the compression process as the compressed file is being traversed. The process will detect that the character read is compressed, because it is larger than 255, which is the maximum that can be represented in the ASCII code. In this case, what the process does is decompress it with the table that was built while traversing the file.

   ViejoCodigo = FREAD() && Old code
   WriteInFile()

   DO WHILE( NOT FEOF() )
      NuevoCodigo = FREAD() && New code

      lnString = DecodeString( NuevoCodigo )
      DO WHILE( ArrayDecodeStack[ PosDecodeStack ] )
         WriteInFile()
         PosDecodeStack = PosDecodeStack - 1
      ENDDO

      AddTable( ViejoCodigo, NuevoCodigo )

      ViejoCodigo = NuevoCodigo
   ENDDO

   PROCEURE DecodeString( lnCode )
      IF( lnCode >= 255 )
         DO WHILE( lnIndex > 255 )
            ArrayDecodeStack[ PosDecodeStack ] = ArrayHashCode[ lnCode 
]
            PosDecodeStack = PosDecodeStack + 1
            lnCode         = ArrayPrefCode[ lnCode ]
         ENDDO
      ELSE
         RETURN( lnCode )
      ENDIF
   ENDPROC

Old codeNew codeIndexesStringCharacterString + Character
6566256ABAB
6666257BBBB
6665258BABA
 256259ABAABA
25967 ABACABAC

What we see here is that the table was built as we traversed the file assigning indexes in the table, and when a character above 255 was found, it was decompressed with the table that was being build, thus obtaining, once again, the text "ABBABABAC".

Implementation

For the algorithm we have seen here, I implemented a class in VFP, with the very original name of "LZW". This class is in charge of compressing files and strings. Its use to compress files would be the following:

   SET PROCEDURE TO "lzw.prg"

   oLZW = CREATEOBJECT("LZW")
   oLZW.SetTypeCompress( 12 )

   oLZW.CompressFile( "MyFile.txt", "MyCompressedFile.txt" )
   oLZW.DeCompressFile( "MyCompressedFile.txt", "MyFileNEW.txt" )

   RELE oLZW

Compressing a text string

   SET PROCEDURE TO "lzw.prg"

   oLZW = CREATEOBJECT("LZW")
   oLZW.SetTypeCompress( 12 )

   lsComprimido    = oLZW.CompressString( "ABBABABAC" )
   lsDesComprimido = oLZW.DeCompressString( lsComprimido )

   RELE oLZW

Conclusion

As we can see, the compression and decompression algorithm is really very simple, and it is not worthwhile to explain much more, since it is easy to understand. I will leave the code open, so that you can use it, or simply see, in more details, the implementation in FOX; but the algorithm is based on what we saw so far.

Bear in mind that depending on the type of file you compress, the result might be bigger than the original. This happens, for example, in the case of files that are already compressed when you try to compress them. Note, also, that this compression algorithm is not as powerful as WinZIP or WinRAR; perhaps another day we will see how to implement them in VFP.

The algorithm might be perfected in some places, since I implemented it in such a way that it be easy to read. Besides, for compressing very large files, it will take a considerable time.

Source code

Roberto C. Ianni, Banco Hipotecario
Roberto Ianni (Buenos Aires, Argentina) is a professional software developer for the last 4 years. He has programmed in C++, VFP, VC++, C#, and has lots of experience in VFP and VC++. He currently works in the Banco Hipotecario as an Analyst-Developer.
More articles from this author
Roberto C. Ianni, June 1, 2004
The "EventLog" or "Event viewer" is something we use on a daily basis, to obtain information about what happened to our computer at a certain moment; for example, when an application generates an error, most of us will intuitively look at the EventLog to see what information it left for us.
Roberto C. Ianni, May 1, 2004
The "EventLog" or "Event viewer" is something we use every day, to obtain information about what happened to our computer at a certain moment. For example, when an application generates an error, most of us instinctively go to the EventLog to see what information it left for us.
Roberto C. Ianni, July 1, 2004
One of the great limitations I find in VFP is the lack of low-level manipulation I can obtain from it. For many, this might not be an inconvenience, but several times I have encountered a problem which can't be solved with Fox, and it is then that I use a DLL or an FLL.
Roberto C. Ianni, August 1, 2004
As an objective for this issue, I want to propose the following points, to complete the first part of the development of an FLL. Advanced development with an FLL; Accessing VFP data; Executing VFP commands from a FLL; MultiThreading; FLL with VFP 9.0; Executing Assembler from VFP.
Roberto C. Ianni, October 1, 2004
This article is the continuation of "Sending email through Visual FoxPro without additional components", published last March. Back when I wrote this article, I didn't think it would get such a huge response, but it did, hence, due to the mails coming from everywhere I decided to write again on that...
Roberto C. Ianni, November 1, 2004
In this second part we will learn how to: Implement MIME, Attach binary files, the final implementation of the WSendMail class, and the use of the WSendMail class.
Roberto C. Ianni, May 1, 2005
Windows Control Panel holds the configuration from several of the operating system's applications, and from some other applications that provide an applet to appear there. Learn how to build this kind of applets to give your application a professional configuration mechanism.
Roberto C. Ianni, April 1, 2005
The intention of this article is to finish with the use of POP3 (Post Office Protocol 3), implementing everything we have seen so far.
Roberto C. Ianni, January 1, 2005
This article is the beginning of receiving email from VFP; this is complemented with the articles you have already seen about sending email from VFP. To be able to do this, we will have to use the POP3 protocol (Post Office Protocol 3).
Roberto C. Ianni, February 1, 2005
We will continue some of the points mentioned in the previous article, in order to be able to implement message reception. The points to be developed are: POP3 Authentication (both Flat and MD5 authentication methods) and EMail parsing.
Roberto C. Ianni, March 1, 2004
Many of us send email using third-party tools such as Outlook Express or Microsoft Outlook, among others, and most likely have encountered registration problems in the client, licencing problems, OLE errors that only occurr at the client's site but not in our developer environment. Also problems if ...
Roberto C. Ianni, August 1, 2006
The present article is one of those that Martín Salías calls "low level fox", the general idea is to develop a service for the operative system whose only goal will be to run a VFP application in charge of handling the business logic.