Compressing information with VFP

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Articles

Search:

Compressing information with VFP

Roberto C. Ianni, December 1, 2004

Today we are going to talk about data compression. This is something we do frequently, something which is useful for many different tasks.

Summary

Today we are going to talk about data compression. This is something we do frequently, something which is useful for many different tasks.

Description

Today we are going to talk about data compression. This is something we do frequently, something which is useful for many different tasks. The original idea for this article came from Martín Salías; one day we were talking about the subject of my articles, and he said, "you might write a compressor with Visual FoxPro". But the real impulse came from Pablo van Diest, when he told me, "I don't think this can be done in FOX". Thus originated the idea which we are going to see today.

To carry out these tasks from VFP, we are accustomed to use, one way or another, WinZIP, WinRAR, or other compressors available on the market.

WinZip example:

oZip = CreateObject( "WScript.Shell" )
oZip.Run("winzip -a -r -p -ex C:\Archivo.Zip  c:\MyFile.xml", 0,.T.)

WinRar example:

RUN /N C:\Program Files\WinRAR\winrar a D:\Backup.rar D:\MyFile.xml

As a study challenge, I propose to construct our own data compressor, similar to the one used in WinZIP and WinRAR. This is quite interesting, and we will achieve an understanding of what, really, is a compressor. For this purpose, we will use the LZW, or Lempel-Zefv Welch, algorithm.

The LZW algorithm was developed by J. Ziv and A. Lempel (1977), and refined by T. Welch (1984), is patended by Unisys and, I believe, it is allowed to use the algorithm freely, although a license must be obtained for commercial applications. It became famous for its use in graphical format, GIF and TIFF, although it is capable of compressing any type of file.

The compression and decompression algorithms are really quite simple, which helped to make them widespread. The basis for compression is the reduction of redundant characters. For this purpose, what they do is save a table with strings and replace them by unique codes.

Compression

   Cadena = FREAD() && String
   DO WHILE( NOT FEOF() )
      Caracter = FREAD()
      IF( FindInList( Cadena + Caracter ) )
         Cadena = Cadena + Caracter
      ELSE
         NewCodeForChain()
         AddTable( Cadena + Caracter )

         WriteInFile()

         Cadena = Caracter
      ENDIF
   ENDDO

Now, let's suppose that we want to see how this algorithm works for compressing a text string. For this case study, we will take the string "ABBABABAC".

String	Character	String + Character	Index	Code
A	B	AB	1121	65
B	B	BB	1122	66
B	A	BA	1106	66
A	B	AB	1121	---
AB	A	ABA	1296	256
A	B	AB	1121	---
AB	A	ABA	1296	---
ABA	C	ABAC	1331	259
C				67

As a result of the data compression, we obtain the characters 65 66 66 256 259 67. For purposes of this small example, we can see that although we compressed the 9 characters "ABBABABAC" to six numerical characters, the size on disk for these six characters is 12 bytes, in other words, it is bigger than what we want to compress. However, for larger texts, the results are quite good.

Decompression

What is done for the decompression process is to build the table that was generated in the compression process as the compressed file is being traversed. The process will detect that the character read is compressed, because it is larger than 255, which is the maximum that can be represented in the ASCII code. In this case, what the process does is decompress it with the table that was built while traversing the file.

   ViejoCodigo = FREAD() && Old code
   WriteInFile()

   DO WHILE( NOT FEOF() )
      NuevoCodigo = FREAD() && New code

      lnString = DecodeString( NuevoCodigo )
      DO WHILE( ArrayDecodeStack[ PosDecodeStack ] )
         WriteInFile()
         PosDecodeStack = PosDecodeStack - 1
      ENDDO

      AddTable( ViejoCodigo, NuevoCodigo )

      ViejoCodigo = NuevoCodigo
   ENDDO

   PROCEURE DecodeString( lnCode )
      IF( lnCode >= 255 )
         DO WHILE( lnIndex > 255 )
            ArrayDecodeStack[ PosDecodeStack ] = ArrayHashCode[ lnCode 
]
            PosDecodeStack = PosDecodeStack + 1
            lnCode         = ArrayPrefCode[ lnCode ]
         ENDDO
      ELSE
         RETURN( lnCode )
      ENDIF
   ENDPROC

Old code	New code	Indexes	String	Character	String + Character
65	66	256	A	B	AB
66	66	257	B	B	BB
66	65	258	B	A	BA
	256	259	AB	A	ABA
259	67		ABA	C	ABAC

What we see here is that the table was built as we traversed the file assigning indexes in the table, and when a character above 255 was found, it was decompressed with the table that was being build, thus obtaining, once again, the text "ABBABABAC".

Implementation

For the algorithm we have seen here, I implemented a class in VFP, with the very original name of "LZW". This class is in charge of compressing files and strings. Its use to compress files would be the following:

   SET PROCEDURE TO "lzw.prg"

   oLZW = CREATEOBJECT("LZW")
   oLZW.SetTypeCompress( 12 )

   oLZW.CompressFile( "MyFile.txt", "MyCompressedFile.txt" )
   oLZW.DeCompressFile( "MyCompressedFile.txt", "MyFileNEW.txt" )

   RELE oLZW

Compressing a text string

   SET PROCEDURE TO "lzw.prg"

   oLZW = CREATEOBJECT("LZW")
   oLZW.SetTypeCompress( 12 )

   lsComprimido    = oLZW.CompressString( "ABBABABAC" )
   lsDesComprimido = oLZW.DeCompressString( lsComprimido )

   RELE oLZW

Conclusion

As we can see, the compression and decompression algorithm is really very simple, and it is not worthwhile to explain much more, since it is easy to understand. I will leave the code open, so that you can use it, or simply see, in more details, the implementation in FOX; but the algorithm is based on what we saw so far.

Bear in mind that depending on the type of file you compress, the result might be bigger than the original. This happens, for example, in the case of files that are already compressed when you try to compress them. Note, also, that this compression algorithm is not as powerful as WinZIP or WinRAR; perhaps another day we will see how to implement them in VFP.

The algorithm might be perfected in some places, since I implemented it in such a way that it be easy to read. Besides, for compressing very large files, it will take a considerable time.

Source code

Roberto C. Ianni, Banco Hipotecario
Roberto Ianni (Buenos Aires, Argentina) is a professional software developer for the last 4 years. He has programmed in C++, VFP, VC++, C#, and has lots of experience in VFP and VC++. He currently works in the Banco Hipotecario as an Analyst-Developer.

More articles from this author

Controlling the Event Log - Part 2

Roberto C. Ianni, June 1, 2004

The "EventLog" or "Event viewer" is something we use on a daily basis, to obtain information about what happened to our computer at a certain moment; for example, when an application generates an error, most of us will intuitively look at the EventLog to see what information it left for us.

Controlling the Event Log - Part 1

Roberto C. Ianni, May 1, 2004

The "EventLog" or "Event viewer" is something we use every day, to obtain information about what happened to our computer at a certain moment. For example, when an application generates an error, most of us instinctively go to the EventLog to see what information it left for us.

Developing FLL libraries

Roberto C. Ianni, July 1, 2004

One of the great limitations I find in VFP is the lack of low-level manipulation I can obtain from it. For many, this might not be an inconvenience, but several times I have encountered a problem which can't be solved with Fox, and it is then that I use a DLL or an FLL.

Developing FLL libraries - More power

Roberto C. Ianni, August 1, 2004

As an objective for this issue, I want to propose the following points, to complete the first part of the development of an FLL. Advanced development with an FLL; Accessing VFP data; Executing VFP commands from a FLL; MultiThreading; FLL with VFP 9.0; Executing Assembler from VFP.

Emails without additional components - The Resurrection - Part 1

Roberto C. Ianni, October 1, 2004

This article is the continuation of "Sending email through Visual FoxPro without additional components", published last March. Back when I wrote this article, I didn't think it would get such a huge response, but it did, hence, due to the mails coming from everywhere I decided to write again on that...

Emails without additional components - The Resurrection - Part 2

Roberto C. Ianni, November 1, 2004

In this second part we will learn how to: Implement MIME, Attach binary files, the final implementation of the WSendMail class, and the use of the WSendMail class.

Inside the Control Panel

Roberto C. Ianni, May 1, 2005

Windows Control Panel holds the configuration from several of the operating system's applications, and from some other applications that provide an applet to appear there. Learn how to build this kind of applets to give your application a professional configuration mechanism.

Reading email without additional components - Part III

Roberto C. Ianni, April 1, 2005

The intention of this article is to finish with the use of POP3 (Post Office Protocol 3), implementing everything we have seen so far.

Reading emails from VFP without additional components - Part I

Roberto C. Ianni, January 1, 2005

This article is the beginning of receiving email from VFP; this is complemented with the articles you have already seen about sending email from VFP. To be able to do this, we will have to use the POP3 protocol (Post Office Protocol 3).

Reading emails from VFP without additional components - Part II

Roberto C. Ianni, February 1, 2005

We will continue some of the points mentioned in the previous article, in order to be able to implement message reception. The points to be developed are: POP3 Authentication (both Flat and MD5 authentication methods) and EMail parsing.

Sending email through Visual FoxPro without additional components

Roberto C. Ianni, March 1, 2004

Many of us send email using third-party tools such as Outlook Express or Microsoft Outlook, among others, and most likely have encountered registration problems in the client, licencing problems, OLE errors that only occurr at the client's site but not in our developer environment. Also problems if ...

Windows Services with VFP

Roberto C. Ianni, August 1, 2006

The present article is one of those that Martín Salías calls "low level fox", the general idea is to develop a service for the operative system whose only goal will be to run a VFP application in charge of handling the business logic.