Putting pictures in a zip file
Forum: Visual FoxPro
Category: Third party products, Miscellaneous
Thread ID: 00466323
Message ID: 00467731
Views: 21
>Will zipping give you any compression? If images are gif or jpg will it be worth while?
>

Zipping will pay off with small images even if there's little or no compression, thanks to the wonders of sub-cluster allocation. The minimum unit of disk space allocated to a file is one cluster; with FAT32 under Win32, that's 4K for relatively small volumes of 8GB or less, and FAT16 under Win16 is far more wasteful, with a 2GB volume using a 32K cluster size. If each image averages 500 bytes, saving the images as distinct files still allocates a full cluster apiece - the leftover space, called 'slack space', is wasted and unusable. A Zip file is also allocated in clusters, but the files inside it do not need to start on cluster boundaries; there's some overhead for the Zip's internal directory structure, but the slack space per file is recovered, along with any gains from direct compression of the image data.
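To put rough numbers on it, here's a quick back-of-the-envelope sketch (Python; the file count and cluster sizes are just assumptions for illustration) of how much of the allocated space ends up as slack when each small file occupies at least one whole cluster:

import math

num_files = 10_000    # hypothetical image count
avg_size = 500        # bytes per image, per the example above

# 512 bytes ~ a small NTFS cluster, 4K ~ FAT32 at <=8GB, 32K ~ FAT16 at 2GB
for cluster in (512, 4096, 32768):
    allocated = num_files * cluster * math.ceil(avg_size / cluster)
    used = num_files * avg_size
    slack = allocated - used
    print(f"{cluster:6d}-byte clusters: {allocated / 2**20:7.1f} MB allocated "
          f"for {used / 2**20:4.1f} MB of data ({slack / allocated:.0%} slack)")

With 500-byte images, the 32K-cluster case wastes nearly everything it allocates, while storing the same data inside a single Zip (or on a small-cluster NTFS volume) gets most of that space back.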

NTFS uses a far smaller cluster size, and can have native compression enabled on a folder-by-folder basis, which provides sub-cluster allocation and OS-managed compression, although not as strong (or as expensive in processing time) as the Deflate mechanism used by most Zip files. Enabling folder compression offers many of the payoffs of Zipping at far less total expense, as long as the files under folder compression are relatively static. Files that change size wreak havoc on sub-cluster allocation and cause all kinds of bizarre fragmentation overhead in terms of rewritten data, and recovering from fragmentation in a subcluster-allocated folder is painfully slow: the folder has to be copied to a new area, rearranging the relative positions of the subcluster allocations to avoid cross-cluster I/O overhead for subcluster-sized files. I wrote algorithms to do this kind of rearrangement long ago and far away, reordering the layout of the Link Pack Area (the library of common system calls resident in reserved memory used by IBM's OS/VS and DOS/VS family of mainframe operating systems), and the mathematics of optimal LPA management is not trivial; it introduced me to knapsack algorithms, the mathematics of arranging objects in a limited space to optimize for size and speed of access.

Using NTFS folder compression for a folder that contains static image files is often a very low-overhead alternative to zipping and unzipping images before display. Allocating a small logical FAT partition, running DriveSpace3 on it, and devoting that volume to the same use offers many of the same advantages as NTFS compression; although in my experience DriveSpace3 compression is not as fast or as robust as NTFS folder compression, it does often yield a better reduction in space from the compression itself. I tend not to worry about this - disk space is cheap, NTFS handles a folder with a whole lot of file entries much faster than a pudgy FAT partition, and directory search times and random access of files stored under NTFS tremendously exceed the capabilities of FAT file systems.
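For anyone who wants to try it, folder compression can be switched on without writing any code at all: either check the compression attribute in the folder's Properties dialog in Explorer, or run the COMPACT utility that ships with NT/2K from a command prompt, along the lines of (the path here is just an example)

    compact /c /s:D:\Images

which marks the folder compressed and compresses the files already stored under it.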

NTFS offers a number of advantages. It is capable of finer granularity of access control, so each file entry may have its own access control limits. The directory is implemented as an index, so searching it stays fast even when a folder contains many files. The file allocation structure is a doubly-linked list, referencing both the block preceding and the block following each entry in the file, unlike FAT, which is a forward-linked list only. This gives better random access performance, especially where processing big files demands moving the file pointer backwards: to move backwards, a FAT file has to go to the head of the file allocation chain and walk forward until it finds the required cluster, while NTFS can step back from its present location, or start at the head if that is obviously nearer. Relative movement within the file is handled identically whether going forward or back.

NTFS file allocation recording is also considerably more fault-tolerant. If a forward link is corrupted, a FAT file becomes cross-linked or chopped short and leaves lost clusters on the volume; an NTFS file allocation index can use the back links, starting at the tail of the file, to rebuild the forward-linked list. Even where both link lists are damaged, it may be possible to repair a damaged node of one list with an undamaged fragment of the inverse list, and then use the repaired list to rebuild the inverse index, as long as the damage doesn't overlap.

NTFS also offers options for creating a single logical volume from multiple physical volumes, both by striping and mirroring (RAID) and by volume extension, which connects two volumes together as a single logical disk without striping data, simply treating the clusters of both volumes as resources of a single volume - a way to add a new disk to your system so that both drives appear as one unit. Volume extension is not suited to high-reliability storage (striping with parity, AKA RAID5, builds in fault tolerance and delivers higher disk bandwidth), but it's an easy way to make a single big temporary work volume out of several small disks, or out of the remaindered space on not-quite-identical drives used in a RAID array (e.g. I have an array built from two 9.1GB Quantum drives and an 8.4GB Maxtor; the excess .9GB on the two Quantums is used as a 1.8GB volume set, rather than as two .9GB volumes, which just happens to be big enough to hold my MSDN Library data...)
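Coming back to the forward-only versus doubly-linked allocation chains above, here's a deliberately simplified toy model (Python; these are not the real on-disk structures, just an illustration of why backwards seeks hurt on a forward-only chain):

def fat_seek_hops(current, target):
    """Forward-only chain: backing up means restarting at the head."""
    if target >= current:
        return target - current
    return target            # walk forward again from cluster 0

def ntfs_seek_hops(current, target):
    """Chain with back links: step directly toward the target."""
    return abs(target - current)

# Back up one cluster near the end of a 10,000-cluster file.
print(fat_seek_hops(9_999, 9_998))   # 9998 hops, restarting at the head
print(ntfs_seek_hops(9_999, 9_998))  # 1 hop

Moving forward costs the same either way; it's the backwards moves on a big file where the forward-only chain gets expensive.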

NTFS offers compression and, under Win2K, encryption via EFS, and it uses the smallest disk allocation unit size for a given volume size. It also handles the extremely large volume sizes supported by the Windows family of operating systems; several vendors have delivered large RAID arrays in excess of a TB (1024GB, or 1,099,511,627,776 bytes) as a single NTFS logical disk (the one I've actually seen was built around two Fibre Channel host adapters handling a 64-drive array of 18GB Seagate Cheetahs - 63*18GB plus a parity drive - from CSC). Both NT Server and the various Win2K Servers have built-in support for large arrays delivering fault tolerance and blindingly fast disk bandwidth, using either specialized RAID adapters or software striping.

FAT's sole advantages are slightly better speed on straight, forward-directed sequential access of a large unfragmented file, less overhead for storing file allocation tables, and fast, simple searches of small directories that contain no deleted file records. FAT volumes are also simpler to verify as intact and functional. The drawbacks are greater degradation of directory searches as the number of files grows, less fault tolerance, and slower performance during random access that requires backwards movement of the file pointer. FAT doesn't have the granularity of access control of NTFS, some Win2K features require NTFS, and FAT volumes are not generally secure, while NTFS at least tries to enforce security. If I can stick a DOS boot disk in a machine, I can access a FAT volume; with NTFS, you at least need tools like Winternals' ERD Commander Pro, which can boot NT/2K to a console window and mess with access rights, permissions and passwords, or their NTFSDOS product, which can read an NTFS volume under Win9x through a specialized driver.

I'd seriously consider using a compressed NTFS folder for the images; it really simplifies the storage process without writing any specialized code or investigating the issues the zip format raises: simultaneous access, limits on maximum file count, and exactly what range of damage could result if part of the zip file is corrupted.
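If the Zip route still looks attractive, the mechanics of the round trip are at least simple; here's a minimal sketch (Python's standard zipfile module, with hypothetical paths) of packing a folder of images and reading one member back into memory for display - though it does nothing about the concurrency and corruption questions above:

import zipfile
from pathlib import Path

image_dir = Path(r"D:\Images")     # hypothetical source folder
archive = Path(r"D:\images.zip")

# Pack: JPG/GIF data is already compressed, so deflating mostly just
# recovers the per-file slack space rather than shrinking the images.
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for img in sorted(image_dir.glob("*.jpg")):
        zf.write(img, arcname=img.name)

# Unpack a single member into memory instead of extracting it to disk.
with zipfile.ZipFile(archive) as zf:
    first = zf.namelist()[0]
    data = zf.read(first)          # raw image bytes, ready to display
    print(f"read {len(data)} bytes from {first}")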
EMail: EdR@edrauh.com
"See, the sun is going down..."
"No, the horizon is moving up!"
- Firesign Theater


NT and Win2K FAQ .. cWashington WSH/ADSI/WMI site
MS WSH site ........... WSH FAQ Site
Wrox Press .............. Win32 Scripting Journal
eSolutions Services, LLC

The Surgeon General has determined that prolonged exposure to the Windows Script Host may be addictive to laboratory mice and codemonkeys