Fragmentation - why it is usually good for VFP native da
Message
From: 19/01/2003 18:33:58
To: Al Doman, M3 Enterprises Inc., North Vancouver, British Columbia, Canada (19/01/2003 15:58:45)
Forum: Visual FoxPro
Category: Databases, Tables, Views, Indexing and SQL syntax; Miscellaneous
Thread ID: 00742043
Message ID: 00743318
Views: 13
Al,

>>My main objective is to change the perception that fragmentation is always bad. My reason for doing so is that I then hope that MS or some software house can be convinced to make tools to let us do something USEFUL **WITH** fragmentation, rather than always AGAINST fragmentation.
>
>Hi Jim,
>
>The impression I've gathered from these threads is that there are some circumstances in which fragmentation may help VFP file I/O performance. However, I think it's a bit of a stretch to claim it *usually* improves it.

Well it *is* a tough call, for sure. But the reason I state it that way is because:
1) no matter what you do, the next write to a shared VFP table will result in fragmentation;
2) modern HDs are considerably different than HDs of even just a few years ago, especially as regards sectors per track;
3) FPTs appear to be 'permanently' fragmented (unless extraordinary non-DML measures such as FILETOSTR()/STRTOFILE() - see the little sketch below - or non-VFP utilities are used), yet, aside from size issues caused by bloat, they have never been singled out as a problem.
ALL of these factors tell me that it is more accurate to say that fragmentation is "usually good" rather than "sometimes good".
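(On point 3, just to make the FILETOSTR()/STRTOFILE() workaround concrete - a minimal sketch only, with the table closed and the file names invented for illustration:

* Read the whole memo file and write it back out in one pass, giving the OS
* one shot at allocating the new copy contiguously. Needs enough RAM to hold
* the FPT, and you still have to swap the copy in (ERASE/RENAME) yourself.
lcMemo = FILETOSTR("customer.fpt")
=STRTOFILE(lcMemo, "customer_new.fpt")
)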

>
>If, for example, you have 2 or more separate VFP databases on the same disk, or those additional databases plus other non-VFP apps/data, then ongoing disk fragmentation is not likely to be as "optimal" as necessary to see improvements.

EXACTLY, and that's precisely why I want a movement for improvement in managing fragmentation rather than just eliminating it.

>
>As for someone creating a "fragmentation" tool (dev. code name: "Grenade"? ;-)) I don't think this will happen, for the reason I outline in the next paragraph. Continuing with the multi-app scenario outlined above, how can you be sure, if you "optimally fragment" App#1, that you haven't hurt the other apps - i.e. take from Peter to pay Paul?
>
>Historically, to wring every last iota of performance out of the file system, RDBMS vendors have implemented their databases as single files, in pre-allocated, contiguous disk space. The DBMS is then free to manage the placement of DB objects within this space however it wants - it knows the OS can't move file fragments outside the pre-allocated range. There are at least 2 major downsides - it adds another layer of complexity in DB object management, and it could be quite painful to expand the size of a database on a crowded disk or volume. Also, if that single file gets corrupted, those are all your eggs in that single basket (have you ever talked to anyone who's suffered a corrupted Access .MDB?) This is the way Sybase and the early (Sybase-based) versions of SQL Server used to work. I don't know if that's still the case with SQL Server, or its major competitors.

A valid point - if accurate. I suspect that SQL Server may well 'work' that glob of space in particular ways that include fragmentation considerations. But I seriously doubt that MS Access does anything of the sort (especially if it was revised for Rushmore later in its evolution). But who knows for sure!?

>
>I'd suggest that in the specialized cases where vendors could use the extra performance fragmentation may give them in specific circumstances, they will probably implement this monolithic, single-file approach which ensures they can organize their DB objects optimally, while not treading on anyone else's file/data structures.

Well I don't think it has to be that way, and here are some reasons why:

Simplest of all... if people realize that fragmentation can be good in VFP and similar applications (apps that typically have dozens of files open simultaneously, share them across a network, and allow (new) record insertions from all of those sharers into any of those files at the same time), then they might spend the little extra money it costs these days to place their VFP tables/files on a separate HD designated exclusively for that usage. That alone could easily take away the argument regarding other applications. I would also expect, in such a case, that TMPFILES, EDITWORK and SORTWORK would continue to be pointed AWAY from this drive (left on C:, most likely). All this for, possibly, just the price of a HD ($80 - $150 approx.) and a cable.
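For example (just a sketch - the drive letter and paths are invented), each workstation's CONFIG.FPW could keep all the work files off that dedicated data drive:

TMPFILES = C:\VFPTEMP
EDITWORK = C:\VFPTEMP
SORTWORK = C:\VFPTEMP

with the shared DBFs/FPTs/CDXs living on, say, D:.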
Of course they should also revise any "file maintenance" programs to concentrate more on external backups than on PACK and/or manually rebuilding indexes. Fortunately, I have learned that PACK MEMO does not copy the FPT it works on, but simply re-writes within the existing allocation, possibly freeing some clusters at the end of the FPT.
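Just for the record, that in-place cleanup amounts to nothing more than this (table name invented; it does need exclusive use):

USE d:\data\orders EXCLUSIVE
PACK MEMO   && rewrites memo contents within the existing FPT allocation
USE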

But let's look at a possible TOOL and the "rob Peter to pay Paul" possibility.
We already have an OS facility that will start a program based on file name extensions.
It seems logical that the (NTFS) file system could offer a similar facility regarding the "generic" allocation of clusters depending on the file extension, letting us (the user) specify a set of parameters as we see fit.
The parameters I have in mind for each extension are (off the top of my head):
1) relative placement dependent on placement of the last fragment of the specific file.
- I think a simple yes/no switch is all that's needed here, "yes" to say it matters and no to say it doesn't.
2) maximum number of clusters to write to the file for this specific WRITE command before leaving a specified number of clusters free.
3) Number of clusters to leave free between cluster allocations for WRITEs (#2 above).
So if a file (a DBF, say) was written to in bulk with 10,000 records (let's say 2,500 clusters), then, based on "YES"/5/3 for the above parameters, it would:
1) finish off the last allocated cluster to fill it;
2) if the 3 clusters (per param #3) beyond the last used (and just filled) cluster are free, make the search for nearby free clusters proceed from the cluster after the third free one (just take the next closest free cluster if all 3 are not free);
3) Allocate 5 clusters (param#2), then WRITE data into those 5 clusters, then skip 3 clusters (param#3) and repeat this until all of the clusters that need to be written have been written.
Admittedly this is over-simplified, but I hope you get the idea.
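To make the idea a bit more concrete, here is a rough VFP-style sketch of what such an allocator would do for that "YES"/5/3 case (everything here - the starting offset, the cluster numbers - is invented purely for illustration):

* Simulate the proposed "write 5, skip 3" policy for a 2,500-cluster bulk
* insert; run interactively with a smaller number if you don't want 500 lines.
LOCAL lnToWrite, lnRun, lnGap, lnNext, lnThisRun
lnToWrite = 2500      && clusters needed for the 10,000 new records
lnRun     = 5         && param #2: clusters written per allocation run
lnGap     = 3         && param #3: clusters left free between runs
lnNext    = 0         && relative offset past the file's last existing fragment
DO WHILE lnToWrite > 0
    lnThisRun = MIN(lnRun, lnToWrite)
    ? "allocate and fill clusters", lnNext, "to", lnNext + lnThisRun - 1
    lnNext    = lnNext + lnThisRun + lnGap   && leave the gap for later inserts
    lnToWrite = lnToWrite - lnThisRun
ENDDO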

A change to NTFS to have user-selectable "allocation zones" could allow addition of a 4th parameter to the ones described above. Let's say that it permitted up to five "zones" to be sized by the user, each zone simply stating what percentage of the HD is to be "allocated" (if any) for each zone. Zone 1 might be the first nn% of the drive, zone 2 xx% of the drive...all the way to zone 5. These zones would be from the inner hub towards the outer edge. So zone 1 could be set to 25%, zone 2 to 0%, zone 3 to 0%, zone 4 to 0% and zone 5 to 75%.
Then we could have the 4th parameter above simply specify which zone was to be used for the file of extension DBF.
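With that 25/0/0/0/75 split the zone boundaries would work out roughly like this (the cluster count is invented just for the arithmetic):

* Hypothetical 10,000,000-cluster volume, zones numbered from the inner hub outward.
LOCAL lnClusters
lnClusters = 10000000
? "zone 1 (25%): clusters 0 to", (lnClusters * 0.25) - 1
? "zones 2-4 (0%): unused"
? "zone 5 (75%): clusters", lnClusters * 0.25, "to", lnClusters - 1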

Finally, the Disk Defragmenter utility could respect and act on these factors too, possibly even being used to apply the result of changing them if they were changed before it was run.

The important point is that to have any chance at all of managing fragmentation towards optimization we have to have some way to do so. I hope that general recognition that fragmentation is not always bad can lead people to request facilities to exploit it.

There may well be another good reason for MS to give users some control over this. If you 'search' MSDN for "Configuring disk alignment" you will read a short paragraph that includes: "This characteristic of the MBR causes the default starting sector for disks that report more than 63 sectors per track to be the 64th sector. As a result, when programs transfer data to or from disks that have more than 63 sectors per track, misalignment can occur at the track level, with allocations beginning at a sector other than the starting sector. This misalignment can defeat system optimizations of I/O operations designed to avoid crossing track boundaries." Since sectors per track on modern HDs can approach 1,000, this attempt at optimization is not serving us too well anyway. So some other mechanism may be a wise thing.
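A quick bit of arithmetic shows why that 63-sector default is awkward (assuming the common 512-byte sector):

* The MBR default starts the first partition at sector 63 (the 64th sector):
? 63 * 512              && 32256 bytes = a 31.5 KB offset
? MOD(63 * 512, 4096)   && 3584, i.e. not a multiple of a 4 KB cluster, so
* cluster boundaries can never line up with 4 KB-aligned physical boundaries.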

Anxious to hear your comments.