Level Extreme platform
Fragmentation - a simple benchmark
Posted: 02/01/2003 19:23:35
To: All
Forum: Visual FoxPro
Category: Databases, Tables, Views, Indexing and SQL syntax
Title: Fragmentation - a simple benchmark
Thread ID: 00737567
Message ID: 00737567
I spent a day reading/digesting the link Al Doman provided (http://www.pcguide.com/ref/hdd/) and a few other articles from a Google search. The following is what I have been able to glean from all of this reading (concentrating on the NTFS facility):

1) While it is often said that there is no fragmentation in NTFS, that is simply not true.

2) While it has been said that NTFS (5.0, latest in Win2000) attempts to limit fragmentation, the best I could find was that if file data can fit inside the "MFT" control area for a file within NTFS space itself, it will be stored there.
--a) There are likely lots of little files that fit in there nicely.
--b) Possibly, too, the "Cluster Allocation Bitmap" (record #6 of the "MFT") figures in reducing fragmentation compared to the FAT file system.

3) NTFS (5.0) uses a default cluster size of 4096 bytes regardless of hard disk capacity.
--a) It can use smaller clusters in special circumstances (conversion from FAT32 was said to always create clusters at 512 bytes).
--b) It appeared that clusters larger than 4096 bytes could be created using parameters to the FORMAT command.

4) NTFS uses a standard sector size of 512 bytes, so a 4096-byte cluster consists of 8 contiguous sectors.
--a) Modern hard disks can have in the range of 40 to 82 (or more) clusters per track (assuming standard 4096-byte clusters).
---- i) That is between 320 and 656 sectors per track, and that would also be the number of clusters if the cluster size were 512 bytes (as on my DELL notebook - I think I converted it to NTFS when it was first set up).
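The sector/cluster arithmetic in point 4 can be checked with a short sketch; the 40-82 clusters-per-track range is taken from the figures above:

```python
SECTOR_SIZE = 512      # bytes per sector (NTFS standard)
CLUSTER_SIZE = 4096    # bytes per default NTFS cluster

sectors_per_cluster = CLUSTER_SIZE // SECTOR_SIZE   # 8 contiguous sectors

# 40 to 82 clusters per track, the range quoted above
for clusters_per_track in (40, 82):
    sectors_per_track = clusters_per_track * sectors_per_cluster
    print(f"{clusters_per_track} clusters/track = {sectors_per_track} sectors/track")
```

At 8 sectors per cluster the quoted range works out to 320 to 656 sectors per track.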

5) File space is always allocated in whole clusters. The cluster is the minimum allocation unit in NTFS (as it is in FAT).
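Whole-cluster allocation means a file's on-disk footprint is its logical size rounded up to a cluster multiple. A quick sketch of that rounding, using the 4096-byte default cluster size from point 3 (treating a zero-byte or MFT-resident file as allocating no clusters is my assumption):

```python
def allocated_size(file_bytes: int, cluster: int = 4096) -> int:
    """Round a file's logical size up to whole clusters, as NTFS/FAT do."""
    if file_bytes == 0:
        return 0                              # nothing to allocate
    return -(-file_bytes // cluster) * cluster   # ceiling division

for size in (1, 4096, 4097, 10_000):
    print(size, "->", allocated_size(size))
```

The difference between the logical size and the allocated size is the per-file "slack" space.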

6) The system always tries to allocate a file in contiguous space. Of course, to be able to do this, the file's size must be known when trying to write it (and applications like Word or Excel can provide this information).
--a) This, of course, is meaningless for a database table, because when the file is first created its size covers little more than the header.
---- i) It is the same situation for the structural .CDX and the .FPT (when present). The .CDX is allocated on creation of the first index and a .FPT at the same time as the .DBF.
--b) It is only when contiguous space cannot be found that non-contiguous clusters will be used.
---- i) I could find nothing to suggest that any information gleaned while searching for all-contiguous space is saved/used for this fragmented allocation. For instance, nothing said that the largest available run of clusters was noted for use if the search failed.
---- ii) I could find nothing that suggested any other kind of consideration for optimization when the decision to resort to non-contiguous clusters was determined.
---- iii) I suspect that NTFS uses the "Cluster Allocation Bitmap" in this process and that it has some logic to 'optimize' fragmentation.
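One way to picture the bitmap's role in this process is as the input to a run search: finding contiguous space means finding a run of zero bits long enough for the file. This is a simplified first-fit sketch for illustration, not NTFS's actual algorithm:

```python
def find_contiguous(bitmap: list[bool], need: int) -> int:
    """First-fit search for `need` consecutive free clusters.

    bitmap[i] is True when cluster i is in use. Returns the starting
    cluster index, or -1 when no run is long enough (the point at
    which an allocator would have to fall back to fragments).
    """
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_start, run_len = i + 1, 0   # run broken; restart after i
        else:
            run_len += 1
            if run_len == need:
                return run_start
    return -1

# 1 = used, 0 = free
bitmap = [bool(b) for b in (1, 0, 0, 1, 0, 0, 0, 0, 1)]
print(find_contiguous(bitmap, 3))   # -> 4
print(find_contiguous(bitmap, 5))   # -> -1 (would have to fragment)
```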

7) NTFS (5.0) reserves 12.5% of available space to do its thing, and this can grow as needed and/or as additional features are implemented.
--a) "Reserved" is a key word here. The space can be used to store data if need be, but the area is the last to be considered for such use.
--b) NTFS maintains a "mirror" of the first 16 records of its "MFT". NT 3.5 and earlier stored it in the middle of the partition; later versions store it at the end of the partition.

8) Standard defragmentation programs do not consider use frequency or creation date or last used date or file name or anything else when defragmenting a volume. The consideration is basically best fit to maximize space usage and to maximize contiguous free (available) space.
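A defragmenter of the kind described in point 8 can be modelled as a pure packing pass: group each file's clusters together and push all used clusters to the front, leaving the free space as one contiguous run. A toy model (ignoring move costs and real placement policy, which are assumptions I am glossing over):

```python
def defragment(disk):
    """Toy defragmenter: group each file's clusters together (in
    first-seen order), pack them at the front, and leave all free
    clusters (None) as one contiguous run at the end."""
    order = []                      # files in first-seen order
    for c in disk:
        if c is not None and c not in order:
            order.append(c)
    packed = [f for f in order for _ in range(disk.count(f))]
    return packed + [None] * disk.count(None)

# disk[i] names the file owning cluster i; None = free cluster
disk = ["A", None, "B", None, None, "A", "C"]
print(defragment(disk))   # -> ['A', 'A', 'B', 'C', None, None, None]
```

Note that nothing in this pass looks at creation dates, usage frequency, or names, matching the point above: only contiguity is optimized.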

I must also mention that the site (http://www.pcguide.com/ref/hdd/) was very firm in asserting that FRAGMENTATION IS BAD, identifying track-to-track head movement as the major cost.

After all of this reading I remained unsatisfied that fragmentation is always bad, so I decided to conduct some simple benchmark runs.
I wrote a program to create 3 tables, two of them with memo fields, and with 5 identical indexes on each. The program then looped 500,000 times to create that many records in each of the tables. The memo fields were filled every fourth record in one table and every third in the other.
The program wrote records with 3 INSERT . . . commands in a row, then a FLUSH command.
Then every 100th record from one of them was extracted to a separate table, keeping only the key fields. That table was then used to "drive" each test run.
The test tables were named FRAG01, FRAG02 and FRAG03. The HD was defragged and confirmed to be absent of fragmented free space before the run and fragmentation status was checked when needed.
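Appending to the three tables in rotation is exactly what fragments them: each append takes the next free cluster, so the three files' clusters interleave on disk. A toy next-free-cluster allocator shows the pattern (an assumption for illustration, not NTFS's real allocation policy):

```python
from itertools import cycle, islice

files = ("FRAG01", "FRAG02", "FRAG03")

# Append one cluster at a time to three growing files in rotation,
# with a naive "next free cluster" allocator: disk[i] names the
# file owning cluster i.
disk = list(islice(cycle(files), 12))   # 12 appends = 4 clusters per file

pos = [i for i, f in enumerate(disk) if f == "FRAG01"]
gaps = [b - a for a, b in zip(pos, pos[1:])]
print(gaps)   # -> [3, 3, 3]: no two FRAG01 clusters are ever adjacent
```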
The tests were first run on a notebook (DELL Latitude C800, P3 800 MHz, 384 MB RAM, 20 GB HD, 53% free). The HD was NTFS (Win2000 Pro) with sectors at 512 bytes and clusters at 512 bytes.
After running most of the tests I felt they might not be sufficiently representative (especially with the 512-byte cluster size on the notebook), so I reran all tests on a desktop with an AMD 700 MHz CPU, 512 MB RAM, and a 40 GB HD, 60% free. The HD was NTFS (XP Home) with a sector size of 512 bytes and a cluster size of 4096 bytes.

Very Fragged = run while the tables were well fragmented
Defrag Util = run after defragmenting with the OS utility
FILETOSTR... = run after using FILETOSTR()/STRTOFILE() to make the tables/files contiguous
PACK/REINDEX = run after doing a VFP6 PACK/REINDEX on the tables
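The FILETOSTR()/STRTOFILE() step helps because writing the file back in a single operation lets the file system see the full size up front, so (per point 6) it can look for one contiguous run. A rough Python equivalent of that rewrite; the temp-file-and-rename step is my own safety addition, not part of the VFP one-liner:

```python
import os
import tempfile

def rewrite_in_one_pass(path: str) -> None:
    """Read a file fully into memory and write it back in one operation,
    mirroring VFP's STRTOFILE(FILETOSTR(f), f): the file system now
    knows the full size up front and can allocate contiguous clusters."""
    with open(path, "rb") as f:
        data = f.read()
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)   # atomic swap keeps the original on failure

# illustrative usage on a scratch file
with tempfile.NamedTemporaryFile(delete=False) as t:
    t.write(b"x" * 10_000)
rewrite_in_one_pass(t.name)
print(os.path.getsize(t.name))   # -> 10000
os.remove(t.name)
```

Of course this only yields contiguous files when enough contiguous free space exists, which is why the disk was defragmented first.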

Runs on DESKTOP:
TEST EXECUTED    VERY FRAGGED  DEFRAG UTIL  FILETOSTR/STRTOFILE PACK/REINDEX
                   SECONDS       SECONDS          SECONDS       SECONDS
Seeks (5,000 per)   90.603       174.713          160.437       177.094
SQL on FRAG01*      30.495        26.399            9.775         8.990
SQL on FRAG02       15.656         7.475            7.425         7.950
SQL on FRAG03*      36.873        15.942           12.845        14.871
SQL on all 3*       87.238        36.729           42.834        37.224
* memo fields included as fields to be output
Runs on NOTEBOOK:
TEST EXECUTED    VERY FRAGGED  DEFRAG UTIL  FILETOSTR/STRTOFILE PACK/REINDEX
                   SECONDS       SECONDS          SECONDS       SECONDS
Seeks (5,000 per)   63.481       301.754          266.854       333.360
SQL on FRAG01*     123.167        47.048           46.888        43.873
SQL on FRAG02       71.663        29.362           30.994        25.586
SQL on FRAG03*     164.596        58.464           56.681        49.971
SQL on all 3*      437.949       199.998          188.791       169.614
* memo fields included as fields to be output
I'm still mulling over the conclusions. So far they add up to fragmentation being very good for SEEKs but apparently VERY POOR for SQL processing.

I hope this is useful information.

cheers