Windows systems - is file fragmentation bad?
General information
Forum: Visual FoxPro
Category: Databases, Tables, Views, Indexing and SQL syntax - Miscellaneous
Thread ID: 00736741
Message ID: 00736988
Date: 31/12/2002 11:59:01

Larry,

First let me say that there is nothing personal in my following comments (in-line)...
>Jim,
>To take an excerpt from the URL Al posted: http://www.pcguide.com/ref/hdd/file/clustFragmentation-c.html

I am in mid-read of that article(!) and, though I have not reached that part yet, I did jump ahead to it.

>
>This illustrates my point of why file fragmentation is bad and how it can occur.

This, though, is my big problem. I can come up with that illustration based only on what I SURMISE and dream up. I'm hoping to find something definitive.
In addition, this relates to (as best I can tell) non-database applications. My theory is that database applications are, indeed, different.

>
>As for your questions about databases, tables, docs and spreadsheets, here is my opinion.
>
>Applications do not store disk information. They store offsets. If you need something from the middle or the end of a file, the application tells the OS to get the start of file + offset and the desired number of bytes (another offset). In order to do this, the OS has to read the entire file from start to the specified offset. It can't skip from point A to point M or Z. It must read the entire file in until it reaches the specified offset. The only way to do this is to start at the first cluster of the specified file and go to the next cluster in the linked list. The OS does this until all the information requested is retrieved. It then returns that to the application.

My idea is somewhat different from yours. I do not believe that 'applications store offsets'. Specifically in database applications, I believe that the RECORD NUMBER is the primary piece of information used. The record number can then be used to calculate the offset.
Similarly, I do not believe that anything has to "read the entire file from start to the specified offset", and I do believe that a database application can skip from point A to point M or Z. I believe this is done by consulting the file's cluster allocation map and calculating the applicable cluster from the rec#, standard record size, sector size, cluster size, etc. (see the sketch below).
What you describe is probably close for a sequential file like a Word document or such.
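
To make the record#-to-offset idea concrete, here is a minimal VFP sketch of the arithmetic, under my assumption of a standard fixed-record-size .DBF ("customer.dbf" and record 5000 are made up purely for illustration). HEADER() and RECSIZE() report the layout, and FSEEK() jumps straight to the computed offset; nothing reads the preceding bytes:

* A standard .DBF is a header followed by fixed-size records, back to back.
USE customer
lnHeader  = HEADER()    && size of the table header, in bytes
lnRecSize = RECSIZE()   && fixed size of one record, in bytes
USE                     && close the table so FOPEN() can open the raw file

lnRecNo  = 5000
lnOffset = lnHeader + (lnRecNo - 1) * lnRecSize  && byte offset of record 5000

lnHandle = FOPEN("customer.dbf")       && low-level, read-only open
= FSEEK(lnHandle, lnOffset)            && seek straight to the record
lcRecord = FREAD(lnHandle, lnRecSize)  && read just that one record
= FCLOSE(lnHandle)

The file system, not the application, then maps that single offset to a physical cluster: with 4096-byte clusters, for example, the data sits in cluster INT(lnOffset / 4096) of the file, which the OS can find from the file's allocation map (the FAT chain or NTFS extent list) without reading any of the file's data in between.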

>
>This level of abstraction needs to occur in order for you to be able to copy files to a completely different system and be able to access it in the same manner. If disk addresses were stored in files then disk defragmentation software would break all applications. Copying a file from machine A to machine B would result in a complete corruption of the file because the disk addresses wouldn't be the same.

As I said above, the record# serves well for the purpose of transferring between file systems because it too is independent of architecture, file-system, etc.
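
As a trivial illustration of that logical addressing (same hypothetical "customer" table as in the sketch above), VFP positions by record number alone, so identical code works on any copy of the file regardless of its physical layout:

USE customer
GO 5000        && position on record 5000 by record number alone
? RECNO()      && 5000, whatever disk, file system or cluster layout is underneath
USE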

>
>This is the same technique whether dealing with database or administrative (Word, Excel) applications.

I believe this is incorrect.
>
>As for how Office files can get fragmented, Hilmar answered that. While a write from the application may put files down in a contiguous pattern, the underlying disk may not have the available clusters to do that. So the OS takes care of getting the next available block for each cluster write that occurs. If the disk is not fragmented, then a large number (or all) of the file clusters can be laid in a contiguous "line". If not, then the application is fooled by the OS to think that's exactly what happened. It doesn't need to know the gory details.

I agree with this. But there seems to be more to it. First, we shouldn't assume that the first 'laying down' of any file would necessarily be contiguous. Or maybe we can/should assume so on an NTFS (5.0) file system - I don't know.
Also, NTFS (5.0) is said to have logic to reduce fragmentation, so that likely has an impact on subsequent re-laying of any file. But what impact?
I've just got to the NTFS section of the article AlD quoted, and I'm hoping it has some answers to these kinds of issues.

Thanks Larry, and I hope you will continue with the discussion. I plan to complete reading AlD's link and then add some more based on what I read there.

cheers, and Happy New Year, Larry


>
>Regards.
>
>>>>I tried this in the CHATTER forum, but an absence of response prompts me to re-try here.
>>>>
>>>>Keeping in mind:
>>>>1) Modern Windows systems are multi-tasking systems.
>>>>2) Windows itself (and its components, like IE) make significant 'quiet' use of your HD space for all manner of files, large and small.
>>>>3) Other applications (MS Word for example) can use HD space 'quietly' too.
>>>>4) Modern HDs are fast, processors are faster yet, and RAM is plentiful.
>>>>... what hard facts are there to back up the axiom (it is essentially an axiom today) that fragmentation is bad?
>>>>
>>>>That fragmentation is bad is so prevalent a concept that I must be missing something obvious. What is it?
>>>>
>>>>Thanks for any/all input on this issue.
>>>
>>>Jim,
>>>While disk fragmentation is not as bad anymore because of the speed of hard drives, file fragmentation is still a problem no matter what.
>>>
>>>If a file is found in multiple places on the hard drive then it takes more than one drive revolution to retrieve the file. Even if the disk spins at 10K, having to make as many revolutions as there are fragments (possibly 500+) will chew up resources. This is bad.
>>
>>Yes, I agree that that is bad. But really only in the case where the WHOLE FILE is wanted/needed for processing. Things like Word and Excel come to mind as such cases.
>>
>>Now what about a database, with lots of tables, where specifically, it is a rare thing indeed to read a whole file? Include in your consideration that it is common for a database application to 'need' a record from a few files at a time (thus usually implicating some .CDXs and often .FPTs too). Any change in that situation?
>>
>>By the way, my guess is that a Word or Excel file hardly has a chance to become fragmented, because the whole file is written in a single operation, so there is virtually no chance for something else to intrude while it is being written.