Fragmentation - why it is usually good for VFP native data
Message
From: 15/01/2003 19:09:18
To: 15/01/2003 13:47:35
General information
Forum: Visual FoxPro
Category: Databases, Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID: 00742043
Message ID: 00742184
Views: 53
>WARNING - long-winded diatribe following. . .
>
>This tries to explain why it is that a typical VFP application benefits from file fragmentation as it occurs naturally on a NTFS system.
>
(snip)
>
>Now there is some "strangeness" as regards the reading of .FPT data relevant to any record. Testing has shown that "defragmenting" (using FILETOSTR()/STRTOFILE()) just the .FPT tables results in significantly faster SEEKs and Select-SQLs. However, the impact of doing this is not known as concerns subsequent writes to the tables in question. My inclination would be to leave the .FPTs alone. But if the files are not big already, then I might give consideration to "defragmenting" the .FPTs on a frequent regular basis.

Hi Jim:

WARNING: Long-winded related issue with SQL-Select vs. FPTs...

Some time ago, I had a need to split up a table that was used to store historical copies of all letters sent from one of my clients to their customers. The actual letters, with their printer setup codes, are stored in a memo field in this history table. The reason for needing to split this up was that the memo (.FPT) file had grown to over 1 GB, and there was not even enough free disk space on their server's DATA1 volume to PACK the table. There was also a need to group the data in smaller chunks (by month/year printed, rather than 5 years' worth of data in one table). Since an index tag existed for the "batch_date" field, it seemed a natural choice to use SQL-Select to copy out the records month by month to a separate table, even though the select would be only partly optimizable. This approach turned out to be unworkable, however, due to what I consider a significant bug in FoxPro's SQL-Select processing of memo fields.

The original table's files were:

LBB.DBF 74,297,842 bytes
LBB.FPT 1,083,701,504 bytes
LBB.CDX 17,534,976 bytes

The following code stub was tried, after setting up memvars for lc_arcfile (the month-by-month archive to split out), n_month (month to archive), and n_year (year to archive). Initial testing was, fortunately (as you will see below), done on a subset of the full table to prove out the code.
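
(Purely for illustration, the memvar setup might have looked something like the following; the values and the archive-file naming here are made up, not the actual routine's.)

* Hypothetical memvar setup for the stub below -- the real values
* came from the archiving routine's parameters.
n_month = 6                            && month to archive
n_year = 1998                          && year to archive
lc_arcfile = "LBB" + LTRIM(STR(n_year)) + ;
   PADL(LTRIM(STR(n_month)), 2, "0") + ".DBF"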

SELECT * ;
   FROM lbb ;
   INTO TABLE (lc_arcfile) ;
   HAVING ;
   MONTH(batch_date) = m.n_month .AND. ;
   YEAR(batch_date) = m.n_year
The initial test table was about 7% of the full data:

LBB1.DBF 6,118,891 bytes
LBB1.FPT 77,714,432 bytes
LBB1.CDX 1,074,688 bytes

A single month's worth of data should have been roughly 1 MB of DBF and 10 MB of FPT, with no CDX. The actual results were:

LBB2.DBF 849,232 bytes
LBB2.FPT 77,714,432 bytes

The FoxPro SQL-Select had copied the entire memo (.FPT) file to the test output file! Furthermore, this took about 22 minutes to run on a Pentium-133 (probably due to writing out all the memo data -- this was several years ago, which is why the PC was so slow...). I found that I could obtain the "real" data by simply PACKing the output file:

SELECT (lc_arcfile)     && the output table created by the select is still open
PACK                    && the pack discards the unwanted memo data, shrinking the FPT
USE IN (lc_arcfile)     && close it

The results of this were:

LBB2.DBF 849,232 bytes
LBB2.FPT 10,984,576 bytes

These were actually the correct file sizes, and the pack took only another minute or two to complete. It seems as if a FoxPro SQL-Select selects only the correct records from the DBF but puts ALL of the memo data into the output, with internal flags marking the unwanted data (similar to when a memo field is edited: a copy is made and appended to the FPT, with the original memo marked for deletion). For practical reasons of both disk space and time, this was not an acceptable solution for splitting up the full history files, so a different approach was needed. The final technique used was a semi-automated "brute-force" approach, shown after the short aside below.
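
(As an aside, that append-and-orphan behaviour of memo edits is easy to see with a throw-away table. The table, field and sizes below are made up for illustration, and each replacement is made larger than the last so it cannot reuse the old memo blocks.)

* Hypothetical demo: repeated memo edits bloat the .FPT;
* PACK MEMO rewrites just the memo file and reclaims the space.
CREATE TABLE bloatdemo (notes M)
APPEND BLANK
FOR ln_i = 1 TO 20
   REPLACE notes WITH REPLICATE("X", 50000 + (ln_i * 10000))
ENDFOR
USE                                && close so the size on disk is flushed
= ADIR(la_size, "bloatdemo.fpt")
? "FPT before PACK MEMO:", la_size[1,2]
USE bloatdemo EXCLUSIVE            && PACK MEMO needs exclusive use
PACK MEMO                          && packs only the .FPT, leaves the .DBF alone
USE
= ADIR(la_size, "bloatdemo.fpt")
? "FPT after PACK MEMO: ", la_size[1,2]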

SELECT lbb
COPY STRUCTURE TO (lc_arcfile) WITH CDX
SELECT 0
USE (lc_arcfile) ALIAS arch_file

* just get pointers to the records desired into a small cursor table...
SELECT ;
   RECNO() AS n_recno ;
   FROM lbb ;
   INTO CURSOR temp ;
   HAVING ;
   MONTH(batch_date) = m.n_month .AND. ;
   YEAR(batch_date) = m.n_year

SELECT temp
GO TOP
* now step through the cursor & get the matching data directly
* from the source table into the output table...
SCAN WHILE .NOT. EOF()
   ln_rec = temp.n_recno
   SELECT lbb
   GOTO (ln_rec)
   SCATTER MEMVAR MEMO
   INSERT INTO arch_file FROM MEMVAR
   SELECT lbb
   DELETE NEXT 1
   SELECT temp
ENDSCAN

Not only did this result in correct file sizes from the beginning, but each month's test output required only about 6 to 8 minutes to process, and no extra PACK step was needed. When placed into production, the full table was split out, with CDXs included, in about 9 hours (using a spare volume on another server).

This anomaly was originally tested using FoxPro-2.6/DOS, but the same results were obtained in subsequent tests using FP-2.5/DOS, FP-2.6/Win, VFP-3.0 and VFP-5.0.

I had not previously seen any references to this behaviour of SQL-Select, so there are probably others who would get caught unawares. This situation was somewhat unusual, but it should not have been the problem it was. The best lesson to be learned here is to run a test on some sample data before putting a critical new technique into place, particularly if it will be very demanding of system resources or very time-consuming. Bugs and anomalies are not all documented!

This was brought to mind by your comment (above) about faster SQL-Selects when FPTs are defragmented. It shows that other considerations may also be necessary.
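
(For anyone wanting to try that rewrite-the-FPT idea, a rough sketch follows. The file names are made up, the table must be closed everywhere first, and this uses the low-level file functions rather than FILETOSTR()/STRTOFILE() simply so the whole memo file never has to sit in memory at once; treat it as an illustration, not production code.)

* Hypothetical sketch (no error handling): copy the .FPT sequentially
* to a new file so it gets laid down fresh, then swap it in after
* verifying the copy.
lc_src = "LBB.FPT"
lc_new = "LBB_NEW.FPT"
ln_in  = FOPEN(lc_src)                       && read-only handle
ln_out = FCREATE(lc_new)
DO WHILE .NOT. FEOF(ln_in)
   = FWRITE(ln_out, FREAD(ln_in, 65000))     && copy in ~64 KB chunks
ENDDO
= FCLOSE(ln_in)
= FCLOSE(ln_out)
* ...then RENAME the original aside and RENAME the copy to LBB.FPT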

Happy coding...

Rob Steinbach