Most strange corruption ever
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
00692378
Message ID:
00692824
The problem in brief:

Data in the dbf gets corrupted; a Browse with SET REFRESH TO 1,1 then alternates (every other second) between showing nulls and the original data as it should be.

The data gets corrupted from some arbitrary point in the last block, at the moment that block overflows into a new one.
It is only the very last record in the corrupted block (the one overflowing into the new block) that never shows its original data again.

The null area before this last corrupted record can be recovered by asking the server (RLOCK() etc.) for a fresh copy of the block.
For the last record itself, by contrast, the physical file really does contain nulls.
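
To be concrete about that recovery, the check I mean looks roughly like this (table name, field name and record number are made up here, not our real ones):

USE thetable SHARED          && table name made up
lnBadRec = 1234              && a record inside the corrupted block (made up)
GOTO lnBadRec
? keyfield                   && shows nulls from the stale local copy
IF RLOCK()                   && forces a fresh copy of the block from the server
   ? keyfield                && the original data is back (for this part of the block)
   UNLOCK
ENDIF
USE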

All of it is caused by something that happens on day 1 at the last write to the table, in combination with something else that happens at the first write of a new record on day 2.

Both FPdos 2.5 and VFP5 (no SP) can do it.

This occurs with Novell 4 and 5 servers only, and does not appear with an NT server.

The overflow of the block into the new one is the key point.

How can this ever be?
--------------------------------------------------------

In one of the replies I just posted, I "decided" that this problem can be approached from a different angle: the indexes concerned contain the null data as well. What might this add to the solution?


Since nothing is wrong with the table on day one, and the transaction log shows that everything was written to and re-read from the table properly, I assume that at that point in time the index contained the proper data as well.
I take this as practically proven, because it was never reported that data could not be found on day one.
On day two, however, the corruption always shows itself as "my data has gone", with no further errors. A Browse with an active index always shows the corrupted records at the top of the file, implying that the nulls are in the index data too.
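
Roughly, this is how it shows (tag, field and key value are made up; SEEK goes through the cdx, LOCATE scans the dbf itself):

USE thetable ORDER keytag SHARED    && table name and tag made up
GO TOP
BROWSE FIELDS keyfield              && the corrupted (null) keys sort to the top
lcKnownKey = "ABC123"               && a key that was surely there on day one (made up)
SEEK lcKnownKey                     && uses the cdx: fails when the key in the index became nulls
? FOUND()
LOCATE FOR keyfield == lcKnownKey   && scans the dbf, ignoring the index
? FOUND()
USE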

From the above it must follow that on day 2 (or during the night) something is quite formally replacing the data in the block with nulls. Personally I'd say this hardly allows for "something overnight", because Fox has to be involved to apply the nulls to the cdx and the dbf at the same time. IOW, I don't assume any client software, and certainly not Novell itself, is intelligent enough to apply this logic. So a (rather new for today) conclusion:

It must be the PC from day 2 that applies the nulls to the dbf and cdx (at the first write of a new record for the day).

How can this ever be possible?

By itself it easily can be, but what cannot be is that several records are involved. Again, our transaction log proves exactly which records are logically being written, and that is only the last record in the corrupted block (the one overflowing into the new block, where all is right), not any of the previous data which turn out to be nulls.

Furthermore, the null area never starts at the offset of a record, but at a seemingly random point; no calculation I could think of (I've done thousands) leads to a rule for where the null area starts.
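
For what it's worth, the arithmetic I keep repeating is basically this (the record number and the offset of the first null byte come from a hex viewer and are made up here; the table is assumed open):

USE thetable SHARED                 && table name made up
lnRecNo      = 1234                 && record where the nulls start (made up)
lnNullOffset = 567890               && file offset of the first null byte (made up)
lnHeader  = HEADER()                && size of the dbf header in bytes
lnRecSize = RECSIZE()               && fixed record length, incl. the delete flag
lnRecStart = lnHeader + (lnRecNo - 1) * lnRecSize     && where that record begins in the file
lnInto     = MOD(lnNullOffset - lnHeader, lnRecSize)  && how far into a record the nulls begin
? lnRecStart, lnInto                && lnInto = 0 would mean the nulls start exactly at a record boundary
USE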

So what is causing the nulls to be applied so neatly to the indexes as well?
Small intermediate conclusion: it must be Fox doing it (FPdos and VFP both do it).

There would be no logic in Fox spontaneously starting to write records it shouldn't; IOW, I won't believe that. But what about the flush of some cache area?
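
Just to be explicit about the kind of flush I mean from the language side, the only explicit handle we have on it is FLUSH (the table is assumed open; field name and value are made up):

lcNewKey = "NEW001"                 && made up
APPEND BLANK
REPLACE keyfield WITH lcNewKey
FLUSH                               && hand Fox's buffers to the OS / redirector right away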

I have learned to assume that when data goes from the PC to the server, this happens at the block level. I also assume that this is not a literal copy of the block as it was received from the server, because that wouldn't make sense anyway: once the block is in the cache of the PC and another user writes to the original at the server, the blocks wouldn't match anymore, so why start with a literal match?
And in this area the problem must lie...

The block at the server was written to on day 1 and not touched any further that day. It is left with not enough space to contain the next new record.

On day two, the first user walks into the building, and no one else is around yet. This block is fetched into the PC, and there is no need for it to be the same size as the original there. Furthermore, I guess the (cache) blocks in the PC are a different size anyhow.
= again, only one user is there =
A new record is appended, and during that process I can assume the block is re-read from the server again, knowing that the app does things like first appending the key, followed by an RLOCK(). At this stage things could go wrong already.

The above means that first the header was locked because of the APPEND BLANK (it can be a SQL INSERT too), which also implies a re-fetch of the block.
It also means that at this stage the block is flushed back to the server, because newly appended records always are, at once (not 100% sure here).
Within a fraction of a second the RLOCK() will follow, and now what?
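
The write sequence our app performs is roughly this (the table is assumed open shared; field names and values are made up):

lcNewKey = "NEW001"                 && made up
lcValue  = "whatever"               && made up
APPEND BLANK                        && header lock; the block is (re)fetched and written back
REPLACE keyfield WITH lcNewKey      && the key goes in first
IF RLOCK()                          && here the block is re-read from the server
   REPLACE datafield WITH lcValue
   UNLOCK
ENDIF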

I've learned that an RLOCK() doesn't actually re-read just the record, but (obviously) re-reads all the records within the current block, because data transfer is at the block level. Now which block is re-read?

The current record is spread over two blocks. So Fox needs (I think) special provisions for that: the server has to read two blocks, which may be turned into one logical area in the PC. But this looks tricky.

Between the lines: I have never focused on calculating the backward offset from the end of the record in the new block (that part being alright) to the beginning of the null area in the corrupted block. Now I wonder, would it bring other conclusions if I approached it like that? I think it might, knowing that a block overflows at a random point within a record. I.e., the forward approach possibly does not allow for a proper conclusion, whereas starting from the end of the record in the new block does give a fixed reference point. Hmmm.
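
Sketched out, that backward calculation would be something like this (the offsets come from the hex viewer and are made up here; 512 is just an assumed sector size, and the table is assumed open for RECSIZE()):

lnEndNew   = 1024000                && where the record ends in the new block (the intact part)
lnNullFrom = 1019500                && where the null area starts in the corrupted block
lnBackward = lnEndNew - lnNullFrom
? lnBackward
? MOD(lnBackward, RECSIZE())        && a constant remainder here would point at the record buffer
? MOD(lnBackward, 512)              && ... or at some fixed-size (sector / cache block) buffer being zeroed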

Anyway, to end my ever too long stories: when the written record in the PC is sent back to the server, the server will match its own block(s!) against the block sent, and things get mixed up. Or do they? I think that conclusion is wrong, because it doesn't explain why the indexes end up with nulls.
So it's mixed up in the PC already!

And if I recall correctly, the transaction log even shows this: the record written by the PC shows nulls in the transaction log for the key. I must say this is the more difficult part, because what the transaction log shows is a re-read of the key that it performed itself. Anyway, it shows that at the moment the PC (i.e. Fox) thinks all is well (that is, after the APPEND BLANK), the log already shows nulls in there. After that the PC performs the remainder (the REPLACE).

So at the moment the PC sends its new record with nulls, it sends a whole block with nulls, and because everything goes via the cache of the PC (the cache of Fox!) it is all interpreted by Fox as changed data. Or put differently: Fox must have some detached process that checks the contents of some logical area (block), sees that the records have changed, and applies the change to the indexes as well.


Anyone out there... the above is an attempt to pinpoint the pitfall in Fox.
But let's not forget, it only happens overnight, so it can't be that simple.