Most strange corruption ever
Message
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
00692378
Message ID:
00692801
Views:
22
>Peter,
>
>This sounds like the infamous Novell Client 32 corruption problem. It is caused by the default client settings, which cause data not to be written to the server. Here is some information that we send to our customers who use Novell.
>
>
>b. The following settings need to be set under the “Advanced Settings” tab of the client32 properties.
>i. Delay Writes: Set to “Off”
>ii. Use Extended file Handles: Set to “On”
>iii. Any other future settings that may be added that can cause the client to delay writes to the server.
>
>If these settings are not changed from the defaults, you are GUARANTEED to get data corruption.


Cy,

Thank you for this. Please note that your list is, in fact, quite a bit longer depending on the client version.
But anyway, you are exactly right about the suggested settings.
BTW, do you know why these kinds of settings should be switched off as much as possible? Because Fox by nature does all these things itself (and did so back in FoxBase already).
On that note: use a screensaver, or worse, use the beautiful auto-reconnect feature of a laptop (and the Windows OS), disconnect the cable, go home ... and you (we all) can imagine what happens the next day.

We have used every client version imaginable, and ended up using the MS client under Novell. And precisely because it happens under all these versions (so including the MS client), I've "decided" that it can't be the client software. And anyhow, the problem disappears when the Novell server is replaced with an NT server.
One thing: I find it suspicious that no provision exists in the NT registry for the number of file handles, and for that matter it is even more suspicious that Novell urges the "Use extended file handles" setting. I am almost sure this is asking for trouble, but I was never able to prove it.

Note: we do use some 220 handles, and "around" the problem the "Too many files open" error seems to be present in a number of cases (see the reply to Jim as well). Though I was never able to prove it was present in 100 % of the cases, I certainly proved it was present in some cases. By this I mean:

The transaction log shows a certain user to be the last one writing new records to the file on day one. On day two the file turns up corrupted (so at the end of day one there was no problem). So, go to the person who performed the last transaction on day one with the question "did you encounter a Too many files error, or something similar, late yesterday afternoon?" The answer was often Yes, but even more often No or Can't recall. The Yes, IMO, can never be a coincidence.
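Just to make that tracking less dependent on people's memory, here is a rough sketch of what an automatic log could look like (the errlog table and its fields are invented for the example; in VFP, error 1 is File does not exist and error 6 is Too many files open):

ON ERROR DO LogHandleError WITH ERROR(), MESSAGE(), PROGRAM(), LINENO()

PROCEDURE LogHandleError
   LPARAMETERS tnError, tcMessage, tcProgram, tnLine
   * Record only the handle-related errors, together with who/when/where,
   * so a corruption found the next day can be matched to the evening before.
   IF INLIST(tnError, 1, 6)
      INSERT INTO errlog (errno, msg, prog, lineno, userid, whenat) ;
         VALUES (tnError, tcMessage, tcProgram, tnLine, SYS(0), DATETIME())
   ENDIF
ENDPROC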

The above is our own experience, but I have received similar reports from customers (hard to gather, though, because you don't even know all the users). It is all very hard to track, also because in our own experience the Too many files error always seems to occur in the other task, as opposed to the one where the tables get corrupted. So we have one task running our ERP app (where the tables get corrupted), and another running our own Admi app (built on the same base of the software). The user reports the Too many files in the Admi app, and the next day the ERP table is corrupted (and it can just as well be the other way around -> Admi table corrupted). What does this tell us?

To me it tells that a Too many files is only reported in the coincidental situation where Fox is able to trap the error, i.e. at the opening of a dbf or cdx. But remember (as said elsewhere), a File does not exist can just as well occur, which Fox reports at the DO of a PRG (FXP etc.) when the OS internally isn't granted a handle but reports it as "can't find it".

I am about 100 % sure that deeper in the kernels the error isn't reported at all, for instance when the cache needs to be flushed. This is where the client comes in, and it all depends on that (software) how everything is dealt with. If you saw the "dynamic" Browse in front of your eyes, you would be 100 % convinced that "something wasn't properly finished yet". On that subject, allow me the following small story:

Once you dive into all the cache mechanisms, and into the sole question of how Fox could always know the current status(es) of all the caches around (within the PCs), I could "theoretically prove" that the whole thing can't work at all. I won't explain this in full here, and you have to go down to the block level to ever understand what I am talking about. It has to do with block numbers: the blocks residing in the PC, while the block has in the meantime been extended by the other PC flushing. Flushing, I learned, is even out of Fox's control. For all these years I thought I controlled it (Flush, Unlock, Use ...), but it's not true at all. And obvious, by the way: when the PC's cache is full, it has to flush. But to where??
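To show what I mean by "thought I controlled it", this (simplified, table and field names invented) is the kind of sequence I always assumed pushed the data out to the server:

USE customers SHARED
IF RLOCK()
   REPLACE balance WITH balance + 100
   UNLOCK                && release the record lock
ENDIF
FLUSH                    && ask Fox to write its own buffers
USE                      && close the table

... and still the client/OS cache can flush whenever it pleases, completely outside of this sequence.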

By now knowing the internals of it all rather well, I could easily set up some stress tests for this, sure that I could fool it all. But I could not (well, I could, but in other areas). Now, assuming that I indeed know how it all works, my conclusion must be that I am missing a link somewhere. Because, given that we now know the PC flushes at a random interval, how in the world would it be possible for the multi-user environment to stay consistent after all?
My tentative answer: there is some intermediate area at the server dealing with it. A kind of "the PC has flushed, but not for real yet". Think of this too:

Back then Fox allowed for a Begin Transaction, depending on the TTS of Novell. I think it was even Novell itself providing the library concerned (this was FPDos). This might tell us that Novell's TTS is involved somehow. Added to this, I "learned" that Novell was able to deal with all records not yet Unlocked, with respect to a RollBack at server failure. Well, I never saw it happen (on the contrary: random erroneous stuff occurs), but the claim is (was?) there anyhow. Now, back to my beautiful corruption.

Note that I'd really like this to be read by some VFP geek:

This problem can always be approached from two angles:
1. Its origin (something the previous day in combination with something overnight);
2. The Browse showing the weird stuff.

For the geek, #2 is the important one:

How the he.. would it be possible for the Browse to come up with something no other tool will show, and which, in fact, isn't even there??

IOW, I've always thought that once we know how the Browse is able to come up with this data, we will have found the origin at the same time. Almost sure of this.
As I said earlier, the RLock() forces the server to come up with the most current version of the data. And it does. But not from the file, because it is not in there. So where from, then? From its TTS environment? From its cache, in error?
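For illustration only (table and field invented), this is the behaviour I mean:

USE orders SHARED
GO 100
? ordertotal             && may still show the locally buffered (stale) value
IF RLOCK()
   ? ordertotal          && after the lock, the current version is fetched again
   UNLOCK
ENDIF

The question remains from which place that "current version" actually comes.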

The fact that writing to the null area (the thing still being alive), causing this area to get corrupted, definitely tells us something too: it resets the alive thing within the server, and then it is just another formal write to the block that really contains nulls already. Is this corruption? No, not at this point, because the block just contains nulls, and in between them I write something. So now there are nulls with something in between that looks alright (the area of the one record). Look at how to salvage the records:

Set Refresh to 60,60 (or higher), and hope that the timer won't elapse during the following process:
Start a Browse and force the re-read by overwriting one character of the last okay-looking record. This performs the internal RLock() and the block is read from the unknown area with the proper data still there (except for the last, really corrupted record). Now press DownArrow like hell to refresh all the rows in the grid, and perform a Copy To (to another file). When the timer doesn't elapse during this process, everything is there again in the new file (except for the last record in the corrupted block).
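A rough programmatic approximation of those manual steps (record number, table and file names invented; the interactive Browse/DownArrow part has no real code equivalent):

SET REFRESH TO 60, 60            && keep the nulled block from being re-read too soon
USE corrupted_table SHARED
GO 998                           && position on the last record that still looks alright
IF RLOCK()
   REPLACE firstfield WITH firstfield   && dummy overwrite, forces the block to be re-read
   UNLOCK
ENDIF
* in the interactive version: BROWSE, overwrite one character, DownArrow through
* all rows so the grid refreshes, then:
COPY TO salvaged_table           && write whatever is readable to a fresh file
USE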

Back to the block numbers: it must be something like the Browse being able to come up with a different block number than the normal editor. IMO it MUST be the client providing this. But at this stage the client can hardly be to blame, because all the (various) clients are able to come up with this strangeness. So it must be the server itself.

As for the normal editor (viewer) always coming up with the (supposed) real (null) situation: I can tell you that the Browse always comes up with the null situation at first as well. Only when the Refresh elapses does it start loading the original data.

So far for now.