Most strange corruption ever
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
00692378
Message ID:
00699670
Views:
23
Hi Pat,
see below.

>Peter,
>
>I am glad my message initiated this thread, because we are still having the problems, and it sounds like you have figured out what is causing the corruption - I think.

So this was thread #692198.

>
>However, after reading the msg a number of times I still do not see that there is a resolution. Am I missing it, or is there no VFP fix - just a network issue? It sounded like the "=RECCOUNT()" after the TABLEUPDATE and the other suggestions made in this thread have been rejected as fixes. I was thinking that my next move would be scrapping the buffering and TABLEUPDATE code and changing back to variables and manual file locking, but I am not sure this would resolve the problem either if it is a network/block issue.
>
>SIDENOTE -> In your discussion on the nulls (||||): we also see them in the table when browsing. As they are populating the primary key, we must delete the records with a DELETE FOR EMPTY(send_date) (the dates for some reason are all empty) and PACK before we can do a re-index. It is strange that they can get into the table without giving a "Uniqueness" error, but if we try to re-index we get the error and can't proceed until they are removed.

This isn't so strange: the index is what is used to decide whether duplicate keys are present. As long as the existing (stale) index is trusted, the duplicates get in without a "Uniqueness" error; only the re-index actually sees them.
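
To make the cleanup concrete, a minimal sketch; the table name is made up, EMPTY(send_date) is your own expression, and PACK / REINDEX need exclusive access:

* Remove the null records and rebuild the index (sketch only)
USE orders EXCLUSIVE           && hypothetical table name
DELETE FOR EMPTY(send_date)    && the corrupted rows carry empty dates
PACK                           && physically remove them
REINDEX                        && only now the unique index rebuilds cleanly
USE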

I think there is some important analysis here:
It tells me that the data (dbf) got corrupted in the server area, and not in the client area, because the index still contains the data. But does it? I mean, you said in the other thread that the index records pointed to non-existing data records. But they should point to null-records, right? I think it is worthwhile examining what exactly the index records point to. Hence, do they contain rubbish or normal index data? I guess it is the latter, because otherwise the chance of duplicate entries (and the according message, which you don't receive) would be too high.
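
If you want to examine this from within VFP rather than a hex editor, a rough sketch (table, tag and key names are made up; adjust to the real schema):

* Walk the table in index order; any null record the CDX still
* reaches will show up, together with its record number (sketch).
USE orders ORDER TAG pk        && hypothetical table and tag names
SCAN
   IF EMPTY(send_date)
      ? "Index reaches a null record at recno", RECNO()
   ENDIF
ENDSCAN
USE
* A grouped query shows whether duplicate keys really got in:
SELECT send_date, COUNT(*) AS cnt FROM orders ;
   GROUP BY send_date HAVING cnt > 1 INTO CURSOR dups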

Anyway, I can hardly imagine that your problem is the same as mine, because I determined mine to be in the Novell area, and you are on W2000. But you never know whether it's in the VFP (or client) area after all, with NT (W2K) just dealing better with it.

Look carefully at the null areas in your tables. Are they shifted (I guess not)? Is the length consistent? Can YOU recognize block boundaries (find out your block/cluster size first)? Can you prove that the data was correct in there before, hence that things only started to corrupt afterwards? Be careful not to look at caches here; better to peek with a hex editor.
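
VFP's low-level file functions can do that peeking too; a sketch, with a made-up file name and an assumed 4,096-byte block size. Note it still reads through the client's file cache, so for the cache warning above a hex editor on the server itself stays the better tool:

* Dump 64 bytes starting at a block boundary, as hex (sketch).
LOCAL nH, nBlock, cBytes, i
nH = FOPEN("orders.dbf")          && opens read-only by default
nBlock = 4096                     && assumed volume block size
FSEEK(nH, 3 * nBlock, 0)          && position at the 4th block boundary
cBytes = FREAD(nH, 64)
FOR i = 1 TO LEN(cBytes)
   ?? TRANSFORM(ASC(SUBSTR(cBytes, i, 1)), "@0") + " "
ENDFOR
FCLOSE(nH)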

I hope you get around it!

Regards,
Peter

>
>Thanks,
>Pat

>
>>Hmm, you might not be the only one ... I'll have a go at it:
>>
>>Let's say all computer devices communicate in terms of blocks of data, and let's say that this is to avoid overhead in addressing (data has to go from something to something else). Hence, the larger the block, the less the overhead for addressing and for everything else involved when the block is transported by the means concerned. On the other hand, the larger the block, the larger the overhead of the block length itself. IOW, when we, the users, logically communicate at the record level, we receive and send blocks (it's done for us).
>>
>>Sidenote: it is all somewhat more complex, for example considering the physical means of transport itself, which uses other "block sizes" than the blocks we usually look at (see later). For that matter, an Ethernet block (we call that a packet) can be adjusted to fit the needs of the apps running around, and it can even differ in size for the various stations connected. And when the LAN extends to a WAN, the "blocks" there will be even larger, in order to let routers etc. talk to each other only once in a while.
>>
>>Block sizes are always about speed, with no conclusion in advance as to whether they should be bigger or smaller.
>>
>>Now to the hex edit: these are the blocks we usually look at, i.e. these are the blocks we can look at by normal means (the hex editor). But this is always subject to the block size used, and you cannot see that by looking at a file's data; you'd have to know it.
>>BTW, for a Windows OS, I think I can say that this would be the "cluster size" (not sure about that).
>>
>>Though I don't see anyone ever talk about it, the block size of any network logical volume (Novell, NT) can be adjusted to the needs. Thinking native VFP tables only (for now), this implies that the logical volume for the dbf should have the smallest block size possible. Why? Because we, the users, communicate at the record level, and records are generally smaller than the smallest possible block size. Thus, when we fetch a record of 100 bytes, we'd better deal with a block that's near that length. For a Novell OS the minimum is (I think) 4KB, and for NT it is (I think) 512 bytes.
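
In numbers, with example values only:

* How many records share one block, and the waste per fetch (sketch).
LOCAL nBlock, nRecLen
nBlock  = 4096        && assumed dbf volume block size
nRecLen = 100         && assumed record length
? "Records fully fitting per block :", INT(nBlock / nRecLen)   && 40
? "Overhead fetching one record    :", nBlock - nRecLen, "bytes"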
>>
>>By the above I already imply that the volume for the index files (cdx) must be different from the one for the data (dbf), and it is true. When the volume is different (a different physical disk is far better, btw), we allow ourselves to give the cdx a block size different from the dbf's, right? And that's necessary too. A block size of 4KB (or even 512 bytes) would be far too small for the index, because chances are high that we read the index in its physical sequence, so we'd better fetch a large block once in a while, the PC receiving that block and using it for, say, 1,000 records to find in the dbf. The dbf won't (ever) be read in sequence at all, so the records we need will always sit in random blocks. That's why those have to be as small as possible.
>>Note that it is not at all easy to determine the block size for the cdx volume; 64KB, for example, just appears to be too large (not efficient anymore, also knowing that we are not always reading the cdx in sequence, and lookup tables call for just SEEKs). By my findings 32KB is just right for the cdx, but please note that it highly depends on the app used.
>>
>>Since you now know about the functional use of the block sizes, you are better able to understand what you are looking at when a file is examined by means of a hex editor:
>>
>>A file always starts at an offset of 0 (00h) within a block, IOW, it begins at the start of the physical block of the OS dealing with it. Two side notes:
>>1. Where we say "physical block", that must again be seen as "logical", because underneath it is the more physical thing: the disk sector. In my experience, one never needs to look at that level, because disks just do their work properly. It's the OSes that do things wrong.
>>2. The Novell of nowadays (from version 5 onwards, I think) is able to sub-allocate a block into other (any) areas. This means that one normal block can be allocated to several "users" (files). BTW, I sure hope we are not using that feature ...
>>
>>So we have our dbf volume with a block size of 4KB, or 4,096 bytes. This means that at each 4,096-byte boundary a new "physical" block starts, according to the network OS.
>>
>>When I say that a complete block shows (!) corrupted, it means that an exact area between two 4,096-byte boundaries shows corrupted.
>>
>>When I say that the first portion of the last record in a block is corrupted, I mean that the record is corrupted from the point where the last record in the block starts up to the next 4,096-byte boundary.
>>
>>When I say that the portion of this record in the new block is alright, I mean that exactly at the next 4,096-byte boundary the record is okay again.
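
Translated to VFP arithmetic (table name and record number are examples; HEADER() and RECSIZE() deliver the real offsets):

* Which block does record nRec start in, and does it straddle
* a 4,096-byte boundary? (sketch)
LOCAL nRec, nStart, nEnd, nBlock
USE orders                     && hypothetical table
nBlock = 4096                  && assumed volume block size
nRec   = 1000
nStart = HEADER() + (nRec - 1) * RECSIZE()
nEnd   = nStart + RECSIZE() - 1
? "Starts in block :", INT(nStart / nBlock)
? "Ends in block   :", INT(nEnd / nBlock)
? "Straddles       :", INT(nStart / nBlock) <> INT(nEnd / nBlock)
USE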
>>
>>The remainder of the case again:
>>
>>When I say that in the past I never started looking backwards from offsets for a rule on the corruption, I mean that I never tried to locate the end of the okay-portion of the record in the new block and then look backwards by the length of the record. So now I did, and came to the easy conclusion that the fixed corruption always starts at the beginning of a record.
>>BTW, this looks too stupid not to have been found earlier, but that's because the area preceding the corrupted record also shows corruption, only not as one contiguous area. This is because the records preceding the last record are written to as well, and as soon as that happens, the fields written to look okay, with nulls around them. Thus, when the 2nd field of a 10-field record is written to, that 2nd field shows okay, and fields 1, 3, 4, 5, 6, 7, 8, 9, 10 will all be nulls after that. Hence the beginning of the last corrupted record can't be seen, because it's nulls all over the place. Only when you start at the end, looking backwards, can you draw conclusions; I did, from some 15 cases, a few of them being cases where the before-last record had not been written to after the corruption showed up on the PC. And then one similarity always shows: in 100% of cases the corruption starts at the last record and runs contiguously up to the end of the block.
>>
>>>
>>>I have noticed that the browse does not always disclose a problem, particularly at the end of a record which is 'short'.
>>
>>That's because it all works with offsets. And I can tell you, since I have been manipulating the header, there is really nothing that will make VFP think something is wrong when the file is either too long (as opposed to the reccount) or too short. Note that there is an eof mark, which really does NOTHING. BTW, I am talking VFP5/6 here, and as the other thread showed: VFP7 might have some (not working?) intelligence for this matter.
>>But, as my case shows, there is something which appears to VFP as not being consistent, and which makes the browse behave as in my case. Again, I can already emulate this, but with no real conclusions yet.
>>
>>Please note your description of the record being short: when this is in the middle of the file, you must be talking about "all the remainder of the file shifted". Hence, being one byte short will cause the delete mark (the first byte of any record) to be incorporated into the previous record. It will be just an *. But when you talk about the very last record in the file, you will be receiving the eof mark from the file. No problem with this, because it's not interpreted by VFP at all; what is shown depends on the field type.
>>When more than one byte is missing at the end of the file, you still won't have a problem (according to what shows), but you (VFP) will be receiving nulls. Beware, though: this depends on the network OS, and on what it makes of a block (!) that is accessed beyond its written boundaries. Remember, when a file shows that it's 3,000 bytes, but the record (and header) length imply 3,010 bytes, VFP will just show you the last record, including the 10 bytes that were never written to the block. And not to forget: the block is allocated to the file anyway, so you will receive it in full (e.g. 4,096 bytes). So, at allocation time the block was emptied (filled with nulls) by the network OS, and when reccount plus offsets says that 10 bytes must be presented from this null area, VFP will just do that. It's perfectly legitimate.
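
That mismatch between the physical file length and what the header implies is easy to test for; a sketch, with a made-up table name:

* Compare physical file length with the length the header implies.
LOCAL nH, nPhysical, nImplied
USE orders                     && hypothetical table
nImplied = HEADER() + RECCOUNT() * RECSIZE() + 1   && +1 for the eof mark (1Ah)
USE
nH = FOPEN("orders.dbf")
nPhysical = FSEEK(nH, 0, 2)    && seek to end-of-file = length
FCLOSE(nH)
? "Implied :", nImplied, "Physical:", nPhysical
IF nPhysical < nImplied
   ? "Short file: VFP will present nulls past the written area"
ENDIF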
>>
>>Remember what I said earlier between the lines: corruption is a hard word for something that might be just normal. Corruption - in my case - is not much more than the network providing a block with nulls in it (which is wrong, i.e. it should have given another block), and when that block is written to by VFP, this block becomes allocated to the file (and the original is freed by the network OS). From this point of view it's really all normal. But wrong.
>>
>>>
>>>It effectively finds an empty field, or reports the content it does find.
>>
>>That's the danger here. The browse is a map of what VFP expects, but in fact it expects nothing. Example: the delete mark can be a space or *. When it's replaced with 00h, VFP won't report an error and decides the record is deleted.
>>The same goes for all other nulls (00h): as soon as a numeric field or boolean field is selected, the nulls (|||||) will show as 0.00 and .F. respectively.
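
A sketch for spotting those 00h delete marks without trusting the browse (names made up again):

* Scan the first byte of every record; only space (20h) and * (2Ah)
* are legitimate delete marks, anything else is corruption (sketch).
LOCAL nH, nRec, nHdr, nLen, nCount, cMark
USE orders                     && hypothetical; just to read the sizes
nHdr   = HEADER()
nLen   = RECSIZE()
nCount = RECCOUNT()
USE
nH = FOPEN("orders.dbf")
FOR nRec = 1 TO nCount
   FSEEK(nH, nHdr + (nRec - 1) * nLen, 0)
   cMark = FREAD(nH, 1)
   IF !INLIST(cMark, " ", "*")
      ? "Recno", nRec, ": delete mark is", TRANSFORM(ASC(cMark), "@0")
   ENDIF
ENDFOR
FCLOSE(nH)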
>>
>>Cheers,