Auto inc skips
Message
From
18/12/2007 17:58:04


To
18/12/2007 16:54:29
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexes and SQL syntax
Title:
Miscellaneous
Thread ID:
01274981
Message ID:
01276528
Views:
35
>Al,
>
>You've done a bang-up job here. I have little to criticize in what you've said.
>
>However, you seem to treat "cache" as some single-characteristic entity and I feel it's a tad more complicated than that.
>I count at least 4 different kinds of "cache":
>1. Hardware-based and controlled cache, most commonly these days on HDs but also available on (more expensive) HD controllers.
>2. Windows' 'network redirector' (or whatever they call it) caching which, I believe, is directly related to "OPLOCKS".
>3. Local Windows controlled caching on any PC system.
>4. Application controlled caching, including that done by VFP or similar products.

At the time of my previous post, Doru and I were narrowly focused on write-behind caching on the server (Doru suspects the one at his workplace). I agree that in a full network there are more caches at work, as you've described above.

>
>I was reasonably familiar with, and quite impressed by, Novell 3.1's caching, and it operated much as you describe below. To me it was reliable when the system was powered through a good-quality UPS, and speed was greatly affected by having it operational.
>
>My exposure to variety #1 above (hardware-based) has been on 2 different HDs on 2 different (Windows) OSes, and in both cases I received the previously mentioned 'Delayed write failure...' within 2 days of starting usage, on a .doc file being manipulated using Word. Needless to say I turned off usage of that cache as fast as I could!

Our experiences differ. The only time I've seen that message (on my own systems, and at lots of clients) has been when a backup operator has switched off/unplugged an external USB hard drive after the backup completes, without going through the "remove hardware safely" process.

>
>We are all exposed to variety #2 when we use Microsoft Windows-based servers and/or workstations in a network. In general it seems to work pretty well, from what is evident. But we can't see what MS may be hiding from us. 3-4 years ago I found/read all I could from MS (sites) about this and found nothing concrete or factual. It was all couched in words suggesting theory rather than fact.
>Nevertheless, there are plenty of reports of problems when VFP or VFP-like applications are run, to the point that there are several documents outlining how to turn OPLOCKS OFF. This makes great sense to me, simply on the basis that:
>a) Any network can have workstations from Win95 to Vista and I just can't see how Microsoft can successfully test/prove that something as critical as this caching works reliably/consistently across such a network, especially given that workstations can be at different patch levels.
>b) MS networking still appears to be aimed at single-use files (as opposed to records) like Word or Excel 'documents'. Even SQL Server is more along this model than a records-based application.

Turning oplocks off can cause a large performance hit: Message#462793

Workstation OS versions and patch levels are definitely a concern, from the low reliability of Win95/98/Me machines up to Vista introducing SMB2 with bugs of its own (which have been reported here).

I'd disagree with b): Microsoft's SMB file sharing and NetBIOS have supported multiuser file operations successfully for a long time. SQL Server opens its own database files exclusively, so network locking operations don't affect it.

It's arguable that some of the more advanced features such as oplocks favour performance over reliability, and I fully agree that there's little detailed information available on exactly how these work (welcome to closed-source software). Years ago I asked for information; none was forthcoming: Message#680964

And MS has clearly had bugs in this area, in the past: Thread#491641 , http://support.microsoft.com/default.aspx?scid=kb;en-us;326798

>
>We all also are exposed to variety #3 and I can say that I have noticed differences between different (Windows) OSes. I'd say that I have high confidence in this type of caching, but I do wonder how it interacts with their 'network redirector', OPLOCKs, etc.
>
>Those of us who have applications out there using VFP have experienced variety #4 and it generally seems to work properly. I would attribute some reported bugs as probably caused by faults in VFP's caching, but I'd allow that to be "common" with a product like VFP. Fortunately, we seem to get workarounds for most such issues. And I was DELIGHTED that VFP9 brought us the FORCE clause for the FLUSH command, in addition to specifying specific work areas to be flushed.
>
>I believe that each of these kinds of caching deserves separate consideration. Each is its own animal, and behaviours vary between them. But it is how they may work together that is the big issue, and it probably is clear from my opinion about the 'network redirector' that I cannot trust it to operate as required, particularly on a network with mixed OSes on workstations. And my experience with HD "write-behind" caching is such that it gets turned OFF as soon as I am reminded.
>
>As an aside, HD drive caching is getting a little less "simple" than earlier. Many HDs now support sorting of seek commands to squeeze better performance out of them. This, by the way, was available on mainframe disks decades ago. Depending on its implementation in HDs these days there could be interference between this and the write-behind caching.
>
>I think I've said what I wanted to say. Still shorter than yours I think < s >

Not by much < g >

>
>>>>VFP9 help, "Autoincrementing Field Values in Tables" has a short description on what happens but IMO it's not detailed enough to discuss what's really going on with locking, file I/O and caching. I will say that when a write cache is in place its value is the only one client apps are allowed to see; none of them can read the disk "directly", to allow that would mean chaos.
>>>
>>>I agree that's the way it should be.
>>>
>>>>... So, if an autoincremented value successfully travels from the workstation to the server that's the only value that other workstations should see, regardless of whether it's been written to disk or not.
>>>
>>>You seem to exclude cache failures, and say that it all depends on values successfully traveling from WS to the server. What is successful? When is the file header locked/unlocked? Could you give a scenario for getting one autoinc duplicate using your premises?
>>
>>Just as a quick recap, a write cache is conceptually simple - it acts as a layer on top of the disk subsystem, mediating reads and writes to it. As long as a file is cached, as far as local server processes are concerned, or remote read/write requests from workstations are concerned, the file's contents in the cache are the file. The contents stored on-disk may be way "behind" the contents in the cache, and are only guaranteed to catch up when an OS-level flush is performed, most often at system shutdown.
>>
>>I don't know what you mean by "cache failure" but here are some possibilities that could cause problems:
>>
>>1. Cache bypass. As I mentioned above, if somehow a process bypasses a write cache and reads directly from disk, the data read there may be out of date. This should be impossible, but software perfection is (mostly) also impossible.
>>
>>2. File loaded in cache ("steady state"): There should only ever be 1 copy of a file in cache. If there somehow is more than one, and an "old" copy is accessed this would be a serious problem. I'd consider the possibility of this occurring with a file duly loaded in cache to be extremely low to non-existent. Likewise if an incorrect offset was being read from the cache; that would likely return garbage rather than an out-of-date value, and AFAIK you're not seeing garbage.
>>
>>3. Process of a file being loaded into or dropped from cache. A disk cache is dynamic; the list of files it contains changes constantly, so an individual file may be loaded and later unloaded many times. I speculate that what happens is something like this:
>>
>>Load a file into cache:
>>- ask the OS for exclusive use
>>- once obtained, load its contents into the cache from disk
>>- start intercepting file read/write requests
>>- the cache retains exclusive use as long as the file is cached
>>
>>Unload a file from cache:
>>- temporarily deny any write access to the file (more likely, return some sort of "busy" signal to requests, or buffer them somehow)
>>- flush the cache contents to disk
>>- stop intercepting file read/write requests
>>- release the exclusive use lock
>>
>>I'd guess that what actually happens is a lot more complicated than this, because you can have scenarios such as a file being shared by multiple users being loaded or unloaded from cache (e.g. VFP) on the fly, and having to manage multiple pending read/write requests during those operations. If I had to guess, I'd consider this the most likely point of software failure at the server.
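For what it's worth, the load/unload sequence sketched above can be modelled in a few lines. This is strictly a toy illustration with invented names (`WriteBehindCache`, `load`, `unload`); real OS caches work at the block level, with locking and buffering far more complex than this:

```python
# Toy model of a write-behind cache mediating access to "disk".
# While a file is cached, the in-memory copy *is* the file as far as
# all clients are concerned; the on-disk copy may lag behind.

class WriteBehindCache:
    def __init__(self, disk):
        self.disk = disk     # dict: filename -> bytes ("on-disk" state)
        self.cached = {}     # dict: filename -> bytes (in-memory state)

    def load(self, name):
        # "ask the OS for exclusive use", then copy disk contents into cache
        self.cached[name] = self.disk[name]

    def read(self, name):
        # intercepted read: cached contents take priority over the disk
        return self.cached.get(name, self.disk[name])

    def write(self, name, data):
        if name in self.cached:
            self.cached[name] = data   # on-disk copy now lags behind
        else:
            self.disk[name] = data     # uncached: write through

    def unload(self, name):
        # flush cache contents to disk, then release the file
        self.disk[name] = self.cached.pop(name)

disk = {"table.dbf": b"v1"}
cache = WriteBehindCache(disk)
cache.load("table.dbf")
cache.write("table.dbf", b"v2")
print(cache.read("table.dbf"))  # b'v2' - every client sees the cached value
print(disk["table.dbf"])        # b'v1' - on-disk copy is "behind"
cache.unload("table.dbf")
print(disk["table.dbf"])        # b'v2' - flushed on unload
```

The interesting (and dangerous) window is exactly the one you describe: between `write` and `unload`, any path that reaches the disk without going through the cache sees stale data.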
>>
>>4. Hardware errors. Most hard drives have built-in RAM buffers; the disk subsystem controller may have buffer or cache RAM as well. Firmware bugs are possible. Firmware bugs in drives I'd consider very low probability as their buffers are simple. Firmware bugs in subsystem controllers I'd consider more likely, the more complex they are. Some controllers offer firmware updates to fix errata. It might be interesting to ask your network admins if there are any such updates available for your servers' controllers.
>>
>>5. System-level utilities intercepting or hooking into file system operations (e.g. antivirus "real-time" scanners, software firewalls). It's probably a good idea to make sure AV is not scanning any component of a VFP app, including temp files.
>>
>>6. Network request/message handling issues. I still consider this the most likely cause of an autoinc issue. One simple scenario: suppose a workstation updates the value, which is written to the local VFP file cache, but VFP fails to forward that update back to the server? Any other workstation would pick up an "old" value from the server cache. Or, if VFP forwards the write, but the workstation network redirector doesn't properly send it to the server? Or, if the redirector sends the request, but the network is busy and the request times out? Or, if antivirus or an outgoing firewall prevents or corrupts the request? Or, worse yet, if malware is actually present on a workstation? What if the VFP app, or the entire workstation itself, crashes in the middle of an update? The list goes on and on. Many workstations, times many possible points of failure per workstation, makes it seem to me that the problem most likely lies there.
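The lost-update scenario in point 6 can be boiled down to a toy model (all names invented; this only illustrates the failure mode, not VFP's actual header locking):

```python
# Toy illustration of scenario 6: a workstation bumps the autoinc
# counter locally, but the update never reaches the server, so the
# next workstation reads the "old" counter and hands out a duplicate ID.

class ServerCache:
    def __init__(self):
        self.next_id = 100     # autoinc counter in the table header

class Workstation:
    def __init__(self, server, update_lost=False):
        self.server = server
        self.update_lost = update_lost   # simulated network/redirector failure

    def insert_record(self):
        new_id = self.server.next_id           # read current counter
        if not self.update_lost:
            self.server.next_id = new_id + 1   # write incremented counter back
        # if update_lost, the write never reaches the server cache
        return new_id

server = ServerCache()
ws1 = Workstation(server, update_lost=True)   # its counter update is dropped
ws2 = Workstation(server)

id1 = ws1.insert_record()
id2 = ws2.insert_record()
print(id1, id2)  # 100 100 - duplicate autoinc values
```

Every failure point listed above (redirector, timeout, AV/firewall, crash mid-update) collapses to the same thing: the read succeeded but the write back never landed.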
Regards. Al

"Violence is the last refuge of the incompetent." -- Isaac Asimov
"Never let your sense of morals prevent you from doing what is right." -- Isaac Asimov

Neither a despot, nor a doormat, be

Every app wants to be a database app when it grows up