Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Articles
Recherche: 

Rushmore Optimization - not always optimal
Hilmar Zonneveld, July 20, 2001
Rushmore Optimization can help make queries much faster. However, "Full Rushmore Optimization" is not always a desirable goal. "Partial Optimization" is sometimes much faster. It is often believed that to speed things up, you need to have as many indices as possible. This article explains that so...
Summary
Rushmore Optimization can help make queries much faster. However, "Full Rushmore Optimization" is not always a desirable goal. "Partial Optimization" is sometimes much faster. It is often believed that to speed things up, you need to have as many indices as possible. This article explains that some indices may actually make queries slower. Also, some additional tips on query optimization.
Description

Background

The need to write the following arose when discussing the same topics over and over again in the Universal Thread discussion group (www.UniversalThread.com). Instead of giving long explanations over and over again, I prefer to refer people to this document. Any suggestions for corrections - technical, or in terms of language used - can be e-mailed to me. Or, if you are a member, send me a note on the Universal Thread. I am assuming that the reader uses Visual FoxPro 6 or later.

What is Rushmore Optimization?

Rushmore Optimization is a technique used by Visual FoxPro (and some other Microsoft products) to speed up data access, for queries and views. The basic principle used in Rushmore Optimization is this: in order to find all records that match an expression, if an index exists for an expression, it is used - whether this speeds things up or not.

Example used

Throughout this text, I will use the following example: an invoice table has fields for invoice number, date, and document type. The latter has a value of "I" for invoices - it is required because the same table is also used for other information (not invoices, but with a similar structure; for instance, an order). The programmer wants to retrieve all invoices for a certain date. I will assume the result set contains 100 records, out of a total of 100,000 records. I will also assume that half the records (50,000) are for invoices (DocType = "I"), and 10% of the records are deleted (i.e., 90,000 records are not deleted). The basic query is:
set deleted on
lcDate = {^2001-07-21}
select * from Invoice;
  where DocDate = lcDate;
    and DocType = "I";
  into cursor TmpResult
I will not discuss the convenience, or not, of having this particular table structure - this is not the point of this document, and the problems discussed below can appear with other table structures as well.

Implicit condition for "deleted()"

Because of the "SET DELETED ON", to evaluate the above query, Visual FoxPro implicitly adds the condition "and not deleted()". The condition that must be fulfilled for any record, to be included in the result set, then, is:
(DocDate = lcDate) AND (DocType = "I") AND (not deleted())
It is for this reason that you can sometimes see the recommendation to create an index, for each table, with the expression "deleted()". As we'll see, this is counterproductive.

Full Optimization

To achieve full Rushmore Optimization, according to the manual, we would need three indices, one for each of the following expressions:
DocDate
DocType
Deleted()
The names of the indices are irrelevant; if an index on any of the above expressions, it will be used (I repeat: whether it makes sense to do so, or not!). Surprisingly, a "Full Rushmore Optimization", in this case, is much slower than a "Partial Optimization" - especially over a network. Let's see why. Let's assume that all three indices exist. To evaluate the above query, Visual FoxPro does the following: 1a) Fetch, from the index, all record numbers for the first condition (DocDate = lcDate). This is done quickly: because of the binary tree structure of the index, the 100 records required are found quickly. 1b) Fetch, from the index, all record numbers for the second condition (DocType = "I"). This takes much longer, because Visual FoxPro has to fetch record numbers for 50,000 records. 1c) Fetch, from the index, all record numbers for the third condition (not deleted()). This takes even longer: Visual FoxPro has to fetch 90,000 record numbers. In steps (1), Visual FoxPro only fetches record numbers from the index - the records themselves are fetched later. 2) Build a bitmap (i.e., one bit for each record: 1 if included in result, 0 if not) for each of the three conditions. We can assume that Visual FoxPro does this simultaneously with step 1. 3) Apply a BITAND to the result sets, to find out which records have to be included in the result set. 4) Fetch the records themselves. 5) Since the query was Fully Rushmore Optimized, there is no need to apply any additional filter to the records retrieved. It is obvious that, despite the "Full Rushmore Optimization", the query will take fairly long, because of the many unneeded record numbers fetched in steps (1b) and (1c).

Partial Optimization

For this particular query, I recommend to delete indices on expressions (DocType) and (Deleted()). In this case, only one index is used, and, according to Visual FoxPro documentation, a "Partial Optimization" is done. What Visual FoxPro does, in this case, is something like this: 1a) Fetch, from the index, all record numbers for the first condition (DocDate = lcDate). Again, this is done quickly. 1b) and 1c) These steps are omitted. Since no indices exist on the corresponding expressions, no record numbers are fetched. Therefore, much time is saved here. 2) and 3) Since there is a single "optimized" condition, there is no need to BITAND. 4) Fetch the records themselves. Here, deleted records, and non-invoice records will be includes; let's assume Visual FoxPro fetches 250 records. 5) Since the query was only "Partially Rushmore Optimized", a filter has to be applied to the last two conditions, so as to get only invoice records that are not deleted. While steps 4 and 5 take somewhat longer than with "Full Optimization", this is more than offset with the time saved in steps 1b) and 1c).

Recommendations

  • First of all, when testing the above statements, I have found that sometimes Full Rushmore Optimization is indeed faster on a local disk, but slower on a network. Results may also depend on the number of records in the table (reccount()). Do your testing for the worst-case scenario: 1) If you plan to deploy your application on a network, do at least some testing on a network. 2) If you expect a table to have one million records eventually, create dummy records for your testing. 3) Also, when testing, be aware of buffering: when accessing a table a second time, records (or index keys) may already be in a VFP or Windows buffer. You may want to run a test twice, to offset buffering effects.
  • So, what should a programmer do to speed up queries? It should be quite clear, by now, that to create an index on every conceivable condition is sometimes counterproductive - and not only because of the overhead required to maintain the index.
  • The basic rule is to create indices for any expression that returns few records for any given key value (and avoid others, just as you would avoid the bubonic plague). In the example used throughout this text, the index on DocDate fulfills this condition; indices on DocType and Deleted() definitely don't. Some indices, of course, are required for Referential Integrity, but these usually don't pose any problem.
  • Unfortunately, Rushmore Optimization is also used for expressions involving ">", "#", etc. Let's say there is an additional index on InvoiceTotal. A condition of "InvoiceTotal = 0" might return only a few records, as would a condition of "InvoiceTotal between 100 and 110". But how about "InvoiceTotal # 0"? This part of a condition might return almost all records from the table, and here, Rushmore Optimization might again interfere. Since the index is useful for other cases, you wouldn't want to delete it, so try to circumvent Rushmore Optimization just for this special case. You can use some non-optimizable expression, like "Empty(InvoiceTotal)", or "InvoiceTotal + 0 = 0". The first example will only be "optimized" if you happen to have an index on "Empty(InvoiceTotal)", the second, if you happen to have an index on "InvoiceTotal + 0".
  • A compound index can lead to very fast data access. If you create an index on "DocType + dtos(DocDate)", you can use a query with an expression [DocType + dtos(DocDate) = "I" + dtos(lcDate)]. To create a compound index, basically you have to concatenate several fields. To do this, convert everything to string. Use str() for numeric fields, bintoc() for integer fields (this produces smaller indices than str()), dtos() for dates, iif() for logical (example: iif(PreferentialClient, "1", "0")).

Questions

Some of the following questions might arise:
  • "But I thought (or Visual FoxPro says) my query is fully optimized!" If you read through the previous discussion, you will realize that "Full Rushmore Optimization" is not always ideal. As we have seen, sometimes, precisely because the query is "Rushmore Optimized", it is slow.
  • "I want to filter records on deleted()." Do it. The "expression" in the above discussion refers to the index expression, not to the filter expression. An index with a filter won't be used for Rushmore Optimization - this may be good, or bad, depending on the situation. Sometimes you may need two indices: one with a filter on deleted(), to check for uniqueness (a candidate index), one, without a filter, for Rushmore Optimization.

Other tips for fast data access

Apart from the tips on Rushmore Optimization given above, the following can also help in some special cases.
  • If you need data for reports, a single query on several tables is often too slow. Divide it up into several queries: start with one table, or perhaps two, and add one table at a time. In one case, I needed data from 9 or 10 tables; the query took 50 seconds (over the network). After separating the query into several queries, query time was reduced to 2 seconds. I assume that in the first case, Visual FoxPro applied "Full Rushmore Optimization" to one of the larger tables (around 60,000 records), but I am not so sure about the exact details. Example:
* Original query might be like this:
lcClientCode = "HZ123"
select;
  InvoiceMain.InvoiceDate, InvoiceMain.InvoiceNum, InvoiceMain.InvoiceTotal,;
    Client.ClientName, Product.ProductCode;
  from InvoiceMain;
    join InvoiceDet;
      join Product on InvoiceDet.ProductId = Product.ProductId;
    on InvoiceMain.InvoiceId = InvoiceDet.InvoiceId;
    join Client on InvoiceMain.ClientId = Client.ClientId;
  for Client.ClientCode = lcClientCode;
  into cursor Temp

* I prefer the following, which is often faster:
lcClientCode = "HZ123"
* Table Client
select ClientId from Client;
  where ClientCode = lcClientCode;
  into cursor Temp;
  nofilter
* Table InvoiceMain
select Temp.*, InvoiceMain.InvoiceId, InvoiceMain.InvoiceDate, InvoiceMain.InvoiceNum,;
  InvoiceMain.InvoiceTotal;
  from Temp join InvoiceMain on Temp.ClientId = InvoiceMain.ClientId;
  into cursor Temp
* Table InvoiceDet
select Temp.*, InvoiceDet.ProductId;
  from Temp join InvoiceDet on Temp.InvoiceId = InvoiceDet.InvoiceId;
  into cursor Temp
* Table Product
select Temp.*, Product.ProductCode;
  from Temp join Product on Temp.ProductId = Product.ProductId;
  into cursor Temp
Yes, most of the time it works to SELECT from cursor Temp back into cursor Temp. However, mainly for debugging purposes, you may want to maintain several cursors (they are closed automatically, anyway). NOFILTER in the first SELECT is extremely important. Otherwise, subsequent queries will sometimes fail (produce error messages). In the examples, I use "internal codes" as primary keys, i.e., integer values, assigned automatically, and not visible to the end-user. All my real-world applications work this way.
  • If you have a parent-child relation between tables (examples: client-invoices; invoices-invoice details), and you need to edit data for the child records, a filter on a grid is extremely slow. It seems that Rushmore Optimization is not used here. Use a view instead. A view is like a cursor, but the results can be written back to the underlying table(s). Views created in the View Designer can fulfill most of your needs, but be aware that the view designer has some limitations. Some complex views have to be created programmatically. This is relatively complex.
  • In general, for views, change the default update criteria to "key fields only". Otherwise, a more complex expression will be generated (internally) to update the underlying record; this expression might be "Full Rushmore Optimizable", and fetch lots of information from the index - in other words, lots of un-needed overhead, when all VFP needs is to find the single record that was changed.
Copyright (C) 2001-2002, Hilmar Zonneveld (hilmarz@yahoo.com). This text can be copied freely.
Hilmar Zonneveld, Independent Consultant
Hilmar Zonneveld works in programming since 1986, using dBASE, FoxPro and Visual FoxPro. He is available as an independent consultant. He currently works as a programmer at Bata Shoe Organization; also as an instructor at Cisco Networking Academy. You can contact him through the Universal Thread, or, via e-mail, at hilmarz@yahoo.com. Personal Web page (mainly in Spanish): www.geocities.com/hilmarz.
More articles from this author
Hilmar Zonneveld, May 1, 2003
An audit-trail is a record of who did what changes, and when. In Visual FoxPro, this can easily be accomplished through triggers. I hinted at the possibility of doing an audit-trail, in my article on triggers - now, as a reaction to questions in the Universal Thread, I want to present a sample...
Hilmar Zonneveld, December 6, 2001
(The latest update contains minor edits only.) Five easy and fun ways to get yourself into trouble with inheritance. A frequent source of problems in OOP is called "breaking inheritance". This document briefly describes what inheritance is, how it applies to properties and methods, and how it ...
Hilmar Zonneveld, July 1, 2002
Introduction Buffering is a feature in Visual FoxPro that allows us to give the user "undo" and "save" capabilities. In the old FoxPro 2.x days, programmers either didn't provide this capability, or edited memory variables, and copied information between these variables and the table fiel...
Hilmar Zonneveld, October 6, 2005
Due to a recent Windows security fix, users can no longer access a CHM file on a server. The table of contents appears, but the individual pages are replaced by error messages. Access to CHM files in specific folders can be explicitly allowed through special registry settings.
Hilmar Zonneveld, July 20, 2001
(The last update contains minor edits only.) The idea is to have several controls on a form controlled with an array. Thus, you can quickly go through all the controls on the form, managing the array. The sample code included will help you get started quickly. You can easily adapt it to manage...
Hilmar Zonneveld, September 1, 2002
With Automation, you can control all aspects of Excel, Word, or other programs that provide this feature, from Visual FoxPro. In this article, I will concentrate on Excel. Its purpose is to provide a starting point, especially for people new to automation. Introduction With automation, you bas...
Hilmar Zonneveld, March 1, 2003
Introduction One common task in programming is to keep track of what problems are pending. For this purpose, I use a "hierarchical to-do list": a list of items, each of which can have sub-items. All you need is Microsoft Word. Alternatives are available as freeware or shareware, but in t...
Hilmar Zonneveld, October 7, 2005
This is a step-by-step tutorial to show inheritance, specifically in Visual FoxPro forms, as a guidance for people who are not familiar with inheritance in general, or who don’t know how to implement it in Visual FoxPro. The basic idea of inheritance is that all your forms, or several of your for...
Hilmar Zonneveld, May 30, 2004
The code shows how to quickly obtain the greatest common factor, and the least common multiple. Both functions are used when manipulating fractions, among others. Several methods are possible; the method usually taught in school involves prime numbers, but this code will execute much faster (and it ...
Hilmar Zonneveld, November 1, 2006
A standard requirement in a production system, or in systems for cost calculation, is to add up all the raw materials for a number of finished articles, to get the total cost, or simply to purchase the materials. In this article, Hilmar outlines how to do this with multiple levels of intermediate ar...
Hilmar Zonneveld, August 1, 2002
Overview The purpose of this article is to give an overview of normalization. Basically, normalization refers to having an efficient table structure. I will not discuss the famous "first to fifth normal forms" - if you want that information, enough texts exist about it in other places (search sit...
Hilmar Zonneveld, November 8, 2001
The following function will open any document, with its default association (the same application that will be called when you double-click on the file, in Windows Explorer). Use it to open a text-file, a Word or Excel document, an image, etc., with an external application.
Hilmar Zonneveld, May 1, 2002
Introduction This document explains the meaning of primary key, foreign key and candidate index in Visual FoxPro. A discussion of natural and surrogate keys (keys visible, or not visible, to the end-user) is included, including the advantages of each approach, as well as different methods for o...
Hilmar Zonneveld, January 1, 2003
Continuing my series of introductory articles, this article presents an introduction of a simple yet powerful programming concept: recursion. Introduction "To understand recursion, you must first understand recursion." "To make yogurt, you need milk and yogurt." If you are not accustomed...
Hilmar Zonneveld, December 1, 2002
Introduction This article presents an introduction to coding shortcuts in Visual FoxPro - when to use them, and when not to. Notes on coding in general This article is about coding shortcuts; however, I should first emphasize that making the code as small as possible is usually not the number...
Hilmar Zonneveld, June 7, 2002
If you need to check elapsed time with seconds() or a datetime value, this function allows you to display the elapsed time in a human-readable format, that is, hours:minutes:seconds, instead of the total number of seconds. Just pass a number of seconds as a parameter.
Hilmar Zonneveld, April 1, 2002
SQL is a standard language used to manipulate databases. Several of the SQL commands are integrated into the Visual FoxPro language. Select This is a very flexible command, used to select data from a table, or from several tables. This command has options to get totals from several record...
Hilmar Zonneveld, August 1, 2003
In this article, I will show several ways to manipulate text-files. Knowledge of these methods is often important to import and export specific formats. Some of the techniques can also be used to work with files of any content; however, this article will concentrate on text-files. When ...
Hilmar Zonneveld, June 1, 2002
The purpose of this article is to show how to use some aspects provided by the Visual FoxPro database engine, to control our data. Indices Perhaps most readers already know indices; anyway, I find it convenient to include a brief summary of the topic, since this is a requisite to understan...
Hilmar Zonneveld, November 1, 2002
A help file can be used either for interactive help, or as an online manual. In this article, I will give an overview over creating help files in the new help format (CHM), for Visual FoxPro 6 and later. This article is introductory and assumes no prior knowledge of the Help Compiler, or of HTML cod...
Hilmar Zonneveld, February 1, 2003
Introduction Any real-world application will sooner or later misbehave. It is important to be able to find those problems. Visual FoxPro's built-in debugger can help a lot to find out why your program doesn't work as you thought it would. Most of the material in this article applies to Visual...
Hilmar Zonneveld, May 1, 2006
This article is an introduction to VisioModeler. This is a free CASE tool, that can help you design your database, in the process sharing the information with the client in a visual, easy-to-understand, format.