Should I add DELETED() tag

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Should I add DELETED() tag

Message

From

17/05/2001 10:06:48

Walter Meester
HoogkarspelNetherlands

17/05/2001 05:06:59

Vlad Grynchyshyn
Soft Serve
Lviv, Ukraine

General information

Forum:

Visual FoxPro

Category:

Databases,Tables, Views, Indexing and SQL syntax

Title:

Re: Should I add DELETED() tag

Miscellaneous

Thread ID:

00507170

Message ID:

00508281

Views:

Hi vlad,

>I agree about network bandwidth. However, the original post mentioned that there are 60% of deleted records in the table. The perfrmance degree when there are less than 55% of deleted records, as far as I know, so deleted tag here can have a benefit. Anyway, as already mentioned in this thread, better to use another approach and application architecture and get rid of deleted records at all.

Of course, it can have benefit, but like I said it depends on a number of factors. If you have a 5 million record table containing 60% of deleted records, it still is not a good idea to use the deleted() tag: When SET DELETED is set to ON, and the table is openened or queried trough a 10 mb network (or less) it could take forever, because all indexnodes with the value .t. (40 % of 5 million = 2 million records) has to be downloaded before rushmore can take action.

>As about deleted() records, this was not ever a problem for me because I usually use an integer PK in my tables and generate PK values using auto-incremental approach.

Please read Message #491025. If you use a Pk Integer in an article table you still have to make sure that the meaningfull key (Articleno in an orderlist) is unique. Well how do you solve this. This can be only done efficiently by using a filtered candidate key or else you'll experience one of the two following problems.

1. Not using a candidate key: Because there is no unique constraint on the candidate key, you'll be able to create two articles with the same articleno.

2. If using a non-filtered candidate key: Because the key is not scoped to non-deleted records only, you can not re-enter a previous deleted articleno. Having the same problems with intelligent keys.

>In this case there are no conflicts with deleted records, so no deleted() filter is needed. This approach is common for all servers, include SQL Server and Oracle, so I guess it is good idea to use the same in the VFP. Why mess with deleted records when this is quite old technology in VFP, and there are new things already available? There are a lot of samples of correct organizing of the PK auto-incremental values.

Oracle, SQL server, etc. don't know the concept of deleted records. IOW if a table in oracle contains a surrogate primary key and a candidate key on articleno, you can insert an articleno that has been deleted previously. When transporting the situation to VFP, you'll have a problem unless you filter the candidate key. Again read Message #491025 for details.

>As about Seek() - it works using binary search algorithm over the array of the index values. 1 million is ~2^20. For binary search it takes ~21 steps to find a record. 1% is 10,000 records. 1 million-10,000 is still close to 2^20. So if you have 1% of deleted records, filtering of PK will not speed up SEEK()

Filtering a PK will generally not give a SEEK() performance as the percentage of deleted records is low (in general). OTOH, the index INDEX ON .T. FOR DELETED() only creates nodes for deleted records: 1% of 1 million = 10.000 nodes (<2^14). Further, a SEEK for a deleted record succeeds within the fist node because all nodes contain the same value.

Snce the indexnodes are somewhat clustered (because hardware/disk I/O cost more than processing time) and the internal indexsearch algorithm is not known (How does it search for a value within a single indexcluster ?) there is many more to it than just the B-tree algorithm

>You require to have more deleted records to speed up by 1 step of the binary search. Well, index array is smaller? yes, but it is on the disk and stored using clasters that are read on demand. Making index file smaller will have no benefit here.

The smaller the index, the less cluster have to be read from the HD to SEEK trough the B-Tree. A cluster can contain many indexnodes. An indexnode (in a VFP CDX index contains more than 1 reference (either to an indexnode or tablerecord). For more information see the Developers guid of FPW 2.6 page A-25.

Making index file smaller by filtering out deleted records could be beneficial for query only, because entire index is read from serever. However filtered indexes are not used in queried, so no benefit here too.

Using SQL, you're right, because SQL is unable to use filtered indexes (which should be changed in VFP 8.0 IMO). However, when using xBase commands like SEEK(), INDEXSEEK(), KEYMATCH(), SET RELATION, etc there are o lot of things you can do to tune your application.

>Did you ever tried to think about this from the algorithms point of view? If you can point me to the real tests that show the time decrease when using filtered index for PK, it would be great. Yet,

Walter,

>>Hi vlad,
>>
>>>You should not be afraid about it. The slow down here is not significant. Search threads here, there were a lot of discussions here on this topic. When you have a lot of deleted records, the index will have benefit.
>>
>>Watch out with these statements. The performance degrade depends (for a large part) on the network bandwidth. On large and very large tables this can (or better will) lead to performance problems.
>>
>>Hi Kevin,
>>
>>Wheter a DELETED() tag is helpfull in your case is difficult to say. It depends on a number of factors:
>>- The bandwidth of your network
>>- The percentage of deleted records
>>- The number of records in a table
>>- The specific query.
>>- The internal memory of your workstation
>>- ....
>>
>>It is best to test it with and without the index tag in the real world situation (over the production network, on the users machines). Testing it on a local workstation will not show you performance problems in this area because the bandwidth between the processor and the local harddisk is not likely to be a problem. OTOH the bandwidth between the workstation's processor and the network server probably is.
>>
>>Yet, another alternative is to find another way to tune the performance. For example, in SQL SELECT, performance degrade is commonly caused by having more than one or two join. From a performance perspective it might me better to split SQL statements in two or more, or alternatively use XBASE commands like SET FILTER, SET RELATION, SET SKIP, SEEK() etc.
>>
>>
>>Hi vlad,
>>
>>>You should avoid filters in other indexes. I know there are people that recommend to filter indexe by deleted(), however, this is completely wrong because such indexes are not used in the optimization. Just the deleted() as an index expression is fine.
>>
>>Wrong ?? Nope. Though they can't be used by rushmore, you could use filtered indexes for filtering PKs and CKs to remain unique for non-deleted() records only (Surrogates won't fix this problem. see Message #491025 for further explaining).
>>
>>Futher You could filter certain indexes for special pruposes. If you've got a table where the records can contain certain types of info. You can filter the index on one of these types which allows to to SEEK() them in the fastest way possible (far faster than rushmore).
>>
>>Example: A 1 million record table contains 1% deleted records, and you want to recycle records. How you're going to solve this?
>>
>>Some will add a DELETED() tag to be able to search for deleted() records. However with such tables performance degrades dramatic when using such table over a network (unless rushmore is set of with SET OPTIMIZE OFF).
>>
>>It's best to add an INDEX ON .T. TAG delRec FOR DELETED() and search them with SEEK(.T.,"MyTable","DelRec"). Best story of all: The index is small because it only contains nodes for deleted records, optimizing performance.
>>
>>Walter,
>>
>>
>>>>Rex
>>>>
>>>>If Deleted() tag will speed up my queries, then it seems to be the best option, rather than going through any other solution.
>>>>
>>>>Is Deleted() not recommended as an index, I seem to get the feeling that people are trying to talk me out of it!
>>>>
>>>>Thanks
>>>>Kev

Map

View

Click here to load this message in the networking platform