A database challenge
Message
From
03/03/2008 11:06:06
Walter Meester
Hoogkarspel, Netherlands
To
03/03/2008 03:27:25
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
01295541
Message ID:
01298162
Views:
27
> This is because the filtering upon the RANK and ROW_NUMBER is a table scan operation).
>
>Walter, that's only true if one uses temp tables or table variables. If one uses a CTE to perform subsequent filtering based on any ranking functions (which is what I try to do as a rule), a table scan does not occur.

First of all, I have not tested this on SQL Server 2005, so I can only deduce from what I know of the optimizer. I hope SQL Server 2005 is smart enough to do the bare minimum, but if you sort on a field that has no index, the values used for ordering have to be retrieved from the table before you can do any ranking at all. So ranking (unless I'm missing something) can only be performed with a table scan, or with an index scan if no other selection criteria have been added. If you add additional selection criteria at the row level, it becomes a totally different ball game: first all the records that match the selection criteria are retrieved, and then that result set is scanned to do the ranking.
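For comparison, the ranking approach being discussed would look roughly like this on SQL Server 2005 (just a sketch, reusing the invoices / Debno / Invoicedate names from the example further down):

WITH Ranked AS (
    -- number each client's invoices, newest first; the ranking still has to read every qualifying row
    SELECT I.*,
           ROW_NUMBER() OVER (PARTITION BY I.Debno ORDER BY I.Invoicedate DESC) AS rn
      FROM invoices I
)
SELECT * FROM Ranked WHERE rn = 1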

Anyway, chances are you will end up processing more records than you need to. If you use ranking to find the first or last record per group, etc., it might be better to do something like the following example:

Example: Get the last invoice sent to each client.
SELECT *
    FROM invoices I
    WHERE NOT EXISTS(SELECT 1 FROM invoices I2   -- no later invoice exists for the same client
                     WHERE I2.Debno = I.Debno AND I2.Invoicedate > I.Invoicedate)
If you have the right indexes in place you can get full index coverage, and then it will be very quick.
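To give an idea, a composite index like the one below would let the correlated NOT EXISTS probe be answered with index seeks (the index name and column list are just an assumption based on the example above; whether it also covers the outer SELECT * depends on which columns you really need):

CREATE INDEX IX_invoices_Debno_Invoicedate ON invoices (Debno, Invoicedate)   -- supports the Debno/Invoicedate probe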

>Additionally, both you and John Ryan assumed a table scan would occur in situations where you have multiple COALESCE statements in the WHERE clause. (Or at least one or both of you raised the question).

What I indicated is that it would do either a table scan or an index scan. Can you take a look at your execution plan and see whether it is doing an index scan? If so, how many (one for each field)? It likely goes through one index scan, does the lookup, and applies a filter on the remaining arguments.
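An easy way to check is to turn on the I/O and plan statistics before running the query. The COALESCE-style predicate below is only my guess at the shape of your WHERE clause, so substitute the real one:

DECLARE @City varchar(50), @Address varchar(100)
SET @Address = '%Eldorado%'        -- @City deliberately left NULL

SET STATISTICS IO ON               -- logical reads and scan counts per table
SET STATISTICS PROFILE ON          -- actual plan operators returned as an extra result set

SELECT * FROM Customers
 WHERE City LIKE COALESCE(@City, City)
   AND Address LIKE COALESCE(@Address, Address)

SET STATISTICS PROFILE OFF
SET STATISTICS IO OFF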

>However, I took another look at the execution plan tonight, and did some reading - apparently SQL Server, even 2005, doesn't handle COALESCE (or even ISNULL) as efficiently as one using it would hope. (Apparently DB2 handles it better, or at least so I read.)

>In SQL Server 2005, the most efficient way to handle parameters (where only some are filled in) in stored procedures is to do the following:
>
>
>WHERE (City LIKE @City  OR @City is null) AND (Address LIKE @Address OR @Address is null)
>
I would not do it like that. If the columns do not contain NULL values, you can do the following:
SET @City = ISNULL(@City, '%')          -- no parameter passed means "match anything"
SET @Address = ISNULL(@Address, '%')
SELECT * FROM Clients WHERE City LIKE @City AND Address LIKE @Address
Otherwise, if the columns can contain NULLs:
SET @City = ISNULL(@City, '%')
SET @Address = ISNULL(@Address, '%')
SELECT * FROM Clients WHERE ISNULL(City, '%') LIKE @City AND ISNULL(Address, ' ') LIKE @Address
The advantage is that the optimizer is smart enough to see that if @Address is just '%' it does not need to process that condition at all; it simply removes it from the execution plan. In your case it will do a lookup based on City and filter on Address manually.
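As a sketch (the procedure name and parameter types are just made up for illustration), wrapped up in a stored procedure the trick looks like this:

CREATE PROCEDURE SearchClients
    @City    varchar(50)  = NULL,
    @Address varchar(100) = NULL
AS
BEGIN
    SET @City    = ISNULL(@City, '%')       -- missing parameter means "match anything"
    SET @Address = ISNULL(@Address, '%')

    SELECT *
      FROM Clients
     WHERE City LIKE @City
       AND Address LIKE @Address
END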

See comments below.

>So for the following 2 approaches:
>
>
>1)  A program that generates the string.... "select * from Customers where  address like '%Eldorado%' and state = 'NY'"
>
>OR
>
>2) The following code:
>
>SET @LastName =  null
>SET @FirstName = null
>SET @Address = '%Eldorado%'
>SET @City = null
>SET @State = 'NY'
>SET @Zip = null
>
>
>SELECT * FROM Customers
> WHERE         (LastName LIKE @LastName OR @LastName is null) AND
>               (FirstName like @FirstName OR @FirstName is null) AND
>               (Address like @Address OR @Address is null) AND
>               (City like @City OR @City is null) AND
>               (State like  @State OR @State is null) AND
>               (Zip like @Zip or @zip is null)
>
>
>I'm going to post some specifics on my blog, but here are my conclusions:
>
>1) You could have either solution running in production....and then swap it out for the other solution....and I have a hard time believing that anyone (not just users, but I.T. managers) would notice any measurable performance differences.

It would depend on a large number of factors. With the trick I gave you above there would be almost no difference at all, because the database would execute exactly the same execution plan (because of the eliminated conditions).

But this is just one piece of the puzzle. Though the trick above is nice, we can only use it with character fields. For any other datatype it would be a different ball game, because we cannot use the LIKE operator.
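For a numeric or date parameter, the usual fallback is the OR-IS-NULL form from your example, accepting that the optimizer cannot simply drop the unused condition (a sketch; the column names follow the invoices example above):

DECLARE @Debno int, @FromDate datetime
SET @Debno = NULL                  -- not filtering on client number
SET @FromDate = '20080101'         -- only invoices from this date on

SELECT *
  FROM invoices
 WHERE (@Debno IS NULL OR Debno = @Debno)
   AND (@FromDate IS NULL OR Invoicedate >= @FromDate)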

The more complex a query gets, the more effort you will have to put in to make your stored procedure perform as well as the dynamic SQL.

>2) There's been this discussion for some time now, about whether stored procs are "flexible". I think I've made a strong case that straight stored procs can be as flexible as dynamic SQL.

I don't think you did. The flexibility of dynamic SQL is in the fact that you are not constrained by whatever stored procedures happen to be implemented at the database level; with stored procedures you can only retrieve data in the way each procedure implements. The problem is that as you try to make a stored procedure more generic by adding more parameters, you have to work harder to keep the additional overhead in the execution plan as small as possible. With straightforward single-table lookups that is probably not much of a problem, but the more tables, joins and subqueries are involved, the harder it becomes for the optimizer to deal with that overhead (the unnecessary filtering), and the more likely it is to make the wrong choices. Therefore more testing is needed to optimize that stored procedure under different parameter selections.

To make things even worse, because the execution plan is cached it will be reused for every call. But since the nature of the query could be totally different next time (because you are searching on different fields), you might be better off recompiling it on every call (the WITH RECOMPILE option); I had such a case where it made the difference between 2 seconds and 4 minutes.
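For reference, this is what that option looks like (the procedure name and parameters are only an illustration):

CREATE PROCEDURE SearchCustomers
    @Address varchar(100) = NULL,
    @State   char(2)      = NULL
WITH RECOMPILE                     -- build a fresh plan on every call
AS
    SELECT *
      FROM Customers
     WHERE (Address LIKE @Address OR @Address IS NULL)
       AND (State   LIKE @State   OR @State   IS NULL)
GO

-- or force a fresh plan for a single call of an existing procedure:
EXEC SearchCustomers @Address = '%Eldorado%', @State = 'NY' WITH RECOMPILE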

>Additionally, it's not secret that I'm a big proponent of the new language features in 2005 (as well as huge new ones in 2008, like the TABLE type, MERGE capabilities, and GROUP BY extensions). Only a small handful of new functions really make things go faster - the language enhancements are intended more to provide easier ways to get things done. As an instructor, I'd much rather teach a student how to query a hierarchical database using a CTE, then show him/her how to do it with previous techniques.

Yes and no. Sure, it is important to show new features, but the student should be taught the basics of SQL, and tackling these kinds of problems in the old way really is a must for everyone who wants to get the most out of it.

>3) With performance being close, stored procedures are more about building a consistent API into your database, one that is client agnostic. Maybe in shops that are small in terms of developer staff, and/or only have one primary development tool, this may not seem like a strong argument. But scaling out dynamic SQL in more complex environments is going to be problematic - which is likely why the use of stored procs generally goes up in larger companies.

You draw your conclusion from one (fairly simple) example we did here, which is not representative of what is built out there. I'd rather teach a student to understand how the optimizer works and let them draw the conclusion themselves, rather than presenting such a thing as static fact. It is not a scientific approach to draw conclusions from one or even a few examples: data <> information <> knowledge <> understanding. The student has to understand what the database is doing to optimize a query and make his decision using that knowledge and understanding.

Walter,