A database challenge
Message
From: Walter Meester, Hoogkarspel, Netherlands
Date: 25/02/2008 15:11:08

General information
Forum: Visual FoxPro
Category: Databases, Tables, Views, Indexing and SQL syntax; Miscellaneous
Thread ID: 01295541
Message ID: 01296201
>Walter,
>
>- This is NOT a VFP vs .NET discussion, so please don't place any comments dragging in this argument. The discussion is language transparent.
>
>I know you meant that for the audience - however, it should be noted that you used VFP for part of the solution. :)

The solution I gave could be written in any language; it is just a bit of string munging. The meat is in the SQL statement.

>Looking at your solution, I cannot draw the conclusion that CTEs, RANK() and ROW_NUMBER() make the task you outlined at the start of the thread significantly easier than the other approaches. It seems to me that MySQL is more than capable of implementing this, even more easily (my observation) than SQL Server 2005. Personally, I found the solution I gave earlier for SQL 2000 (using multiple TOPs) at least as acceptable as the one using ROW_NUMBER().
>
>To show the result number, ROW_NUMBER() works fine - you already acknowledged that you had to utilize a 2nd capability (using LIMIT and then incrementing a variable to return the row number). Now, I could be difficult here and say that I'm using one function and you're using two, but in all honesty, I'd be splitting hairs. IMO, both are equally fine for this.

I was arguing that it would not make a difference in this particular case, as you don't need to calculate the row number at all; it is necessary in neither the front end nor the back end. BTW, before reading the MySQL solution of incrementing a variable, I tried the same thing in SQL 2000, but it did not run because SQL Server complains that variable assignments and data retrieval cannot be combined in one statement.
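For what it's worth, numbering rows in the front end is trivial in any language. A minimal Python sketch against an in-memory SQLite table (the table and data are invented for illustration); this does on the client what the MySQL variable-increment trick does on the server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)",
                 [("Adams",), ("Baker",), ("Cole",)])

# Number the rows on the client instead of asking the database for it.
rows = [(i, name) for i, (name,) in
        enumerate(conn.execute("SELECT name FROM customer ORDER BY name"), 1)]
print(rows)   # [(1, 'Adams'), (2, 'Baker'), (3, 'Cole')]
```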


>FWIW, where ROW_NUMBER really shines is when you need to assign sequential numbers by groups.

I know of the partition possibility and I am sure it could be helpful to address certain problems more efficiently, but to be honest I've not come across a particular problem where I needed this.

>Actually, CTE's have a multiplier effect, as you can nest CTEs. But the true power of CTEs is in recursive queries.

Hmmm, I can see the recursion argument, but I have a hard time seeing the real advantage of nesting CTEs over the other means.

>I've looked at the RANK() and ROW_NUMBER() functions before to solve certain problems, but they did not cut it, as they are still a poor mechanism for solving certain RECORD-ORIENTED problems

>Can you be more specific? As it stands, I don't agree with this statement.

Well, the exercise we did before:
Given an invoice table for a specific client, where the client has outstanding invoices totalling more than, let's say, 500 dollars, find the invoice at which the cumulative amount exceeds 500 dollars.

Again, we have gone through this before (you came up with a CTE solution, I came up with one using ANSI SQL-92 syntax). But really, RANK() and ROW_NUMBER() did not solve this problem any more easily than the other approaches.
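To make the exercise concrete, here is a minimal sketch of the plain ANSI SQL-92 approach (a correlated subquery computing the running total, no ranking functions), run through Python's sqlite3; the invoice table and amounts are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invno INTEGER, amount REAL)")
conn.executemany("INSERT INTO invoice VALUES (?, ?)",
                 [(1, 200.0), (2, 250.0), (3, 100.0), (4, 300.0)])

# For each invoice, sum everything up to and including it (SQL-92 only),
# then keep the first invoice whose cumulative amount exceeds 500.
row = conn.execute("""
    SELECT i.invno,
           (SELECT SUM(amount) FROM invoice x
             WHERE x.invno <= i.invno) AS cumamount
      FROM invoice i
     WHERE (SELECT SUM(amount) FROM invoice x
             WHERE x.invno <= i.invno) > 500
     ORDER BY i.invno
     LIMIT 1""").fetchone()
print(row)   # (3, 550.0): invoice 3 pushes the running total past 500
```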

>As I mentioned earlier, ROW_NUMBER is great for assigning values based on groups. Suppose you want to assign a ranking number to sales within product and within customer - you can do that in the PARTITION clause of the ROW_NUMBER statement. I think you might want to take a second look at the ranking functions (ROW_NUMBER is but one of them).

I've seen them and understand how they work, but to be honest I have a hard time seeing which specific and significant problems they solve. Most problems (like this exercise) I can solve without them.
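For reference, the PARTITION BY form being described looks like this; the sketch uses Python's sqlite3 (SQLite supports window functions since 3.25), with an invented sales table. The numbering restarts for every value of the PARTITION BY column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("A", 50.0), ("A", 80.0), ("B", 30.0), ("B", 90.0)])

# Rank each customer's sales independently: ROW_NUMBER() assigns 1, 2, ...
# within every partition, here each customer's rows ordered by amount.
rows = conn.execute("""
    SELECT customer, amount,
           ROW_NUMBER() OVER (PARTITION BY customer
                              ORDER BY amount DESC) AS rn
      FROM sales
     ORDER BY customer, rn""").fetchall()
print(rows)   # [('A', 80.0, 1), ('A', 50.0, 2), ('B', 90.0, 1), ('B', 30.0, 2)]
```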

>On the whole, your SP solution is far more complex than the generation of one SQL command that has everything in it.

>I don't see how you can arrive at that conclusion. The approaches are very different. I don't see either one as "more complex". My two cents is that you're penning it as more complex because it's a paradigm shift from the way you normally approach things. :)

If the number of lines in the statement and/or the number of keywords is a measure of this, yours will be the more complex. But you're right, it is a subjective measure in some cases.

>Those who regularly write SPs might argue that yours is more complex. I really wouldn't agree with that either. They are just two very different ways to solve things.

>- I can test the stored procedure independently of the application.
True, but unit testing the routine I wrote should not be any harder.

>- If there's a bug or something that needs to be tweaked, I don't have to redeploy an application component. (True, if there's a new search requirement, a new component needs to be rolled out)

True, but I don't know whether that is an advantage. If I'm rolling out a new application to all clients, I don't need to update all the SPs on all of those databases.

>- The stored proc is essentially the API for the task at hand.
The routine in the application layer fulfills that task in my case.

>- While you provided a solution that fits all the criteria, you have introduced two tools: VFP and the database. I'm only using the database.

True, but you still have to program the layer between the web and the database in something, whether that is .NET or VFP.

>I'd also encourage people not to go into joins if not necessary, especially in tables with a lot of records. In your case, when you've got only a few possible selections of statuses (let's say below a hundred), it might be faster to use a
>Status IN (1, 6, 7, 9, etc.)
>solution. It would avoid the optimizer doing unnecessary joins and enable existing compound indexes to retrieve the information faster. I've been able to optimize queries this way from minutes to subsecond response times.


>I tested this out before, when I was using the XML approach. There is really no difference in performance. The table variable is in memory 99.99% of the time. There's an additional argument to be made that variable selections are more commonly passed these days as XML than as a CSV, but that's an argument for another day. :)

It depends on your indexes. If the WHERE clause is optimizable by a certain index, you can add the status column to it and create a compound index. In your case: if I'm querying on, let's say, city and status and have a compound index on those columns, you avoid a join altogether and all the information is retrieved in one index scan, whereas in your case a table scan of the status table has to be performed, and its result has to be seeked, merged or hashed with the result of the customer selection. Using the compound index this way avoids a tremendous amount of I/O and avoids selecting a less optimal execution plan. If you wish, I can show you this if you send me the two tables.
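A minimal sketch of the compound-index point, again with sqlite3 (the table, index, and status values are invented): with (city, status) indexed together, the IN() filter is answered from that one index, with no join against a status table at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT, status INTEGER)")
# The compound index covers both predicates of the query below.
conn.execute("CREATE INDEX ix_city_status ON customer (city, status)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [("Adams", "Boston", 1), ("Baker", "Boston", 5),
                  ("Cole", "Denver", 1)])

# The whole WHERE clause is resolvable by one scan of ix_city_status;
# no join to a status table is needed.
rows = conn.execute("""
    SELECT name FROM customer
     WHERE city = ? AND status IN (1, 6, 7, 9)
     ORDER BY name""", ("Boston",)).fetchall()
print(rows)   # [('Adams',)]
```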

>I know there might be other arguments (like security) reasons for doing SP

>Yes, and that is an important factor. Like it or not, some installations ONLY use stored procedures as the means for data access.

In that case, you don't have a choice.

> but in this case it is not the most efficient and simple implementation
>
>That's your opinion, but IMO you haven't offered anything to substantiate that. So I don't concur with it (nor am I saying the reverse). One could make the reverse argument that the SP itself is more self-documenting.

Well, I have some doubts about the WHERE clause being totally optimizable. Though it is a clever trick to plug the variables into COALESCE to create the flexibility, I wonder whether the optimizer is able to see what you're doing. Maybe you can send me a picture of the execution plan.

If, for example, I only fill in City, the other selection columns are still in the WHERE clause, and I'm not sure without trying whether the optimizer disregards those other expressions entirely and can use only the index on city. In the worst case it is going to do a table scan against 1 million records, which of course we want to avoid.
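The COALESCE trick under discussion is the pattern below, sketched in sqlite3 with invented names: a NULL parameter turns its predicate into a no-op (`col = COALESCE(NULL, col)` is true for every non-NULL column value), but the predicate is still present in the WHERE clause, which is exactly what raises the optimization question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT, state TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [("Adams", "Boston", "MA"), ("Cole", "Denver", "CO")])

def search(city, state):
    # Every optional filter stays in the WHERE clause; passing None
    # for a parameter makes that predicate always true.
    return conn.execute("""
        SELECT name FROM customer
         WHERE city  = COALESCE(?, city)
           AND state = COALESCE(?, state)""", (city, state)).fetchall()

print(search("Boston", None))   # [('Adams',)]
print(search(None, None))       # both rows
```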

Furthermore, I don't really like the WHERE clause in the body of the SQL statement; I find it unnecessarily complex. I'd rather calculate the max and min row numbers into @Start and @End variables and do something like:
WHERE (@lJumpToChar = 0 AND RowNum BETWEEN @Start AND @End)
   OR (@lJumpToChar = 1 AND (
       (@SortCol = 'LASTNAME' AND Lastname LIKE @AlphaChar) OR
       (@SortCol = 'ADDRESS'  AND Address  LIKE @AlphaChar) OR
       (@SortCol = 'CITY'     AND City     LIKE @AlphaChar) OR
       (@SortCol = 'STATE'    AND State    LIKE @AlphaChar) OR
       (@SortCol = 'ZIP'      AND Zip      LIKE @AlphaChar)))
The statement above also increases the possibility of optimization, though multiple ORs still seem to be a problem in most databases.

When you go back to the solution I provided, you only have to adjust it to take the table name as a parameter and you've got a solution that is reusable for any table, simply because there is no mention of any columns in the routine. So you only have to write and debug this routine once and can use it for paging *ANY* table you like, with no extra SPs for each and every table. I use this kind of templating a lot.
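The kind of templating being described can be sketched as a routine that composes the paging statement from a table name and sort key, never mentioning any columns (all names here are invented; in real code the table name must come from a trusted list, since it cannot be bound as a query parameter):

```python
import sqlite3

def paging_sql(table, order_by):
    # Compose a paging query for any table: only the table and sort key
    # are plugged in; no column list is hard-coded in the routine.
    return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT ? OFFSET ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)",
                 [("Adams",), ("Baker",), ("Cole",), ("Dunn",)])

# Page 2, two rows per page, of the customer table...
page = conn.execute(paging_sql("customer", "name"), (2, 2)).fetchall()
print(page)   # [('Cole',), ('Dunn',)]
# ...and the same routine pages any other table without modification.
```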

As for documentation: true, you shift the documentation of the provided functionality from the application to the database, but I don't really see that as an advantage. Now I've got to look in two different places.

Walter,