Well, folks, this will be my last post on the UT. Yes, I'm "retiring for good" from the UT. If you ever want to get hold of me, my email address is kgoff@kevinsgoff.net
I'm scheduled to speak at the Philly VFP User Group in May - topic is T-SQL for Application Developers. Feel free to browse my site (
www.KevinSGoff.net) for any new content, webcast links, etc.
Final topic....as this came up as a question.....scalar UDFs vs Table Valued UDFs in SQL Server. As a general statement, single line TVFs will usually perform better than scalar functions. Here's a basic example....
In AdventureWorks, there's an employee pay history table, that shows pay rates and effective dates (for when the payment went into effect)
Suppose I want to know, at any one time, what a person's pay rate was on a specific day.
I might have a function that does this, using the subquery pattern to grab the most recent pay rate row for the employee, "on or before" the as of date.
create function dbo.UDFScalar_GetRateForAsOfDate
( @AsOfDate Date, @EmployeeID int)
returns Money
as
begin
declare @Rate Money
set @Rate = (select Rate from HumanResources.EmployeePayHistory
where BusinessEntityID = @EmployeeID
and RateChangeDate = (select max(RatechangeDate) from HumanResources.EmployeePayHistory
where BusinessEntityID = @EmployeeID
and RatechangeDate <= @AsOfDate)
)
return @Rate
end
go
And then I'll consume it in this query, to grab it for everyone for the as of date of 1-1-2005
select BusinessEntityID, LoginID, JobTitle,
dbo.UDFScalar_GetRateForAsOfDate('1-1-2005', BusinessEntityID) as RateForDate from HumanResources.Employee
where dbo.UDFScalar_GetRateForAsOfDate('1-1-2005', BusinessEntityID) is not null
OK, that works. And it runs in less than a second...but we're not talking about much data. And there is a CPU cost in dealing with the scalar function.
As it turns out, there's a technique that in many cases (maybe not all, but many) will run a bit faster.....and gives us the flexibility of returning multiple columns.
We can write a table function instead, to bring back multiple columns (not just the rate for the effective date, but the actual most recent rate change date on or before the effective date). Now, SQL Server doesn't have a "row-based" function that returns a single row, but we can bring back the equivalent of a table variable.
create function dbo.UDFTableValued_GetRateForAsOfDate
( @AsOfDate Date, @EmployeeID int)
returns Table
as
return
select Rate, RateChangeDate from HumanResources.EmployeePayHistory
where BusinessEntityID = @EmployeeID
and RateChangeDate = (select max(RatechangeDate) from HumanResources.EmployeePayHistory
where BusinessEntityID = @EmployeeID
and RatechangeDate <= @AsOfDate)
end
go
Now, we consume a table valued function from the outside a bit differently. We can't join the Employee master to it....but we can use a CROSS APPLY.
select BusinessEntityID, LoginID, JobTitle, TVF.Rate, TVF.RatechangeDate from HumanResources.Employee
CROSS APPLY dbo.UDFTableValued_GetRateForAsOfDate ('1-1-2005', BusinessEntityID) TVF
We don't need to filter out people with NULL rates, as the CROSS APPLY will eliminate them. (If we truly wanted to see all employees, even those without a rate for the date, we'd use OUTER APPLY instead)
This second approach, with the table function, runs in about 82 milliseconds and has no CPU cost. The first approach, using the scalar function, runs in about 109 ms and has a small CPU cost. So the table function approach is a bit faster, and you get the benefit of being able to bring back more than one column.
In more advanced "real life" applications, the differences in performance can be astounding. I've seen scalar functions that take minutes to run (when consumed over a large # of rows) whereas table functions take far less. As always, mileage can vary, so it's good to test out both to see.
I'm sure others can/will make suggestions on this example, it's simply offered as a starter example. Some people might be able to refactor the subquery and try out a SELECT TOP 1...ORDER BY instead of MAX. Sure, that's great, give it a try.
But generally, in SQL Server, be careful about using scalar UDFs over a large # of rows.
Adios...