This is a continuation of Message #
1594531 in the Chatter/Religion section. The message used the word WHERE, and it reminded me of a SQL programming question people sometimes ask regarding WHERE.
For instance, in the MS AdventureWorks database, there is a Vendor Master table with 104 vendors, and a PurchaseOrderHeader table with POs for vendors.
Suppose we want a result set - one row for all 104 vendors, with the sum of order dollars in 2008 for each vendor. Some vendors did not have orders in 2008, but we still want the vendor name in the result set (with NULL or zero for the $ amount)
Not being aware of the issue I'm about to talk about, someone might possibly write the following query, using an OUTER JOIN between the Vendor table and the PO Header table to get all 104 vendors.
select Vend.Name, SUM(TotalDue) as VendorTotal
from Purchasing.Vendor as Vend
left outer join Purchasing.PurchaseOrderHeader AS POH
on Vend.BusinessEntityID = POH.VendorID
where OrderDate BETWEEN '1-1-2008' AND '12-31-2008'
group by Vend.Name
order by VendorTotal desc
The results only show 86 vendors (just the vendors who had orders in 2008). The numbers are correct for those 86 vendors, but we don't see the 18 vendors who didn't have orders.
But why??? We did a LEFT OUTER JOIN between the Vendor table and the PO Header table, so that should have given us all 104 vendors.
Here's the reason - the WHERE clause on the child table (POHeader) to filter the orders for 2008
is essentially negating the effect of the OUTER JOIN - giving us just the vendors who had orders in 2008.
So what can we do? Well, there are several answers. One might be to check the Order Date column being null, like so...
...where OrderDate BETWEEN '1-1-2008' AND '12-31-2008' OR OrderDate IS NULL
That will certainly work - however if there's an index on OrderDate, SQL Server won't be able to do an index seek - it will do an index scan, far less optimal.
An arguably better approach is to use some kind of subquery or common table expression to retrieve the orders for 2008 - and then do an OUTER JOIN from the Vendor Master to the results of the subquery/CTE...
;WITH POTemp as
( SELECT VendorID, SUM(TotalDue) as VendorTotal
FROM Purchasing.PurchaseOrderHeader
WHERE where OrderDate BETWEEN '1-1-2008' AND '12-31-2008'
GROUP BY VendorID)
select Vend.Name, VendorTotal
from Purchasing.Vendor as Vend
left outer join POTemp
on Vend.BusinessEntityID = POTemp.VendorID
order by VendorTotal desc
So remember the pattern - when using an OUTER JOIN between a master table and a child table, and then using a WHERE clause on the child table, you negate the effect of the OUTER JOIN ( for all intents and purposes, you turn it into an INNER JOIN)
So for all the "souls" who ever encounter this, I hope it rescues you from the fiery pits of slamming your fist on the desk, and lifts you into more heavenly pursuits (like kicking back with a good beer after the satisfaction of writing a successful query) :)
Peace....