One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Tuesday, October 4, 2011

T-SQL Tuesday #23 - Respect your JOIN's


It's that time again....         



Time for the SQL Server blogosphere's...recurring, revolving blog party - T-SQL Tuesday!  This month is hosted by Stuart Ainsworth (B/T) and his chosen topic is JOIN's.

When I think about JOIN's the first thing that comes to mind is always our internal classic example of how our vendor's T-SQL all-to-often arrives:

SELECT * from * JOIN * WHERE * = *

(At least it sure seems that way...)

Of course the statement isn't anything approaching valid code but it demonstrates the underlying point - without appropriate field lists, table lists, and filters (WHERE/ON clauses) your database can not be held responsible for what it returns to you...

What you ASK the database for is what you will get - even if it *isn't* what you really WANT.

For example:

SELECT Sales.CustID, Customers.CustomerName, SUM(Sales.Amount) as Total_Sales
FROM Customers
JOIN Sales
ON Customers.CustomerNumber = Sales.CustID
WHERE Sales.SalesDate BETWEEN '01/01/2011' and '01/02/2011'
GROUP BY Sales.CustID, Customers.CustomerName

Seems innocuous enough right?

What if CustomerNumber in the Customers table is a natural key (001-34F) and the CustID column in Sales is an identity integer?  Even if T-SQL is insidious enough to do the implicit conversion for you (which depending on the version it may or may not do), you aren't comparing apples to apples and if you do get any rows back they will almost certainly be faulty.  For recent versions of SQL you will usually get an error like:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the nvarchar value '001-34F' to data type int.

Then there's the more extreme and dangerous example:

SELECT Sales.CustID, Customers.CustomerName, SUM(Sales.Amount) as Total_Sales
FROM Customers
JOIN Sales
ON Customers.CustID = Sales.SalesID
WHERE Sales.SalesDate BETWEEN '01/01/2011' and '01/02/2011'
GROUP BY Sales.CustID, Customers.CustomerName

Now what do you get?

CustID                     CustomerName            Total_Sales
1                               MoreStuff                     10.66
1                               StuffCo                         23.90
2                               StuffIt                           23.00

Cut to the CIO reviewing his morning reports: "Sales are a little down yesterday...hmm, I wonder why MoreStuff and StuffCo have the same Customer ID...wait a minute..."

Just because a query returns results does not mean they are valid, and JOIN's (and their minions WHERE and ON) have more potential to screw up your results than anything else if you don't treat them with respect and know your data - always use the appropriate JOIN (inner/left/right/etc.) for the data you are looking for and always make sure to filter with the appropriate join criteria - compare apples to apples.

I know this really seems like T-SQL 101 stuff, but it never ceases to amaze me how many stored procedures/views/SSRS reports I open up and find the underlying problem to be related to a bad join or faulty join criteria - so apparently it is worth mentioning!  {-:




No comments:

Post a Comment