This month T-SQL Tuesday is hosted by Mickey Steuwe (blog/@SQLMickey) and her topic of choice is “Data Modeling Gone Wrong.”
(If you don’t know about T-SQL Tuesday check out the information here – each month there is a new topic and it is a great excuse to write each month (and to start writing!) because someone offers a topic, so you already have taken the first step!).
My first memory of a data modeling problem comes to data types and the process of choosing appropriate data types - not because of storage concerns (although there can be problems) but because of actual performance risks.
When the data type of a field in a table doesn’t precisely match the datatype of your query (or parameter of your stored procedure) a conversion occurs. Conversions can be implicit (automatic) or explicit (requiring an inline CONVERT or CAST statement in the query).
Here is a chart from the Technet article on Data Type Conversions:
As you can see, a conversion from a CHAR to an NVARCHAR is “implicit” – so no big deal, right?
When I started as a DBA, we designed a system for payment processing for a university debit card system (an internal system, not a VISA/MC branded card at that time). Most of us that were involved were relatively new to the design process, so we decided to use NVARCHAR for all of our character fields. We decided it would be useful in a couple of ways:
- It was a standard – everything was the same (my 16-year-experienced self now knows that isn’t a great definition of “standard”)
- It allowed us to handle any eventual contingency – maybe we would need Unicode data, right?
At the beginning there wasn’t any visible impact – we only had a few thousand accounts and didn’t notice any issues.
Then we got to Fall Rush…
If you are familiar with American Universities, you know that campus bookstores make 80%-90% of their profit during the first week of each semester and often the week before. (This is probably true at many international universities as well.) One of the primary uses of our internal card was textbook purchasing, so we were slammed during this time period, both in terms of overall quantity of transactions and rapidity of transactions during the business day.
When we hit this period, we saw performance dip. At first we assumed it was just load (remember this was SQL Server 7.0 and then 2000) but we quickly realized that there was slag in our queries that wasn’t needed – we saw this in our query plan:
Image from https://pawankkmr.files.wordpress.com/2015/07/pawan-khowal-concatenation-operator-graphical-execution-plan.jpg
Why did we have a SCAN instead of a SEEK?
In a modern query plan this is visible in the predicate as a CONVERT_IMPLICIT:
Even in our inexperienced state we knew one of those key rules of query optimizations:
SCAN BAD! (In your best Frankenstein voice)
There are two ways to deal with an issue like this – you can fix the design of the table, or you can modify your code to perform an explicit CAST or CONVERT of the statement. The best fix (“best”) is to fix the design of the table – Aaron Bertrand (blog/@AaronBertrand) has a good list of suggestions here for fields that can often be set to a smaller data type.
If you can’t change the table design (as you very often can’t) the other option is to modify the code to prevent an implicit conversion. The first method is like this:
DECLARE @CardID as NVARCHAR(10)
SET @CardID = ‘123ABC456’
SELECT CardID, LastName, FirstName, Balance
WHERE CardID = @CardID
In this case the CardID field in the Credit table is of the NVARCHAR data type (as in my debit card scenario above). To get rid of the implicit conversion I am using a variable that is NVARCHAR to “pre-perform” the conversion so that by the time the query runs it is comparing apples to apples (or in this case NVARCHAR to NVARCHAR).
Another way to accomplish this is by using a CAST/CONVERT, like this:
SELECT CardID, LastName, FirstName, Balance
WHERE CardID = CAST(‘value’ as NVARCHAR)
This seems like a minor detail but it can be a very important one, especially in systems with lots of rows and/or high transaction rates.
Your homework - there is a great test harness for this problem created by Jonathan Kehayias (blog/@sqlpoolboy) in a blog post at SQLPerformance.com including charts that quantify the performance hit and CPU costs. Check it out!
I think it would be easier to make the comparison a Unicode comparison by prefixing the string with an N, e. g., WHERE CardID = N'123ABC456'. Of course every nvarchar string comparison would need to have this prefixing to avoid implicit conversion. Maybe it is something to reserve for where it will actually be needed. Thanks for the post. I have a coworker that told me he defines everything nvarchar, just to be prepared. I think this will help change his mind.ReplyDelete
I did a test on my own SQL Server 2012 system where the PK is 3 columns and I have almost 300K rows. I defined the key columns as nvarchar and then did retrievals using comparisons with and without implicit conversions on combinations of columns. In all cases an index seek was done, It was just a matter of whether it was doing the seek on the implicitly converted value or the unconverted values when they were flagged as Unicode. I expect there is something else going on, maybe with FK constraints. I know the problem of trying to simplify a complex system for publication.ReplyDelete