(If you don’t know about T-SQL Tuesday, check out the
information here
– each month there is a new topic, and it is a great excuse to write regularly (or to start writing!): someone else has already offered the topic, so the first step has been taken for you.)
My first memory of a data modeling problem comes down to data
types and the process of choosing appropriate ones - not because of storage concerns (although those can be a problem) but because of real performance risks.
When the data type of a field in a table doesn’t precisely
match the data type used in your query (or the parameter of your stored procedure), a
conversion occurs. Conversions can be
implicit (automatic) or explicit (requiring an inline CONVERT or CAST
in the query).
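As a rough illustration (the table, column, and values here are made up, not from the original system), the first query below leaves SQL Server to convert one side of the comparison implicitly, while the second spells the conversion out:

-- Hypothetical example: CardCode is stored as NVARCHAR(10).
-- The plain 'ABC123' literal is a (VAR)CHAR value, so SQL Server must convert
-- one side of the comparison implicitly before it can evaluate the predicate.
SELECT CardCode, Balance
FROM dbo.Accounts
WHERE CardCode = 'ABC123';      -- implicit conversion happens here

-- The same predicate with the conversion written out explicitly:
SELECT CardCode, Balance
FROM dbo.Accounts
WHERE CardCode = CAST('ABC123' AS NVARCHAR(10));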
As you can see, a conversion from a CHAR to an NVARCHAR is “implicit”
– so no big deal, right?
WRONG!
When I started as a DBA, we designed a system for payment
processing for a university debit card system (an internal system, not a
VISA/MC branded card at that time). Most
of us who were involved were relatively new to the design process, so we
decided to use NVARCHAR for all of our character fields. We decided it would be useful in a couple of
ways:
- It was a standard – everything was the same (my 16-year-experienced self now knows that isn’t a great definition of “standard”)
- It allowed us to handle any eventual contingency – maybe we would need Unicode data, right?
At the beginning there wasn’t any visible impact – we only had
a few thousand accounts and didn’t notice any issues.
Then we got to Fall Rush…
If you are familiar with American Universities, you know that
campus bookstores make 80%-90% of their profit during the first week of each
semester and often the week before. (This
is probably true at many international universities as well.) One of the
primary uses of our internal card was textbook purchasing, so we were slammed
during this time period, both in terms of overall quantity of transactions and
rapidity of transactions during the business day.
When we hit this period, we saw performance dip. At first we
assumed it was just load (remember, this was SQL Server 7.0 and then 2000), but
we quickly realized our queries were carrying extra work they didn’t need – we could see
it in the query plan:
Why did we have a SCAN instead of a SEEK?
In a modern query plan this is visible in the predicate as a
CONVERT_IMPLICIT:
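If you want to hunt for these after the fact, one rough approach (a sketch only – it assumes you have VIEW SERVER STATE permission and that the offending plans are still in cache) is to search cached plans for the CONVERT_IMPLICIT marker:

-- Sketch: find cached plans whose XML mentions CONVERT_IMPLICIT.
-- This is a blunt string search, so expect some noise, but it is a quick way
-- to surface candidates worth opening in the graphical plan viewer.
SELECT TOP (20)
       st.text            AS query_text,
       qs.execution_count,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%CONVERT_IMPLICIT%'
ORDER BY qs.total_worker_time DESC;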
Even in our inexperienced state we knew one of the key
rules of query optimization: for a selective predicate like this one, you want a seek, not a scan.
There are two ways to deal with an issue like this – you can
fix the design of the table, or you can modify your code to perform an explicit CAST or
CONVERT in the statement. The best fix (with “best” in quotes on purpose)
is to fix the design of the table – Aaron Bertrand (blog/@AaronBertrand) has a good list of
suggestions
here for fields that can often be set to a smaller data type.
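As a sketch (the table and column come from my story above, and I’m assuming the column never actually needs Unicode; any index, constraint, or statistics on the column would have to be dropped and recreated around the change), the design fix might look like this:

-- Assumption: CardID never needs Unicode and was declared NOT NULL,
-- so it can be narrowed from NVARCHAR(10) to VARCHAR(10).
ALTER TABLE dbo.Credit
    ALTER COLUMN CardID VARCHAR(10) NOT NULL;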
If you can’t change the table design (and very often you can’t),
the other option is to modify the code to prevent the implicit conversion. The first method looks like this:
DECLARE @CardID AS NVARCHAR(10);
SET @CardID = N'123ABC456';   -- the N prefix makes the literal NVARCHAR from the start

SELECT CardID, LastName, FirstName, Balance
FROM Credit
WHERE CardID = @CardID;
In this case the CardID field in the Credit table is of the
NVARCHAR data type (as in my debit card scenario above). To get rid of the implicit conversion I am
using a variable that is NVARCHAR to “pre-perform” the conversion so that by
the time the query runs it is comparing apples to apples (or in this case
NVARCHAR to NVARCHAR).
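The same idea applies to stored procedure parameters (mentioned earlier): declare the parameter with the same type as the column and callers can’t reintroduce the conversion. A minimal sketch, with the procedure name made up for illustration:

-- Sketch: dbo.GetCreditBalance is a hypothetical name; the point is that
-- @CardID is declared NVARCHAR(10) to match Credit.CardID exactly.
CREATE PROCEDURE dbo.GetCreditBalance
    @CardID NVARCHAR(10)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT CardID, LastName, FirstName, Balance
    FROM Credit
    WHERE CardID = @CardID;
END;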
Another way to accomplish this is by using a CAST/CONVERT,
like this:
SELECT CardID, LastName, FirstName, Balance
FROM Credit
WHERE CardID = CAST('value' AS NVARCHAR(10));   -- always specify the length; a bare NVARCHAR in CAST defaults to 30
--
This seems like a minor detail but it can be a very
important one, especially in systems with lots of rows and/or high transaction
rates.
--