One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Thursday, November 29, 2018

Things I Learned at Summit v20 - Trace Flag 4199

Part of a series of posts on cool stuff I learned at PASS Summit v20 (2018) - in this first post we'll look at a trace flag I had never even heard of, let alone deployed - 4199.

https://i.imgflip.com/15frvt.jpg
What is a Trace Flag?  According to Microsoft:
Trace flags are used to temporarily set specific server characteristics or to switch off a particular behavior. 
At the most basic, you can set or unset a trace flag in order to modify how SQL Server works.  This can be done globally (instance-wide) or for a particular session.

There are a LOT of trace flags, and several are widely used, such as 1204 and 1222 (Deadlock Info) and 1118 (Disable Single Page Allocations), but I haven't heard a Microsoft employee tell me I should "almost always" turn any trace flag on - until now.

--

On Tuesday of Summit week, I was in the pre-conference session put on by the Microsoft MSSQL Tiger Team - the Senior Product Managers of SQL Server Engineering (blog/@mssqltiger).

https://fico.i.lithium.com/t5/image/serverpage/image-id/31298i0D30CC780E2784AD?v=1.0
The session was titled "Modernize Your SQL Server with Bob Ward, the Tiger Team, and CSS Escalation Engineers" and it...was...awesome!

One of the presenters was Pedro Lopes (blog/@SQLPedro), a Senior PM for the Relational Engine.  In his part of the day he talked about several features of the engine and the optimizer, but the "What...what did he say?" moment for me was when he talked about trace flag 4199...and how we should have it turned on pretty much everywhere.

Wait...what?

https://vignette.wikia.nocookie.net/halofanon/images/d/de/BEAN-say-what.jpg/revision/latest?cb=20130302200428

Trace Flag 4199 enables query optimizer fixes that are included in CUs (and the CU breakpoint releases formerly known as "service packs") after RTM.

What does this mean?  If you aren't running with TF 4199, then your Query Optimizer (QO) is running the same as RTM, regardless of what CU level you have applied to your engine.

For example, if you are running SQL Server 2012 SP4 (11.0.7001.0) but you aren't running with TF 4199, then your QO is running the same as someone whose instance is running 2012 RTM (11.0.2100.60).  There are plenty of other things changed by applying the CU, but Query Optimizer function isn't one of them...without 4199.

That melted my mind a little, and I could tell the same was true for the attendees sitting near me - we can religiously apply CUs and fixes but unless we're running this TF (which none of us had ever even *heard* of) then you aren't fixing (improving) your Query Optimizer.

https://i.chzbgr.com/full/3519512832/h0B54C8A3/

Query Optimizer fixes are shipped as OFF by default in order to maintain compatibility and performance when CUs are deployed (so there aren't plan regressions).  Originally there was a new trace flag for each fix, so to turn on multiple fixes you would have to enable multiple trace flags.

As described at https://support.microsoft.com/en-us/help/974006/sql-server-query-optimizer-hotfix-trace-flag-4199-servicing-model, starting way back in SQL2005-SP3-CU6, SQL2008-SP1-CU7, and SQL2008R2 RTM, the ability to enable query optimizer fixes was rolled into one switch, TF 4199. 

While reading the article I tripped over one sentence:
Because trace flag 4199 is recommended only for customers who are seeing specific performance issues, customers are advised to remove trace flag 4199 after they migrate their databases to the latest compatibility level because trace flag 4199 will be reused for future fixes that may not apply to your application and could cause unexpected plan performance changes on a production system
This was interesting to see, because it ran contrary to the advice from this session and several other places I found references to enabling 4199, but this sentence reads like standard lawyerly boilerplate which is enough to make me treat it with healthy distrust.

(For this blog post I took a step I never have for a blog post before - I reached out to Pedro directly to verify my interpretation of his message to make sure I wasn't misrepresenting his content.  He graciously responded with a few comments, including the fact that he is trying to get the above highlighted sentence updated to reflect changes in how TF 4199 works under SQL 2016+ (see below)).

--

At the pre-conference session, Pedro's comments were basically that while there was a time when we all "waited for SP1" to let everyone else test, that isn't how the modern SQL Server development model works.  With the extensive testing that occurs these days as well as the "Cloud First" model that has everything live in production in Azure long before it gets to the on-premises product, there simply isn't that much risk to deploying RTM software or to deploying CUs as they come out.

IMPORTANT NOTE - There is still always (*always*) due diligence - patch DEV before PROD (we all do *that* at least, right?) - but by the time we get a new CU for an on-prem SQL Server it has already been used by thousands (hundreds of thousands) of users in Microsoft testing and on Azure.

In effect, we are already waiting for other people to test it, even without waiting at all.

https://ih0.redbubble.net/image.400523286.2984/flat,550x550,075,f.u3.jpg

While researching this topic I found another well-put opinion in one of the forum answers here by SQL MVP Gianluca Sartori (blog/@spaghettidba):

--
Personally, whenever I build a new server for a new project I always enable TF4199 globally. The same applies when I upgrade existing instances to newer versions.
The TF enables new fixes that would affect the behaviour of the application, but for new projects the risk of regression is not an issue. For instances upgraded from previous versions, the differences between old and new version are a concern on their own and having to deal with plan regression is expected anyway, so I prefer fighting with it with TF4199 enabled. <emphasis mine>
AS far as existing databases is concerned, there is only one way to know: test it. You can capture a workload on the existing setup and replay it after enabling the flag. RML Utilities can help you automate the process, as described in this answer.
Obviously, the flag affects the whole instance, so you'll have to test all the databases sitting there.
--

Another piece to the TF 4199 story is how it relates to database compatibility level.  Prior to SQL 2016, 4199 was an all or nothing - if you were running SQL 2014 SP1 with 4199 enabled, you got the Query Optimizer fixes included in 2014 SP1, regardless of the DB Compat level.

Starting in SQL 2016, DB Compat Level is relevant with 4199 enabled:

https://sqlbits.com/Downloads/550/SQLBits%20-%20Enhancements%20that%20will%20make%20your%20SQL%20database%20engine%20Part%202.pdf

On a SQL Server 2016 server, running in 130 compatibility automatically enables fixes from previous compatibility levels, even without running 4199; the purpose of 4199 in SQL 2016 is to enables fixes *after 2016 RTM* for databases in compatibility 130 - fixes before 2016 RTM are already enabled in compatibility 130.

This model is continued in 2017+ as described here.

This is consistent with another piece of info I learned at Summit v20 from the Tiger Team - start certifying for Compatibility Level, not SQL Server version.  When developing and testing, verify your code for DBCompat 130, *not* SQL Server 2016. (I may write more about that later.)

--

One other item Pedro highlighted in his email response to me was that there are multiple trace flags that Microsoft recommends for various systems, especially high-traffic/high-performance systems.  These still aren't "turn on everywhere" but are frequently useful.  

They are laid out in a pair of MS support articles:
Recommended updates and configuration options for SQL Server 2012 and SQL Server 2014 with high-performance workloads
Recommended updates and configuration options for SQL Server 2017 and 2016 with high-performance workloads
Many of these items and other related high-performance configurations are detailed at:
http://aka.ms/sqlperfcenter
Thanks Pedro for your response - and thanks Tiger Team for your amazing pre-con!

https://sayingimages.com/wp-content/uploads/dont-try-to-deny-it-awesome-meme.jpg

--

Hope this helps!