One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Thursday, March 9, 2017

Embrace The Missing Index DMVs - But Proceed with Caution!

One of the performance tools I use all of the time is the set of Missing Index DMVs: 
·         sys.dm_db_missing_index_details – Detailed specifics on the missing indexes, including column lists 
·         sys.dm_db_missing_index_groups – relates individual missing indexes to index groups
·         sys.dm_db_missing_index_group_stats – information on potential cost and benefit of the missing indexes
·         sys.dm_db_missing_index_columns – (not regularly used but included for completeness) – included information on individual columns in the missing indexes – this information is readily retrieved from sys.dm_db_missing_index_details as groups of columns 
As I always tell you, the easiest way to start is to borrow from someone else.  The most commonly used query is from Glenn Berry’s (blog/@GlennAlanBerry) Diagnostic Information Queries.  As of the February 2017 release it is Query #31:

-- Missing Indexes for all databases by Index Advantage  (Query 31) (Missing Indexes All Databases)

SELECT CONVERT(decimal(18,2), user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)) AS index_advantage,
migs.last_user_seek, mid.statement AS Database.Schema.Table,
mid.equality_columns, mid.inequality_columns, mid.included_columns,
migs.unique_compiles, migs.user_seeks, migs.avg_total_user_cost, migs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS migs WITH (NOLOCK)
INNER JOIN sys.dm_db_missing_index_groups AS mig WITH (NOLOCK)
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)
ON mig.index_handle = mid.index_handle
ORDER BY index_advantage DESC OPTION (RECOMPILE);
------
-- Getting missing index information for all of the databases on the instance is very useful
-- Look at last user seek time, number of user seeks to help determine source and importance
-- Also look at avg_user_impact and avg_total_user_cost to help determine importance
-- SQL Server is overly eager to add included columns, so beware
-- Do not just blindly add indexes that show up from this query!!!

The results look like this:

Index
advantage
last_user
seek
Database.Schema.
Table
Equality
columns
Inequality
columns
Included
columns
Unique
compiles
User
seeks
avg_total
user_cost
avg_user
impact
219108325.6
02/16/2017 22:28:13
database1.schema2.table1
employee_id
NULL
id, as_of_effective_date
5
36411704
6.07278699
99.09
9319171.13
02/17/2017 10:09:54
database4.dbo.table3
cmpcode, year_max, period_max
NULL
rundatetime
581
3482
2679.33185
99.89
7881068.93
02/17/2017 10:05:19
database2.dbo.table99
code
grpcode
cmpcode
341
3451
2285.305576
99.93
7037526.5
02/17/2017 10:09:39
database4.dbo.table3
cmpcode, usercode
NULL
rundatetime
453
2588
2720.924091
99.94
5861313.35
02/17/2017 10:05:19
database2.dbo.table99
NULL
grpcode
cmpcode, code
341
3451
2285.305576
74.32
3440880.84
02/17/2017 10:09:39
database4.dbo.table3
cmpcode, usercode
rundatetime
NULL
227
1294
2661.233188
99.92
3362884.24
02/17/2017 10:05:43
database2.dbo.table12
elmlevel, deldate
NULL
cmpcode, code, name, sname
278
629
5353.357209
99.87
1646616.36
02/17/2017 10:10:39
database7.schema33.table2
NULL
doc_status
cmpcode, doccode, docnum
724
234099
10.3851265
67.73
1592595.34
02/17/2017 10:09:51
database2.dbo.table99
code
NULL
cmpcode, name, sname
140
310
5137.918128
99.99
877818.52
02/16/2017 15:57:09
database2.dbo.table99
elmlevel, grpcode
NULL
cmpcode, code
184
378
3185.120325
72.91

What this tells me is that the potentially (**potentially**) most useful index is on schema2.table1 in database1 on the employee_id column, with the id and as_of_effective_date columns along for the ride as INCLUDEs.  Since the MSSQLServer service was last restarted, the index would have been compiled 5 times (low cost) and would have been used a whopping 36 million times (huge benefit)!

…but wait!

At this point we all need to pause and consider the collective wisdom...

https://cdn.meme.am/instances/44649369.jpg
You can see that Glenn warns us in the last comment of his script “Do not just blindly add indexes that show up from this query!!!”

One of the most common reasons people quote for this is the impact such an index can have.  It is possible that adding an index can cause other queries to create/choose a query plan that is less favorable than its current plan because of the index of the new index.  Maybe Query1 was using Plan1 and running smoothly but now that there is a new index it may start using Plan2 which take milliseconds longer but is at a lower “cost.”  (Yes, milliseconds definitely matter!)

This is very uncommon but it can happen.  Always test missing indexes in a DEV/TEST environment before you roll them out into production!

Because you all have DEV/TEST environments for every single PROD environment that matches the hardware/software specs of PROD, right?

Right?  

RIGHT?
http://24.media.tumblr.com/tumblr_m8lvn0pSSH1qbsjydo1_500.jpg
Well….test if you can – I would never recommend “test in PROD” from an academic sense, but we all know in the real world we often have no choice – which is just another reason to be even more cautious of blindly adding new indexes – “missing” or otherwise.

--

One of the top reasons I say to be cautious of the Missing Index DMVs that I want to discuss has to do with duplicative suggestions.

Let’s look at the results from above again:

Index
advantage
last_user
seek
Database.Schema.
Table
Equality
columns
Inequality
columns
Included
columns
Unique
compiles
User
seeks
avg_total
user_cost
avg_user
impact
219108325.6
02/16/2017 22:28:13
database1.schema2.table1
employee_id
NULL
id, as_of_effective_date
5
36411704
6.07278699
99.09
9319171.13
02/17/2017 10:09:54
database4.dbo.table3
cmpcode, year_max, period_max
NULL
rundatetime
581
3482
2679.33185
99.89
7881068.93
02/17/2017 10:05:19
database2.dbo.table99
code
grpcode
cmpcode
341
3451
2285.305576
99.93
7037526.5
02/17/2017 10:09:39
database4.dbo.table3
cmpcode, usercode
NULL
rundatetime
453
2588
2720.924091
99.94
5861313.35
02/17/2017 10:05:19
database2.dbo.table99
NULL
grpcode
cmpcode, code
341
3451
2285.305576
74.32
3440880.84
02/17/2017 10:09:39
database4.dbo.table3
cmpcode, usercode
rundatetime
NULL
227
1294
2661.233188
99.92
3362884.24
02/17/2017 10:05:43
database2.dbo.table12
elmlevel, deldate
NULL
cmpcode, code, name, sname
278
629
5353.357209
99.87
1646616.36
02/17/2017 10:10:39
database7.schema33.table2
NULL
doc_status
cmpcode, doccode, docnum
724
234099
10.3851265
67.73
1592595.34
02/17/2017 10:09:51
database2.dbo.table99
code
NULL
cmpcode, name, sname
140
310
5137.918128
99.99
877818.52
02/16/2017 15:57:09
database2.dbo.table99
elmlevel, grpcode
NULL
cmpcode, code
184
378
3185.120325
72.91

Highlighted rows 4 and 6 are an example of what I call duplicative recommendations.  The CREATE INDEX statement for the two recommendations (generated using Bart Duncan’s Missing Index script) shows this even more clearly:

CREATE INDEX missing_index_2044_2043_table3 ON database4.dbo.table3 (cmpcode, usercode) INCLUDE (rundatetime)

CREATE INDEX missing_index_2046_2045_table3 ON database4.dbo.table3 (cmpcode, usercode,rundatetime)

The first index is only two columns with an INCLUDE of a third column, while the second index is only all three columns.  The second index will not only satisfy any situations needing that index, but will also satisfy any situations needing the first index.

Note that the Index Advantage (weighted average of cost and benefit) of the second index, the index we really want, is only half that of the first index.  When I report recommendations like this to the client I edit the output to match the highest Index Advantage of the duplicative indexes with the most correct recommendation – in this case I would use the second index definition (the index on all three columns with no INCLUDE) with the first Index Advantage (7037526.5).

--

Another situation similar to that of the duplicative recommendation is that of the “left-hand-equivalent” recommendation.  Consider the two highlighted rows here:

Index
advantage
last_user
seek
Database.Schema.
Table
Equality
columns
Inequality
columns
Included
columns
Unique
compiles
User
seeks
avg_total
user_cost
avg_user
impact
219108325.6
02/16/2017 22:28:13
database1.schema2.table1
employee_id
NULL
id, as_of_effective_date
5
36411704
6.07278699
99.09
9319171.13
02/17/2017 10:09:54
database4.dbo.table3
cmpcode, year_max, period_max
NULL
rundatetime
581
3482
2679.33185
99.89
7881068.93
02/17/2017 10:05:19
database2.dbo.table99
code
grpcode
cmpcode
341
3451
2285.305576
99.93
7037526.5
02/17/2017 10:09:39
database4.dbo.table3
cmpcode, usercode
NULL
rundatetime
453
2588
2720.924091
99.94
5861313.35
02/17/2017 10:05:19
database2.dbo.table99
NULL
grpcode
cmpcode, code
341
3451
2285.305576
74.32
3440880.84
02/17/2017 10:09:39
database4.dbo.table3
cmpcode, usercode
rundatetime
NULL
227
1294
2661.233188
99.92
3362884.24
02/17/2017 10:05:43
database2.dbo.table12
elmlevel, deldate
NULL
cmpcode, code, name, sname
278
629
5353.357209
99.87
1646616.36
02/17/2017 10:10:39
database7.schema33.table2
NULL
doc_status
cmpcode, doccode, docnum
724
234099
10.3851265
67.73
1592595.34
02/17/2017 10:09:51
database2.dbo.table99
code
NULL
cmpcode, name, sname
140
310
5137.918128
99.99
877818.52
02/16/2017 15:57:09
database2.dbo.table99
elmlevel, grpcode
NULL
cmpcode, code
184
378
3185.120325
72.91

As above, here are the scripted CREATE INDEX statements for those two rows:

CREATE INDEX missing_index_35_34_table99 ON database2.dbo.table99 (code,grpcode) INCLUDE (cmpcode)

CREATE INDEX missing_index_250_249_table99 ON database2.dbo.table99 (code) INCLUDE (cmpcode, name, sname)

These two indexes are not as obviously related but they are.
http://i.imgur.com/iQYuWno.jpg
They are not only on different fields, but also have different INCLUDE columns.  If you look closely though, the actual index columns are what I call “left-hand equivalent” – they both start with code and then the first index adds grpcode, so an index on code, grpcode would cover both situations for the searchable index fields.

The second piece that would truly make an index cover both situations is for it to include the sum of the INCLUDE’d columns – hence:

CREATE INDEX missing_index_250_249_table99 ON database2.dbo.table99 (code,grpcode) INCLUDE (cmpcode, name, sname)

This index on two columns with three included columns covers both situations – instead of choosing one index over the other we need to do a little work and combine them, but the effect can be very beneficial, and once you understand how it works it doesn’t take that much time.

--

Here is another (completely contrived) situation:

CREATE INDEX missing_index_44_45_table23 ON database3.dbo.table23 (name,address1) INCLUDE (address2)

CREATE INDEX missing_index_32_33_table23 ON database3.dbo.table23 (name) INCLUDE (address1, address2, state)

CREATE INDEX missing_index_55_56_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1)

CREATE INDEX missing_index_48_49_table23 ON database3.dbo.table23 (name, address1,city)

So we need to start at the beginning – are the indexes all on the same database and table?  Check!  (You may chuckle but especially when looking across an instance you may find you have very similar looking databases/tables!)

Next, let’s look at left-hand equivalence.  All four indexes start with name field – so far so good.  Index_32_33 ends there, so it is a likely candidate to be consolidated with something else.

This is where it gets a little trickier – both index_44_45 and index_48_49 have address1 as their second column, which means they could be duplicative and could also be related (left-hand-equivalent) to index_32_33 upon further investigation.

Index_55_56 however does not continue with address1 – instead it has city in its second position.  This means index_55_56 is *not* duplicative of index_44_45 or index_48_49 although it can still be related to the narrowest index, index _32_33.

This demonstrates again how important the order in the index is – indexes are searched from left-to-right, so "name, city" <> "name, address".

Consider index_55_56 and index_48_49 – even though index_48_49 *does* have the city column in its index list, it is not in the same-left-to-right order (with address1 in the way) so it isn’t left-hand-equivalent and therefore not combinable.

This leaves us with two options, either of which can be optimal: 
Combine index_32_33, index_44_45, and index_48_49, and just create index_55_56 as is: 
CREATE INDEX missing_index_98_99_table23 ON database3.dbo.table23 (name,address1,city) INCLUDE (address2,state) 
CREATE INDEX missing_index_55_56_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1)
 Combine index_44_45 with index_48_49 (both containing name,address1) and index_32_33 and index_55_56: 
CREATE INDEX missing_index_77_78_table23 ON database3.dbo.table23 (name,address1,city) INCLUDE (address2) 
CREATE INDEX missing_index_88_89_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1, address2, state) 
As stated above either of these options work – they both cover all four situations.  One thing to consider is the size of the fields contained in the indexes – in option 1 we are storing eight fields (name twice, address1 twice, city twice, address2 once, and state once) whereas in option 2 we are storing *nine* fields as we have address2 in the INCLUDE of both indexes.  This may make Option 1 at least slightly “better” although depending on the datatype of address2 and the number of rows in table23, that advantage may be negligible.

--

Missing indexes are an oft-avoided subject but they really can make a difference to performance, and the algorithms inside SQL Server to help determine and weight the recommendations has become much better with each version of SQL Server.  One thing these improved algorithms still don’t watch for are the duplicative/related situations we have discussed here, so you still need to watch for them yourselves.

http://www.cindyvallar.com/crowsnest.jpg

Again, do *not* just blindly create new indexes – consider, test if possible, and weigh the advantages against the possible disadvantages such as the amount of space the index will consume.

--

Hope this helps!