Nebraska SQL from @DBA_ANDY: Disabling Non-Clustered Indexes For Fun and Profit

One of the frequently quoted ETL best practices is to disable your non-clustered indexes during a data load. Basically the conventional wisdom is that you disable the non-clustered indexes, do your load, and then enable the non-clustered indexes, which functionally rebuilds them with the newly loaded data (a step you should be performing after a load of any size even if you *don't* disable your indexes - rebuild those newly fragmented indexes!)

So why doesn't anybody seem to do this?

https://memegenerator.net/img/instances/53039992/ponder-we-must.jpg

IMPORTANT NOTE <cue scary music> - do NOT disable the clustered index when doing data loads or at any other time you want the table to be accessible. Disabling the clustered index makes the object (table or view) inaccessible.

If you disable the clustered index you are functionally disabling the object, and any attempts to access that object will result in an Error 8655:

Msg 8655, Level 16, State 1, Line 23
The query processor is unable to produce a plan because the index ‘PK_Person_BusinessEntityID’ on table or view ‘Person’ is disabled.

https://memegenerator.net/img/instances/62333367/no-please-dont-do-that.jpg

I recently had a situation with a client where they have a load process that altogether runs over 24 hours - not one day-long query but rather a string of processes that usually takes 24+ hours to complete. The situation has worsened to the point where other things are impacted because the process is taking so long, and they asked me to check it out.

I started with all of the basics, and right away Page Life Expectancy during the relevant period jumped out:

The server has 128GB of RAM...so a PLE consistently below 1,000 is pretty low.

I used XEvents to gather queries with large memory grants, and it brought to light an issue that the client has previously considered but not handled - disabling indexes during the load.

** SIDEBAR**

The code for the XEvents session to pull large memory grant queries I used is as follows - it collects any memory grant over 8MB (which you can modify in the WHERE clause) and its calling query:

CREATE EVENT SESSION [Ntirety_MemoryGrantUsage] ON SERVER
ADD EVENT sqlserver.query_memory_grant_usage
(
ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_id
,sqlserver.database_name,sqlserver.server_principal_name,sqlserver.session_id,sqlserver.sql_text)

WHERE ([package0].[greater_than_uint64]([granted_memory_kb],(8192))))

ADD TARGET package0.event_file(SET filename=N'Ntirety_MemoryGrantUsage',max_file_size=(256)),

ADD TARGET package0.ring_buffer(SET max_events_limit=(0),max_memory=(1048576))

WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=5 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON
)

The query to then retrieve the data is this - it dynamically pulls your default error log path and then queries the XEL output file from the event_file target in the session at that path:

USE master
GO

DECLARE @ErrorLogPath nvarchar(400), @XELPath nvarchar(500)

SET @ErrorLogPath = (
SELECT LEFT(cast(SERVERPROPERTY('ErrorLogFileName') as nvarchar(400)),LEN(cast(SERVERPROPERTY('ErrorLogFileName') as nvarchar(400)))-8)
)

SET @XELPath = @ErrorLogPath+'Ntirety_MemoryGrantUsage*.xel'

SELECT DISTINCT *
FROM
(
SELECT
DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), event_data.value(N'(event/@timestamp)[1]', N'datetime'))as Event_time
, CAST(n.value('(data[@name="granted_memory_kb"]/value)[1]', 'bigint')/1024.0 as DECIMAL(30,2)) AS granted_memory_mb
, CAST(n.value('(data[@name="used_memory_kb"]/value)[1]', 'bigint')/1024.0 as DECIMAL(30,2)) AS used_memory_mb
, n.value('(data[@name="usage_percent"]/value)[1]', 'int') AS usage_percent
, n.value ('(action[@name="database_name"]/value)[1]', 'nvarchar(50)') AS database_name
, n.value ('(action[@name="client_app_name"]/value)[1]','nvarchar(50)') AS client_app_name
, n.value ('(action[@name="client_hostname"]/value)[1]','nvarchar(50)') AS client_hostname
, n.value ('(action[@name="server_principal_name"]/value)[1]','nvarchar(500)') AS [server_principal_name]
, n.value('(@name)[1]', 'varchar(50)') AS event_type
, n.value ('(action[@name="sql_text"]/value)[1]', 'nvarchar(4000)') AS sql_text
FROM
(
SELECT CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file(
@XELPath
, NULL
, NULL
, NULL)
)
AS Event_Data_Table
CROSS APPLY event_data.nodes('event') AS q(n)
) MemoryGrantUsage
ORDER BY event_time desc

** END SIDEBAR **

Looking at the queries that were the top memory consumers over the time period led to some pretty basic queries:

INSERT INTO "dbo"."Junk"
SELECT "UID"
, "ContainerId"
, "CategoryCode"
, "CategoryName"
, "CategoryDataValue"
, "CharacteristicCode"
, "CharacteristicName"
, "CharacteristicDataValue"
, "DimensionCode"
, "DimensionName"
, "DimensionDataValue"
, "OptionId"
, "NumberOfItems"
, "IsStandard"
, "LanguageId"
, "Market"
, "VehicleType"
, "YearMonth"
, "BatchRefreshDate"
, "CreatedBy"
, "CreatedTimestamp"
, "ModifiedBy"
, "ModifiedTimestamp"
, "BatchId"
FROM "dbo"."Junk_Stg"
where "Market" = 'XYZ'

https://media.makeameme.org/created/so-simple-right.jpg

The immediate problem to me was that memory grants usually come from SORT and HASH operations (check out Erik Darling's great description at https://www.erikdarlingdata.com/starting-sql/starting-sql-memory-grants-in-execution-plans/) and this query doesn't obviously do any of that - it is a very straightforward INSERT...SELECT from one table into another, but the query plan looks like this:

Pretty crazy right? With all of those SORT's, no wonder SQL Server wants to give this query a giant memory grant!

But why are there six different SORT operations on this one basic INSERT...SELECT?

Then I realized this is why:

Six SORT's...for SIX non-clustered indexes!

Inserting rows into a table with enabled non-clustered indexes requires a SORT to add rows to that index - the INSERT can't simply dump the rows on to the end of the index, but has to sort the input to match the indexes sort.

As a test I created a copy of the target table that I could tweak the indexes on, and a second copy without the indexes in place, and my INSERT...SELECT query plan against the No Index copy looked a little bit better:

As I test I copy-pasted the two INSERT...SELECT's into a single query window and pulled the plans for each so that I could compare them:

The Query Cost numbers in Management Studio always have to be taken with a grain of salt, but for comparison's sake consider - the six-way insert costs almost four times as much as the no-index query!

To test further I disabled three of the indexes on my copy table and ran the plan again:

Bingo - six streams now down to three for the three remaining enabled indexes.

I disabled the rest of the non-clustered indexes and:

https://memegenerator.net/img/instances/63334945/booyah.jpg

For one final comparison I ran the double query against my "Indexes All Newly Disabled" copy and my "No Index" copy together:

Spot on!

As I mentioned above the cost to this is that at the end of the process you have to have time to enable the indexes again, which does a rebuild of that index.

As always Your Mileage May Vary, as the direct impact will relate to the size of the table, the number of non-clustered indexes in place, and multiple other items - but check it out, especially if your process runs longer than you want - this may be the problem!

Hope this helps!

Tuesday, March 2, 2021

Disabling Non-Clustered Indexes For Fun and Profit

1 comment: