One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Monday, July 14, 2014

End of Mainstream Support for Microsoft SQL Server 2008 and 2008 R2



@DBA_ANDY: A reminder to all #sqlserver peeps out there and *especially* to our clients: #sql2008 #sql20008R2 http://blogs.msdn.com/b/sqlreleaseservices/archive/2014/07/09/end-of-mainstream-support-for-sql-server-2008-and-sql-server-2008-r2.aspx #Ntirety


--

What does this mean?

The key takeaway comes from this section of the MSDN blog:

For both SQL Server 2008 and SQL Server 2008 R2, Microsoft will continue to provide technical support which also includes security updates during the duration of extended support.  See the table below for extended support end date.  Non-security hotfixes for these versions will be offered only to customers who have an Extended Hotfix Support agreement.  Please refer to Extended Hotfix Support – Microsoft for more information.

Microsoft uses a standard cycle of mainstream support/extended support for almost all of their products, as described at http://support.microsoft.com/lifecycle/default.aspx?LN=en-us&x=14&y=6

The biggest difference between mainstream and extended support is that during the extended support period the only true support that occurs for the product is security fixes (which very rarely occur for Microsoft SQL Server) and paid support (aka 1-900-Microsoft).  There are no service packs/cumulative updates/functionality changes/online support for a product once it enters extended support.  If you pay for an Extended Support contract (I can only think of one shop I have worked with that does pay for it) then you get some added support beyond security patches, but not much.

What do we need to do?

As described in the MSDN blog, you should plan to get off Microsoft SQL Server 2008 and 2008 R2 ASAP.  Generally Microsoft keeps a product in mainstream support until the second subsequent version becomes GA (Generally Available).  In this case, now that SQL Server 2012 *and* 2014 are out, 2008/2008 R2 are now “minus two” versions and have therefore gone out of mainstream support.
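If you aren't sure where a given instance stands, a quick check looks like this (a sketch - 10.0.x is 2008 and 10.50.x is 2008 R2):

```sql
-- Quick version/edition check - run on each instance
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,  -- e.g. 10.50.xxxx = 2008 R2
       SERVERPROPERTY('ProductLevel')   AS ProductLevel,    -- RTM or SP level
       SERVERPROPERTY('Edition')        AS Edition;
```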

In many cases it is possible to piggyback on top of a hardware refresh project or virtualization project to try to remove as much older SQL Server as you can.

I still run lots of even older versions – SQL Server 7.0/2000/2005 – why should I care?

Many people don’t care, but I have found that it is good general IT policy to run everything – hardware/software/etc. – within vendor-supported timelines.  Not to make the vendors $$$, but for the supportability of the product *and* for the functionality/performance of the product.

Sure, your SQL Server 2005 on Windows Server 2003 runs your database (kind of) but what would you do if the server blew its motherboard – do you have a spare eight-year-old board in stock?  What about the software (Windows/SQL Server/etc.) – each new version adds increased performance options (compression, availability groups, etc.) and improved manageability (new dynamic management views (DMV’s), Extended Events, improved tools, etc.) – what could your staff be doing if they didn’t need to continue to hand-hold those old servers?

My application vendor doesn’t support SQL Server 2012/2014 yet!

This is one of the most cited catches to advancing software, especially database platforms.  In most cases it is possible to upgrade/replace your existing server with a new current version Windows Server/SQL Server and run your “unsupported” application databases under a down-level compatibility level (for example, running the database under SQL Server 2008 compatibility (100) on a SQL Server 2012 (110) server).  This option needs to be tested before it is implemented.
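Setting the down-level compatibility is a one-liner; as a sketch (the database name here is a placeholder):

```sql
-- See what compatibility level each database currently runs at
SELECT name, compatibility_level
FROM sys.databases;

-- Run a hypothetical app database at SQL Server 2008 compatibility (100)
-- even though the instance itself is SQL Server 2012 (110)
ALTER DATABASE [MyAppDB] SET COMPATIBILITY_LEVEL = 100;
```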

Where am I going to get the $$$$ to do this?

I can’t help you much there other than to refer back to my earlier point about the old server and its eight-year-old motherboard – quantify for your management the cost of the system going down, possibly for days on end while you search eBay and Craigslist for replacement hardware (this isn’t a joke – I worked in one shop that had to go through this).


This is where many DBAs (and DBA Managers) get hung up on the idea of SQL Server backups.

**IMPORTANT NOTE - you absolutely need to have SQL Server backups of all of your systems with only limited exceptions – even DEV and TEST since they are functionally PROD for your developers and QA staff**

Having said that, a wonderful stack of SQL Server backups shipped securely to your offsite facility doesn’t save you from the failed eight-year-old motherboard scenario – best case you could spin up a VMware/HyperV/etc. virtual server and restore the databases there, but do you still have all of the necessary Windows/SQL Server/service pack installation media to even install Windows 2003 SP2 and SQL Server 2005 SP4CU3 patched with MS12-070?


--

I wanted to put this out there because I know that for many of you SQL Server 2008 and 2008 R2 are your base versions – there are some SQL Server 2012s out there (and more than a few 2005s and 2000s) but most servers I deal with day to day are 2008/2008 R2.

Pause and consider what I have said and work together with your team and your management in the coming months to get this done.
 

Thursday, June 26, 2014

Why Is Your datetime Saved as a bigint Anyway?

Yet another issue I had never seen before - thankfully this time I was able to find a relatively quick answer.

On a client system their primary database had grown to throw disk alarms on the DATA drive, so I was alerted.  I signed onto the system and found that the largest table in the 43GB database was itself 20GB.  I looked at the contents of the table and found several fields labelled sent_time and audit_time...only to find that they were bigints rather than datetime - yuck!

The best description of the situation I found was a post from Ben Northway (Blog/@northben) titled "SQL Server date time conversions to/from Unix bigint format UPDATED".  Ben describes how the bigint datetime I found is actually a UNIX-style timestamp – in this case counting the milliseconds since January 1, 1970 (classic UNIX timestamps count seconds). (makes sense, doesn't it?)  It even has a catchy name - Epoch time.

Why January 1, 1970?  Here's one response I found on a StackOverflow post asking this same question:

The universe was created on Jan 1, 1970. Anyone who tells you otherwise is clearly lying. –  ijw Jul 7 '09 at 23:52
...and now you know.

Ben's formula worked perfectly for me:

SELECT DATEADD(s,mycolumn/1000,'19700101') from mytable
As you can see, rather than performing math in milliseconds, the formula divides by 1000 in order to do the math in seconds - as Ben notes, this prevents an arithmetic overflow error.

A couple of limitations of this approach:
  • Epoch Time ignores leap seconds, so if you need that level of precision you will need a much, much more complicated formula than this.
  • This formula is based on UTC/GMT, so if you need to convert to a specific time zone you need to add in a correcting factor like this:
SELECT DATEADD(s,mycolumn/1000+8*60*60,'19700101') from mytable
This corrects by 8 hours (8 hours * 60 mins/hour * 60 secs/min) to UTC/GMT+8.
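The conversion also works in reverse if you ever need to write one of these bigint values yourself; a sketch, reusing the hypothetical mycolumn/mytable names from Ben's formula:

```sql
-- Epoch milliseconds -> datetime (UTC), as above
SELECT DATEADD(s, mycolumn/1000, '19700101') AS converted_time
FROM mytable;

-- The reverse: a datetime back to epoch milliseconds
-- (DATEDIFF in seconds stays under the int limit until 2038; cast before multiplying)
SELECT CAST(DATEDIFF(s, '19700101', GETUTCDATE()) AS bigint) * 1000 AS epoch_ms;
```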

Thanks Ben - hope this helps - I know it helped me!

Thursday, May 29, 2014

Handling Perfmon Logs with Relog


In a recent situation, I was tasked by a client to check one of their servers which they believed to be grossly over-provisioned.  As a VM they knew they could easily pull both CPU and RAM from the server if they could determine the true needs of the server.

One of the items I regularly recommend to my clients is to have a "permanent" Perfmon collector log, where a basic set of counters is collected every five minutes, with the log file cycled every day.  This generates 1.5MB-2.0MB of files daily (a trivial amount of storage even if maintained for several years) and daily cycling makes the logs easier to handle when troubleshooting issues.  This is also useful for something most people don't have - a baseline over time - but to use it this way requires some serious finagling as you will see.

--

The way I set up the collector by default, the collector writes to a file in a designated directory, and I use a couple of batch files that use the logman command to stop and start the collector, and then a pair of Windows Scheduled Tasks, one to stop and start the collector each night (effectively cycling to a new output file) and one to start the collector on server start-up (so that a server reboot doesn't kill the collector altogether).  I know there are ways to configure a collector to cycle to new files each day, but they are version-specific based on your version of Windows, while the batch file/scheduled task method works across O/S versions.
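A minimal sketch of the stop/start batch file – the collector name SQLServerPerf and the paths here are placeholders, so adjust for your environment:

```bat
rem CycleCollector.bat - stop and restart the collector so it opens a new .blg
rem (collector name "SQLServerPerf" is a placeholder)
logman stop SQLServerPerf
logman start SQLServerPerf
```

The nightly scheduled task simply runs the batch file (e.g. schtasks /create /tn "CyclePerfmonCollector" /tr C:\Scripts\CycleCollector.bat /sc daily /st 00:00), and a second task created with /sc onstart runs logman start at boot so a reboot doesn't kill the collector.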

--

The net effect of this method is that you end up with a directory with a large stack of blg files like this:


You can see a file for each day plus a new file any time the server restarts.  This is great when you are looking for info on what happened on the night of 05/24/2014, but not so much when you need to know what happened over the last thirty days.

Enter relog!

Relog is a command line utility for manipulating perfmon output files (either .blg or .csv).  It is frequently used to filter existing files by counter or to convert .blg files into .csv files, both of which are described in this blog post and this one by John Pertell (blog/@jayape).  A lesser known use of relog is its ability to combine individual files into a single larger file, like this:

relog day1.blg day2.blg day3.blg -f BIN -o Day1_2_3.blg

...which would combine the three files day1, day2, and day3 into a single large file Day1_2_3.blg.

This is a necessary evil for many types of analysis since it can be difficult to wade through multiple perfmon log files at once.  Most analysis tools claim to handle multiple input files, but most of them also have vague disclaimers that amount to YMMV.  The most reliable way to handle this situation is to use relog to combine the files prior to beginning the analysis.


In the given situation, I had 43 files covering a little more than a month to analyze.  I hadn't combined this many files at once before, but didn't really think of it as any big deal.  I used Notepad to cobble together the relog command (not for the faint of heart):


C:\PerfLogs\Blog>relog Ntirety-SQLServerPerf_000066.blg Ntirety-SQLServerPerf_000067.blg Ntirety-SQLServerPerf_000068.blg Ntirety-SQLServerPerf_000069.blg Ntirety-SQLServerPerf_000070.blg Ntirety-SQLServerPerf_000071.blg Ntirety-SQLServerPerf_000072.blg Ntirety-SQLServerPerf_000073.blg Ntirety-SQLServerPerf_000074.blg Ntirety-SQLServerPerf_000075.blg Ntirety-SQLServerPerf_000076.blg Ntirety-SQLServerPerf_000077.blg Ntirety-SQLServerPerf_000078.blg Ntirety-SQLServerPerf_000079.blg Ntirety-SQLServerPerf_000080.blg Ntirety-SQLServerPerf_000081.blg Ntirety-SQLServerPerf_000082.blg Ntirety-SQLServerPerf_000083.blg Ntirety-SQLServerPerf_000084.blg Ntirety-SQLServerPerf_000085.blg Ntirety-SQLServerPerf_000086.blg Ntirety-SQLServerPerf_000087.blg Ntirety-SQLServerPerf_000088.blg Ntirety-SQLServerPerf_000089.blg Ntirety-SQLServerPerf_000090.blg Ntirety-SQLServerPerf_000091.blg Ntirety-SQLServerPerf_000092.blg Ntirety-SQLServerPerf_000093.blg Ntirety-SQLServerPerf_000094.blg Ntirety-SQLServerPerf_000095.blg Ntirety-SQLServerPerf_000096.blg Ntirety-SQLServerPerf_000097.blg Ntirety-SQLServerPerf_000098.blg Ntirety-SQLServerPerf_000099.blg Ntirety-SQLServerPerf_000100.blg Ntirety-SQLServerPerf_000101.blg Ntirety-SQLServerPerf_000102.blg Ntirety-SQLServerPerf_000103.blg Ntirety-SQLServerPerf_000104.blg Ntirety-SQLServerPerf_000105.blg Ntirety-SQLServerPerf_000106.blg Ntirety-SQLServerPerf_000107.blg Ntirety-SQLServerPerf_000108.blg -f BIN -o Combined.blg

...and went to run it from the command prompt.  Much to my surprise I received the following:


At first the error ("The specified log file type has not been installed on this computer") threw me and made me investigate whether one of the files was corrupted in some way.  All of the files were roughly the same size, with the days that had multiple files (due to server restarts) still combining to about the same size.  Combining smaller subsets of 2 files/5 files/10 files worked without issue.  At this point I turned to my Google-Fu and quickly found someone with a similar problem.  Microsoft employee Dustin Metzgar (blog/@DustinMetzgar) had a blog post from August 2013, "Performance Monitor Issues", that described my situation exactly.  Issue #4 in the post is "Combining several BLGs into one is limited to 32 files and has poor error messages" - sound familiar?  Sure enough, when I modified my relog command to only include 32 files, it worked!

The item that Dustin doesn't address is how to handle the excess files.  I found that with a little creativity (very little) I was able to use multiple relog commands to combine groups of files into larger files and then combine those larger files into one final large file:

C:\PerfLogs\Blog>relog Ntirety-SQLServerPerf_000066.blg Ntirety-SQLServerPerf_000067.blg Ntirety-SQLServerPerf_000068.blg Ntirety-SQLServerPerf_000069.blg Ntirety-SQLServerPerf_000070.blg  Ntirety-SQLServerPerf_000071.blg  Ntirety-SQLServerPerf_000072.blg  Ntirety-SQLServerPerf_000073.blg  Ntirety-SQLServerPerf_000074.blg  Ntirety-SQLServerPerf_000075.blg  Ntirety-SQLServerPerf_000076.blg  Ntirety-SQLServerPerf_000077.blg  Ntirety-SQLServerPerf_000078.blg  Ntirety-SQLServerPerf_000079.blg  Ntirety-SQLServerPerf_000080.blg  Ntirety-SQLServerPerf_000081.blg Ntirety-SQLServerPerf_000082.blg Ntirety-SQLServerPerf_000083.blg Ntirety-SQLServerPerf_000084.blg Ntirety-SQLServerPerf_000085.blg Ntirety-SQLServerPerf_000086.blg Ntirety-SQLServerPerf_000087.blg Ntirety-SQLServerPerf_000088.blg Ntirety-SQLServerPerf_000089.blg Ntirety-SQLServerPerf_000090.blg Ntirety-SQLServerPerf_000091.blg Ntirety-SQLServerPerf_000092.blg Ntirety-SQLServerPerf_000093.blg Ntirety-SQLServerPerf_000094.blg Ntirety-SQLServerPerf_000095.blg Ntirety-SQLServerPerf_000096.blg Ntirety-SQLServerPerf_000097.blg  -f BIN -o Combined1.blg 



C:\PerfLogs\Blog>relog Ntirety-SQLServerPerf_000095.blg Ntirety-SQLServerPerf_000096.blg Ntirety-SQLServerPerf_000097.blg Ntirety-SQLServerPerf_000098.blg Ntirety-SQLServerPerf_000099.blg Ntirety-SQLServerPerf_000100.blg Ntirety-SQLServerPerf_000101.blg Ntirety-SQLServerPerf_000102.blg Ntirety-SQLServerPerf_000103.blg Ntirety-SQLServerPerf_000104.blg Ntirety-SQLServerPerf_000105.blg Ntirety-SQLServerPerf_000106.blg Ntirety-SQLServerPerf_000107.blg Ntirety-SQLServerPerf_000108.blg -f BIN -o Combined2.blg 


C:\Perflogs\Blog>relog Combined1.blg Combined2.blg -f BIN -o Combined.blg


I was then able to sift through the combined.blg file for the information covering the entire period easily.  After manual examination, I next loaded the blg file into the Performance Analysis of Logs (PAL) Tool for further analysis, but that is the subject for another blog post. :)

One item I haven't been able to run down yet is why the large output file (combined.blg) only has 11795 samples while the two input files have 12659 samples total.  This would be expected behavior/by design if I had applied some type of filter in the relog command, but in this case I did not.  You can see in the previous screenshots (from the initial relogs) that the input samples match the output samples exactly in those cases.  If I find an answer I will update the post with the information.

Hope this helps!

Wednesday, May 21, 2014

Why Won't My Clustered Instance Start?

Last weekend I had a client page me in the middle of the night (because, of course, these things always happen in the middle of the night) with the complaint that their SQL Server Failover Cluster Instance (FCI) wouldn't come online on their B node after multiple attempts.  When they failed it back to the A node, the instance came up without issue.

The error they reported was this:
Error: 17750, Severity: 16, State: 0. Could not load the DLL SQLJDBC_XA.dll, or one of the DLLs it references. Reason: 126(The specified module could not be found.). 
If you don't recognize it outright, SQLJDBC_XA.dll is part of the JDBC (Java Database Connectivity) driver specifically for handling distributed transactions.

At first I went digging for references to this specific error, but didn't see much except for the fact that the Application Log on the server had roughly 2.3 Gajillion (a technical term roughly equivalent to a Googolplex of Googolplexes) of these 17750 errors.

I tried to go to the SQL Server Error Log, but of course the server hadn't been changed from its default setting of six logs plus current, meaning there were only seven logs.  Since the client had tried multiple times to fail the instance back and forth, they had easily burned through those seven logs and aged out the original error.
PRO TIP: I strongly recommend changing the default log retention to a much larger number (30-45) and then setting up a job to cycle the error log every day by running sp_cycle_errorlog.  This works in all versions of SQL Server, makes the logs smaller and easier to open when you want to troubleshoot an issue, and often most importantly, protects you from the issue described above of multiple SQL Server service restarts aging out your log too quickly, or at least mitigates it (if you restart your instance 46 times, you will still be stuck).
Right-click on "SQL Server Logs" and select Configure.  Check the box for "Limit the number of error log files before they are recycled" and fill in the Maximum Number of Error Log files.  Many people are fooled into believing that there is no limit because by default this box isn't checked, but if it isn't checked all you get is the *default* limit of six. #TheMoreYouKnow
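A sketch of both pieces in T-SQL - the daily Agent job just needs the one-line cycle, and the retention setting can also be written from T-SQL via the instance registry procedure (this writes the same value the SSMS dialog sets; the key path shown is the standard instance-relative one):

```sql
-- Cycle the error log (schedule this daily via a SQL Server Agent job)
EXEC sp_cycle_errorlog;

-- Raise error log retention to 45 files
USE [master];
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
     N'Software\Microsoft\MSSQLServer\MSSQLServer',
     N'NumErrorLogs', REG_DWORD, 45;
```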
I did the simple thing - I searched the B node for the file SQLJDBC_XA.dll to see if it was in the wrong directory or simply didn't exist - surprise surprise, it didn't exist.  The other possibility with this particular error (for a DLL file anyway) would be that it wasn't registered with Windows, but in this case it wasn't there at all.

I looked at the A node and searched for the file and there it was:


This showed me that the problem was probably related to a piece of application code installed on the A node, which I found without too much effort:


I then looked on the B node and found that this ATG application wasn't there:



No wonder the application DLL wasn't there!

As you can see the problem is that the application code was not installed on the B node and therefore the relevant database couldn't be started.  Failing the instance back to the A node allowed everything to run because the application code had been installed there.  The Windows Application Log also showed a particular database related to the situation (with a suspiciously similar name):
Date     5/17/2014 2:08:16 AM
Log     SQL Server
Source     spid7s
Message
Recovery is complete. This is an informational message only. No user action is required.
--
Date     5/17/2014 2:08:16 AM
Log     SQL Server
Source     spid51
Message

Starting up database 'atg'.
-- 
Date     5/17/2014 2:08:16 AM
Log     SQL Server
Source     spid63
Message
Attempting to load library 'SQLJDBC_XA.dll' into memory. This is an informational message only. No user action is required.
--
Date     5/17/2014 2:08:16 AM
Log     SQL Server
Source     spid63
Message
Error: 17750, Severity: 16, State: 0.
--
Date     5/17/2014 2:08:16 AM
Log     SQL Server
Source     spid63
Message
Could not load the DLL SQLJDBC_XA.dll, or one of the DLLs it references. Reason: 126(The specified module could not be found.)
...which led into a near-infinite cycle of Attempting to load - Could not load - Attempting to load - Could not load - etc. etc. etc.  Only by going all the way back to the instance restart could you see the "Starting up database" message.

--

Lessons learned for the client (and all of us):

  • Don't install application code on a SQL Server unless you absolutely have to.  Ever.  When you do have to, make sure all impacted SQL Servers (cluster nodes, mirroring hosts, etc.) have the same version of the application with all of the same options enabled.  (This last part gets people as well - the application is installed on both nodes, but one install has a particular flag enabled that adds extra DLLs to the server, causing the same problem seen above.)
  • Set the SQL Server error log to retain extra history so that your DBAs and other impacted parties have a better chance of seeing the true cause of the initial failure - much of the information is in the Windows Application Log, but not all of it, and it is easier to find the SQL Server-specific items in the SQL Server log.  Also, on many servers the Application Log cycles more frequently than every thirty days (although you can increase the size of the Application Log as well, and this can also be a good idea).
  • Most importantly - call for help before trying the same thing over and over.  When I try something once and it doesn't work, I do often try it a second time to make sure I didn't make a simple fat-finger mistake the first time.  After that, throw up the warning flag and ask for help via another DBA, or #sqlhelp on Twitter, or Google-Fu - whatever helps you get a second opinion.
Hope this helps!