One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Thursday, May 29, 2014

Handling Perfmon Logs with Relog


In a recent situation, I was tasked by a client to check one of their servers which they believed to be grossly over-provisioned.  As a VM they knew they could easily pull both CPU and RAM from the server if they could determine the true needs of the server.

One of the items I regularly recommend to my clients is to have a "permanent" Perfmon collector log, where a basic set of counters is collected every five minutes, with the log file cycled every day.  This generates 1.5MB-2.0MB of files daily (a trivial amount of storage even if maintained for several years) and daily cycling makes the logs easier to handle when troubleshooting issues.  This is also useful for something most people don't have - a baseline over time - but to use it this way requires some serious finagling as you will see.

--

The way I set up the collector by default, the collector writes to a file in a designated directory, and I use a couple of batch files that use the logman command to stop and start the collector, and then a pair of Windows Scheduled Tasks, one to stop and start the collector each night (effectively cycling to a new output file) and one to start the collector on server start-up (so that a server reboot doesn't kill the collector altogether).  I know there are ways to configure a collector to cycle to new files each day, but they are version-specific based on your version of Windows, while the batch file/scheduled task method works across O/S versions.

--

The net effect of this method is that you end up with a directory with a large stack of blg files like this:


You can see a file for each day plus a new file any time the Server restarts.  This is great when you are looking for info of what happened on the night of 05/24/2014, but not so much when you need to know what happened over the last thirty days.

Enter relog!

Relog is a command line utility for manipulating perfmon output files (either .blg or .csv).  It is frequently used to filter existing files by counter or to convert .blg files into .csv files, both of which are described in this blog post and this one by John Pertell (blog/@jayape).  A lesser known use of relog is its ability to combine individual files into a single larger file, like this:

relog day1.blg day2.blg day3.blg -f BIN -o Day1_2_3.blg

...which would combine the three files day1, day2, and day3 into a single large file Day_1_2_3.

This is a necessary evil for many types of analysis since it can be difficult to wade through multiple perfmon log files at once.  Most tools claim to support it, but most of them also have vague disclaimers that amount to YMMV.  The most reliable way to handle this situation is to use relog to combine the files prior to beginning the analysis.


In the given situation, I had 43 files covering a little more than a month to analyze.  I hadn't collected this many files at once before, but didn't really think about it as any big deal.  I used notepad to mock together the relog command (not for the faint of heart):


C:\PerfLogs\Blog>relog Ntirety-SQLServerPerf_000066.blg Ntirety-SQLServerPerf_000067.blg Ntirety-SQLServerPerf_000068.blg Ntirety-SQLServerPerf_000069.blg Ntirety-SQLServerPerf_000070.blg Ntirety-SQLServerPerf_000071.blg Ntirety-SQLServerPerf_000072.blg Ntirety-SQLServerPerf_000073.blg Ntirety-SQLServerPerf_000074.blg Ntirety-SQLServerPerf_000075.blg Ntirety-SQLServerPerf_000076.blg Ntirety-SQLServerPerf_000077.blg Ntirety-SQLServerPerf_000078.blg Ntirety-SQLServerPerf_000079.blg Ntirety-SQLServerPerf_000080.blg Ntirety-SQLServerPerf_000081.blg Ntirety-SQLServerPerf_000082.blg Ntirety-SQLServerPerf_000083.blg Ntirety-SQLServerPerf_000084.blg Ntirety-SQLServerPerf_000085.blg Ntirety-SQLServerPerf_000086.blg Ntirety-SQLServerPerf_000087.blg Ntirety-SQLServerPerf_000088.blg Ntirety-SQLServerPerf_000089.blg Ntirety-SQLServerPerf_000090.blg Ntirety-SQLServerPerf_000091.blg Ntirety-SQLServerPerf_000092.blg Ntirety-SQLServerPerf_000093.blg Ntirety-SQLServerPerf_000094.blg Ntirety-SQLServerPerf_000095.blg Ntirety-SQLServerPerf_000096.blg Ntirety-SQLServerPerf_000097.blg Ntirety-SQLServerPerf_000098.blg Ntirety-SQLServerPerf_000099.blg Ntirety-SQLServerPerf_000100.blg Ntirety-SQLServerPerf_000101.blg Ntirety-SQLServerPerf_000102.blg Ntirety-SQLServerPerf_000103.blg Ntirety-SQLServerPerf_000104.blg Ntirety-SQLServerPerf_000105.blg Ntirety-SQLServerPerf_000106.blg Ntirety-SQLServerPerf_000107.blg Ntirety-SQLServerPerf_000108.blg -f BIN -o Combined.blg

...and went to run it from the command prompt.  Much to my surprise I received the following:


At first the error ("The specified log file type has not been installed on this computer") threw me and made me investigate whether one of the files was corrupted in some way.  All of the files were roughly the same size, with the days that had multiple files (due to server restarts) still combining to be about the same size.  I collected smaller subsets of 2 files/5 files/10 files without issue.  At this point I turned to my Google-Fu and quickly found someone with a similar problem.  Microsoft employee Dustin Metzgar (blog/@DustinMetzgar) had a blog post from August 2013 "Performance Monitor Issues" that described my situation exactly. Issue #4 in the post is "Combining several BLGs into one is limited to 32 files and has poor error messages" - sound familiar?  Sure enough, if I modified my relog command to only include 32 files, it worked!

The item that Dustin doesn't address is how to handle the excess files.  I found that with a little creativity (very little) I was able to use multiple relog commands to combine groups of files into larger files and then combine those larger files into one final large file:

C:\PerfLogs\Blog>relog Ntirety-SQLServerPerf_000066.blg Ntirety-SQLServerPerf_000067.blg Ntirety-SQLServerPerf_000068.blg Ntirety-SQLServerPerf_000069.blg Ntirety-SQLServerPerf_000070.blg  Ntirety-SQLServerPerf_000071.blg  Ntirety-SQLServerPerf_000072.blg  Ntirety-SQLServerPerf_000073.blg  Ntirety-SQLServerPerf_000074.blg  Ntirety-SQLServerPerf_000075.blg  Ntirety-SQLServerPerf_000076.blg  Ntirety-SQLServerPerf_000077.blg  Ntirety-SQLServerPerf_000078.blg  Ntirety-SQLServerPerf_000079.blg  Ntirety-SQLServerPerf_000080.blg  Ntirety-SQLServerPerf_000081.blg Ntirety-SQLServerPerf_000082.blg Ntirety-SQLServerPerf_000083.blg Ntirety-SQLServerPerf_000084.blg Ntirety-SQLServerPerf_000085.blg Ntirety-SQLServerPerf_000086.blg Ntirety-SQLServerPerf_000087.blg Ntirety-SQLServerPerf_000088.blg Ntirety-SQLServerPerf_000089.blg Ntirety-SQLServerPerf_000090.blg Ntirety-SQLServerPerf_000091.blg Ntirety-SQLServerPerf_000092.blg Ntirety-SQLServerPerf_000093.blg Ntirety-SQLServerPerf_000094.blg Ntirety-SQLServerPerf_000095.blg Ntirety-SQLServerPerf_000096.blg Ntirety-SQLServerPerf_000097.blg  -f BIN -o Combined1.blg 



C:\PerfLogs\Blog>relog Ntirety-SQLServerPerf_000095.blg Ntirety-SQLServerPerf_000096.blg Ntirety-SQLServerPerf_000097.blg Ntirety-SQLServerPerf_000098.blg Ntirety-SQLServerPerf_000099.blg Ntirety-SQLServerPerf_000100.blg Ntirety-SQLServerPerf_000101.blg Ntirety-SQLServerPerf_000102.blg Ntirety-SQLServerPerf_000103.blg Ntirety-SQLServerPerf_000104.blg Ntirety-SQLServerPerf_000105.blg Ntirety-SQLServerPerf_000106.blg Ntirety-SQLServerPerf_000107.blg Ntirety-SQLServerPerf_000108.blg -f BIN -o Combined2.blg 


C:\Perflogs\Blog>relog Combined1.blg Combined2.blg -f BIN -o Combined.blg


I was then able to sift through the combined.blg file for the information covering the entire period easily.  After manual examination, I next loaded the blg file into the Performance Analysis of Logs (PAL) Tool for further analysis, but that is the subject for another blog post. :)

One item I haven't been able to run down yet is why the large output file (combined.blg) only has 11795 samples while the two input files have 12659 samples total.  This would be expected behavior/by design if I had applied some type of filter in the relog command, but in this case I did not.  You can see in the previous screenshots (from the initial relogs) that the input samples match the output samples exactly in those cases.  If I find an answer I will update the post with the information.

Hope this helps!

2 comments:

  1. This is a great strategy for dealing with hundreds of servers and not having to pay for licensing a data capture tool. I'll be sharing it with my co-workers. Thanks

    ReplyDelete
  2. Hi Andy,

    Came across the same constraint just yesterday while trying to process a large number of BLG's for monthly baselines. One item I would like to add, in case you didn't notice is that you may simple use *.blg as an input filter for RELOG, so as long as you restrict the number of blg files to 32 in the folder it saves you having to provide the name for each individual file.

    In my case I used PowerShell to move 31 files at a time into a "processing" folder, where also the all_output.blg of the previous RELOG resided, each time renaming the output.blg of the current RELOG run overriding the previous all_output.blg. In this typical manner once can almost automate the processing and accumulation of baseline counters weekly/monthly - In my case I had to process to blg files per day.

    ReplyDelete