Sunday, 16 August 2015

Interlude #2: UCD Monitoring with Windows Performance Monitor and JMXTrans

I vaguely mentioned in Part 1 of this blog series on monitoring that there is a collectd alternative for Windows. Indeed there is: it's called SSC Serv and can be found here. It works with the collectd protocol and has the same collection module architecture, so it is a de facto port of the Unix collectd for Windows. Here's the catch: it's not free. It's not that expensive, so it may be worth the money for enterprise customers, but there is another catch: Windows has its own performance monitoring suite that Windows administrators are usually already comfortable with, the Windows Performance Monitor built into every Windows platform.

Windows Performance Monitor doesn't do Java performance metrics, whereas collectd does, so we'll need to augment it with another solution for collecting those. We are going to use JMXTrans for that. JMXTrans is a JMX monitoring solution that converts JMX metrics to the Graphite metrics protocol. Windows Performance Monitor does not "speak" the Graphite metrics protocol either, so we also need something to translate its counters. We will use custom PowerShell functions to do that and run them as a Windows service, using a powerful Windows service manager called the Non-Sucking Service Manager (nssm).
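
To make "speaking the Graphite metrics protocol" concrete, here is a minimal Python sketch of the plaintext format that a Carbon listener accepts: one line per data point, in the form `<metric path> <value> <unix timestamp>`. This is not how the PowerShell functions or JMXTrans are implemented; the function names and the sample metric path are made up for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of the Graphite plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host, port=2003, timeout=5.0):
    """Send already formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Hypothetical metric path; a real one depends on the host name and counter.
line = format_metric("perfmon.host1.memory.available_mbytes", 2048, 1439683200)
print(line, end="")
```

Both the PowerShell functions and JMXTrans end up producing lines of this shape, which is why they can share a single Graphite server and port (2003 in the configurations used in this post).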

Ok, to recap, this is our solution:
And we can't forget:
When it's all said and done it will look something like this:
Fig 1: Windows Performance Monitor and JMXTrans

Import the generic process in the same way as described for the collectd generic process in Part 1. The process parameters are also similar, though since this process only deals with agents and there is no client/server architecture, there is a much smaller subset:
  • Component Name: Should be set to the name of the component we imported earlier
  • Version Name (Optional): You can specify the name of a specific version of the component to use, otherwise it will use the latest
  • Graphite Server: The Graphite server we are connecting to.
The process used is also similar to Part 1:
Fig 2: Windows Performance Monitor and JMXTrans Deployment

The process involves fetching a cookbook from Git, installing Chef on the Windows node, installing the Graphite PowerShell cookbook, downloading the JMXTrans jar, and installing both as Windows services:

  1. Set Defaults: This sets up some in-process defaults. For now there is just one, the nullProperty. This property is used in conditional forks to represent an empty string, in other words, an unset property.
  2. component: This gets the extended information of the component given a component name (passed in). In particular, the component id is required for steps further down.
  3. IsVersionNameSet: This checks whether the user passed in a specific version to deploy. If so, it skips to step 6.
  4. LatestVersion: Get the latest version from the component. This is a custom step from the ComponentPlus plugin I created. It takes a component name and returns a version name.
  5. versionName: This sets the request level version name (the one usually passed in) from step 4. Step 6 expects the request level parameter.
  6. version: Given the component name and version name get the version id. This is also a custom step I created using the ComponentPlus plugin.
  7. DownloadArtifacts: Download the artifacts from the component and version passed in. This fetches the Chef cookbook.
  8. Install Chef: A simple PowerShell script that downloads and installs the Chef client (the URL below is pinned to version 12.4.1):

    powershell.exe -NoLogo -NonInteractive -command "(New-Object System.Net.WebClient).DownloadFile(\"https://opscode-omnibus-packages.s3.amazonaws.com/windows/2008r2/x86_64/chef-client-12.4.1-1.msi\",\"chef.msi\")"
    msiexec /log msiexec.log /qn /i chef.msi ADDLOCAL="ChefClientFeature"
    type msiexec.log
    
    
  9. hostname: Retrieves the fully qualified domain name using Groovy (required for JMXTrans configuration)
  10. Create Graphite Powershell Node: This works similarly to the collectd install described in Part 3, except that the configuration files generated in this case are used by the PowerShell functions.

    {
       "run_list": [ "recipe[graphite_powershell_functions::default]" ],
       "graphite_powershell_functions": {
         "CarbonServer" : "${p:collectd_server}",
         "CarbonServerPort" : 2003,
         "MetricPath" : "perfmon.",
         "MetricSendIntervalSeconds" : 5,
         "TimeZoneOfGraphiteServer" : "UTC",
         "hostname" : "${p:hostname/hostname}",
         "PerformanceCounters" : [
           "Network Interface(*)\\Bytes Received/sec",
           "Network Interface(*)\\Bytes Sent/sec",
           "Network Interface(*)\\Packets Received Unicast/sec",
           "Network Interface(*)\\Packets Sent Unicast/sec",
           "Network Interface(*)\\Packets Received Non-Unicast/sec",
           "Network Interface(*)\\Packets Sent Non-Unicast/sec",
           "Processor(_Total)\\% Processor Time",
           "Processor(_Total)\\% User Time",
           "Processor(_Total)\\% Idle Time",
           "Memory\\Available MBytes",
           "Memory\\Pages/sec",
           "Memory\\Pages Input/sec",
           "System\\Processor Queue Length",
           "System\\Threads",
           "System\\File Write Operations/sec",
           "System\\File Read Operations/sec",
           "PhysicalDisk(*)\\Avg. Disk Write Queue Length",
           "PhysicalDisk(*)\\Avg. Disk Read Queue Length",
           "TCPv4\\Segments Received/sec",
           "TCPv4\\Segments Sent/sec",
           "TCPv4\\Segments Retransmitted/sec"
         ],
         "MetricFilter" : [
           "isatap",
           "teredo tunneling"
         ],
         "nssm_archive" : "http://nssm.cc/release/nssm-2.24.zip",
         "nssm_archive_checksum" : "727d1e42275c605e0f04aba98095c38a8e1e46def453cdffce42869428aa6743"
       }
     }
    
    
    The thing of note in this configuration is the "PerformanceCounters" array. These are all the Windows Performance Counters that will be transferred to Graphite. To get a list of the available counters, go to a Windows command prompt and type:
    typeperf -qx
    Any of those can be included in that list. Note the double backslash used in the configuration: the file is JSON, so each backslash in a counter path must be escaped. More information can be found at: https://github.com/MattHodge/Graphite-PowerShell-Functions
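
The double backslash is easy to get wrong, so here is a small Python sketch that parses a trimmed-down copy of the attributes above and shows what the counter paths look like once the JSON is decoded. The `is_filtered` helper is hypothetical: it assumes that each "MetricFilter" entry is treated as a case-insensitive pattern for excluding metrics, which is an assumption and not a description of how Graphite PowerShell Functions applies the filter internally.

```python
import json
import re

# A trimmed-down copy of the node attributes shown above (raw string, so the
# text contains two backslashes per separator, exactly as in the JSON file).
NODE_JSON = r'''
{
  "PerformanceCounters": [
    "Processor(_Total)\\% Processor Time",
    "Network Interface(*)\\Bytes Received/sec"
  ],
  "MetricFilter": ["isatap", "teredo tunneling"]
}
'''

config = json.loads(NODE_JSON)
counters = config["PerformanceCounters"]


def is_filtered(counter_path, filters):
    """Hypothetical helper: True if any filter pattern matches the counter path."""
    return any(re.search(pattern, counter_path, re.IGNORECASE) for pattern in filters)


for counter in counters:
    # After decoding, each path contains a single backslash, as typeperf prints it.
    print(counter)
```

Writing a single backslash in the JSON file would either be rejected by the parser or silently turn into a different character, which is why every counter copied from `typeperf -qx` needs its backslashes doubled.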
  11. Install Chef Node: Install the chef node configuration for Graphite PowerShell Functions 
  12. Stop JMXTrans Server: Stop any prior installation of the JMXTrans service so that the JMXTrans jar can be downloaded and copied over 
  13. Download JMXTrans: Download the latest jmxtrans jar file
    cd C:\GraphitePowershellFunctions
    powershell.exe -NoLogo -NonInteractive -command "(New-Object System.Net.WebClient).DownloadFile(\"http://central.maven.org/maven2/org/jmxtrans/jmxtrans/251/jmxtrans-251-all.jar\",\"jmxtrans-all.jar\")" 
  14. Fix JMXTrans Logging: This is a Groovy step that adds a logback.xml file to the downloaded jar. Without it, the JMXTrans logging is stuck on debug (i.e., very verbose).
  15. Get Java Home: Gets the Java home of the agent used to launch JMXTrans. 
  16. Create JMXTrans Config: Creates the JMXTrans configuration file (jmxtrans.json):

    {
       "servers" : [ {
         "port" : "9010",
         "host" : "localhost",
         "alias" : "${p:hostname/hostname}",
         "queries" : [ 
         {
           "obj" : "java.lang:type=ClassLoading",
           "attr" : [ "LoadedClassCount" ],
           "resultAlias": "JMXTrans-agent-",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         },
         {
           "obj" : "java.lang:type=Compilation",
           "attr" : [ "TotalCompilationTime" ],
           "resultAlias": "JMXTrans-agent-",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         },
         {
           "obj" : "java.lang:type=GarbageCollector,name=*",
           "attr" : [ "CollectionCount", "CollectionTime" ],
           "resultAlias": "JMXTrans-agent-gc",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         },
         {
           "obj" : "java.lang:type=Memory",
           "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
           "resultAlias": "JMXTrans-agent-memory",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         }, 
         {
           "obj" : "java.lang:type=MemoryPool,name=*",
           "attr" : [ "Usage"],
           "resultAlias": "JMXTrans-agent-memory_pool",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         } ]
       } ]
     }
    
    This can be left as-is for monitoring; it is similar to the collectd JMX configuration. It is possible to add sections for Tomcat. Examples can be found here: https://code.google.com/p/jmxtrans/wiki/MoreExamples

    If new queries are added, make sure to copy the outputWriters attribute over, and use a resultAlias that starts with JMXTrans-agent- (or JMXTrans-server-) to keep metrics consistent.
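
As a rough illustration of that rule, the following Python sketch appends a query to a jmxtrans-style configuration, copying the outputWriters from an existing query and rejecting aliases that do not follow the naming convention. The `add_query` helper, the shortened example configuration, and the graphite.example.com host are all hypothetical; the `java.lang:type=Threading` MBean with its `ThreadCount` attribute is a standard JVM MBean used here only as a sample.

```python
import copy
import json

ALLOWED_PREFIXES = ("JMXTrans-agent-", "JMXTrans-server-")


def add_query(config, obj, attrs, result_alias, server_index=0):
    """Append a query to a jmxtrans config dict, reusing the existing outputWriters."""
    if not result_alias.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"resultAlias should start with one of {ALLOWED_PREFIXES}")
    queries = config["servers"][server_index]["queries"]
    new_query = {
        "obj": obj,
        "attr": list(attrs),
        "resultAlias": result_alias,
        # Deep copy so later edits to one writer do not leak into another query.
        "outputWriters": copy.deepcopy(queries[0]["outputWriters"]),
    }
    queries.append(new_query)
    return new_query


# Shortened stand-in for the configuration above (placeholder Graphite host).
example_config = {
    "servers": [{
        "port": "9010",
        "host": "localhost",
        "queries": [{
            "obj": "java.lang:type=Memory",
            "attr": ["HeapMemoryUsage", "NonHeapMemoryUsage"],
            "resultAlias": "JMXTrans-agent-memory",
            "outputWriters": [{
                "@class": "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings": {"port": 2003, "host": "graphite.example.com", "rootPrefix": "perfmon"},
            }],
        }],
    }]
}

add_query(example_config, "java.lang:type=Threading", ["ThreadCount"], "JMXTrans-agent-threads")
print(json.dumps(example_config, indent=2))
```

Copying the writer block by hand works just as well; the point is that every query carries its own outputWriters, so a query without one sends nothing to Graphite.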

  17. Get Agent Home: Find the agent's home directory.
  18. Update Agent Worker JMX Settings: Update the UrbanCode Deploy agent worker JVM so that remote JMX is enabled.

    @ECHO OFF
    SETLOCAL ENABLEDELAYEDEXPANSION
    
    set inputFile=${p:AGENT_HOME}\bin\worker-args.conf
    set outputFile=${p:AGENT_HOME}\bin\worker-args.jmx
    set _strFind=java.security.properties
    set _strFound=com.sun.management.jmxremote
    set i=0
    
    IF EXIST "%outputFile%" del /F "%outputFile%"

    REM Skip the rewrite when the JMX flags are already present
    >nul findstr /c:"%_strFound%" "%inputFile%" && (
      echo File already enabled
    ) || (
    FOR /F "usebackq tokens=1 delims=[]" %%A IN (`FIND /N "%_strFind%" "%inputFile%"`) DO (set _strNum=%%A)
    FOR /F "usebackq delims=" %%A IN ("%inputFile%") DO (
      set /a i = !i! + 1
      ECHO %%A>>"%outputFile%"
      IF [!i!] == [!_strNum!] (
        ECHO -Dcom.sun.management.jmxremote>>"%outputFile%"
        ECHO -Dcom.sun.management.jmxremote.port=9010>>"%outputFile%"
        ECHO -Dcom.sun.management.jmxremote.local.only=true>>"%outputFile%"
        ECHO -Dcom.sun.management.jmxremote.authenticate=false>>"%outputFile%"
        ECHO -Dcom.sun.management.jmxremote.ssl=false>>"%outputFile%"
      )
    )
    MOVE /Y "${p:AGENT_HOME}\bin\worker-args.conf" "${p:AGENT_HOME}\bin\worker-args.bak"
    MOVE /Y "${p:AGENT_HOME}\bin\worker-args.jmx" "${p:AGENT_HOME}\bin\worker-args.conf"
    )
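
Batch syntax can be hard to follow, so here is the same idea as a short Python sketch: if the file does not already mention com.sun.management.jmxremote, insert the five JMX flags right after the line that contains java.security.properties. This is only an illustration of the logic, not a replacement for the step; the `enable_jmx` name and the sample worker-args.conf lines are invented.

```python
JMX_FLAGS = [
    "-Dcom.sun.management.jmxremote",
    "-Dcom.sun.management.jmxremote.port=9010",
    "-Dcom.sun.management.jmxremote.local.only=true",
    "-Dcom.sun.management.jmxremote.authenticate=false",
    "-Dcom.sun.management.jmxremote.ssl=false",
]


def enable_jmx(lines, anchor="java.security.properties",
               marker="com.sun.management.jmxremote"):
    """Return a copy of lines with JMX_FLAGS inserted after the anchor line.

    The input is returned unchanged (as a new list) when the marker is already
    present or when no line contains the anchor.
    """
    lines = list(lines)
    if any(marker in line for line in lines):
        return lines
    anchor_index = None
    for index, line in enumerate(lines):
        if anchor in line:
            anchor_index = index  # like the batch script, keep the last match
    if anchor_index is None:
        return lines
    return lines[:anchor_index + 1] + JMX_FLAGS + lines[anchor_index + 1:]


# Invented sample content; a real worker-args.conf will differ.
sample = [
    "-Xmx512m",
    "-Djava.security.properties=../conf/java.security",
    "-Dfile.encoding=UTF-8",
]
for line in enable_jmx(sample):
    print(line)
```

Running the function twice leaves the result unchanged, which mirrors the "File already enabled" branch of the script and makes the step safe to repeat on every deployment.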
    
    
  19. Install JMXTrans as a Service: This uses the nssm service manager to install JMXTrans as a service:

    C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe remove JMXTrans confirm
    C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe install JMXTrans "${p:JAVA_HOME}\bin\java.exe" "-Djmxtrans.log.level=ERROR -Djmxtrans.log.dir=C:/GraphitePowershellFunctions -jar C:\GraphitePowershellFunctions\jmxtrans-all.jar -e -f C:\GraphitePowershellFunctions\jmxtrans.json -s 5"
    C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe set JMXTrans AppDirectory C:\GraphitePowershellFunctions
    C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe set JMXTrans AppStdout C:\GraphitePowershellFunctions\jmxtransout.log
    C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe set JMXTrans AppStderr C:\GraphitePowershellFunctions\jmxtranserr.log
    C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe set JMXTrans AppExit Default Restart
    C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe start JMXTrans

After this is completed, restart the agent, and you should see a perfmon section with both the Windows Performance Monitor metrics and the JMXTrans metrics: