Sunday, 30 August 2015

Monitoring OpenStack Deployments with Docker, Graphite, Grafana, collectd and Chef using UrbanCode Deploy!

I was considering making this part of the "Monitoring UrbanCode Deployments with Docker, Graphite, Grafana, collectd and Chef!" series, but since I didn't include this in the original architecture, it's better to consider it an addendum. In reality it's probably more of a fork, as I may continue with future blog postings about the architecture described here.

One of the issues I ran into right away while deploying the monitoring solution described in "Monitoring UrbanCode Deployments with Docker, Graphite, Grafana, collectd and Chef!" into an internal topology managed by UrbanCode Deploy was that each of the agent host machines had quirks and issues that required me to constantly tweak the monitoring install process: fixing yum and apt-get repositories, removing conflicts, installing unexpectedly missing libraries, resolving conflicting JDKs.
The reason for this is that each machine was set up by different people who installed the operating systems and the UrbanCode Deploy Agent in different ways with different options. It would have been great if all nodes were consistent; it would have made my life much easier.

It was at this point that my colleague Michael told me that I should create a Blueprint in UrbanCode Deploy for the topology I want to deploy the monitoring solution into for testing. Here's Michael doing a quick demo of UrbanCode Deploy Blueprint Designer (aka UrbanCode Deploy with Patterns):


Fantastic, I can create a blueprint of the desired topology, add a monitoring component to the nodes that I wish to have monitored and presto!  Here is what the blueprint looks like in UrbanCode Deploy Blueprint Designer:

I created three nodes with three different operating systems just to show that this solution works across operating systems. (It also works on RHEL 7, but I thought adding another node would be overdoing it a little, as well as cramming my already overcrowded RSA sketches.)

This blueprint is actually a Heat Orchestration Template (HOT). You can see the source code here: https://hub.jazz.net/git/kuschel/monitorucd/contents/master/Monitoring/Monitoring.yaml

So if we modify the original Installation in Monitoring UrbanCode Deployments with Docker, Graphite, Grafana, collectd and Chef! Part 1, it would look something like this:

We don't have any UrbanCode Deploy agents installed as the agent install is incorporated as part of the blueprint. You can see this in the yaml under the resources identified by ucd_agent_install_linux and ucd_agent_install_win. You'll see some bash or powershell scripting that installs the UrbanCode Agent as part of the virtual machine initialization.

You'll also see the IBM::UrbanCode::SoftwareDeploy::UCD, IBM::UrbanCode::SoftwareConfig::UCD and IBM::UrbanCode::ResourceTree resource types, which allow the Heat engine to create resources in UrbanCode Deploy and ensure that component processes are executed in the virtual machines once the UrbanCode Deploy agents are installed and started.

Ok, let's take a time out and talk a little about how this all works. First what's Heat?
Heat is an orchestration engine that is able to call cloud provider APIs, and other necessary APIs, to actualize the resources specified in YAML into a cloud environment. Heat is part of the OpenStack project, so it "natively" supports OpenStack clouds, but it can also work with Amazon Web Services, IBM SoftLayer and any other cloud provider that is compliant with the OpenStack interfaces required to create virtual machines, the virtual networks, etc.
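To make that concrete, a stripped-down HOT fragment for a single server might look something like this. This is a sketch only; the image and flavor names are placeholders, not values from the actual Monitoring.yaml:

```yaml
heat_template_version: 2013-05-23

resources:
  monitored_node:
    type: OS::Nova::Server
    properties:
      image: rhel-6.6-base       # placeholder image name
      flavor: m1.small           # placeholder flavor
      user_data: |
        #!/bin/bash
        # boot-time script; this is where logic like the blueprint's
        # ucd_agent_install_linux would install the UrbanCode Deploy agent
```

Heat reads the template, calls the cloud provider to create the server, and passes the user_data script to the virtual machine for execution at first boot.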

In addition, Heat can be extended with other resource types, like those for UrbanCode Deploy components, which allow components to be deployed into environments provisioned by OpenStack via Heat using the Heat Orchestration Template (HOT) specified during provisioning.

The UrbanCode Deploy Blueprint Designer provides a kick-ass visual editor and a simple way to drag and drop UrbanCode Deploy Components into Heat Orchestration Templates (HOT). It also provides the ability to connect to a cloud provider (OpenStack, AWS and IBM SoftLayer are currently supported) and deploy the HOT.
You can monitor the deployment progress.
Oh, it also uses Git as a source for the HOTs (yaml) so that makes it super easy to version and share blueprints.

Ok, let's go over the steps on how to install it. I assume you have UrbanCode Deploy installed and configured with UrbanCode Deploy Blueprint Designer and connected to an OpenStack Cloud. You can set up a quick cloud using DevStack.
  1. You'll also need to install the chef plugin from here: https://developer.ibm.com/urbancode/plugin/chef
  2. Import the application from IBM BlueMix DevOps Service Git, found here:
    https://hub.jazz.net/git/kuschel/monitorucd/contents/master/Monitored_app.json
    Import it from the Applications tab:

    Use the default options in the import dialog. Afterwards, you should see it listed in Applications as "Monitored". There will also be a new component in the Components tab called Monitoring.


    I have made the git repository public, so the component is already configured to go to the IBM BlueMix DevOps Service Git, pull the recipe periodically and create a new version. You may change this behaviour in Basic Settings by unchecking the Import Versions Automatically setting.

  3. You'll have to fix up the imported process a little, as I had to remove the encrypted fields to allow easier import.
    Go to Components->Monitoring->Processes->Install and edit the Install Collectd step.


    In the Collectd Password field, paste the following (you will see bullets, that's OK; copy/paste with no spaces):

    ${p:environment/monitoring.password}
  4. We need a metrics collector to store the metrics and a graphing engine to visualize them. We'll be using a Docker image of Graphite/Grafana/collectd I put together. You will need the ability to build and run a Docker container, either using boot2docker or the native support available in Linux.
    I have put the image up on the public Docker registry as bkuschel/graphite-grafana-collectd, but you can also build it from the Dockerfile in IBM BlueMix DevOps Services's Git at https://hub.jazz.net/git/kuschel/monitorucd/contents/master/DockerWithCollectd/Dockerfile
  5. To get the image run:

    docker pull bkuschel/graphite-grafana-collectd

    Now run the image, binding ports 80 and 2003 and UDP port 25826 from the docker container to the host's ports.

    docker run -p 80:80 -p 2003:2003 -p 25826:25826/udp -t bkuschel/graphite-grafana-collectd

    You can also mount file volumes into the container for the collector's database, if you wish it to be persisted. Otherwise, each time you restart the container it starts with a fresh database; this has its advantages for testing. You can also specify other configurations beyond the provided defaults. Look at the Dockerfile for the volumes.
  6. You'll need to connect the UrbanCode Blueprint Designer to Git by adding  https://hub.jazz.net/git/kuschel/monitorucd to the repositories

  7. You should now see Monitoring in the list of blueprints on the UrbanCode Deploy Blueprint Designer home page. Click on it to open the blueprint.
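A note on step 5's container: it keeps the Whisper database inside the container, so metrics are lost when the container is recreated. If you do want persistence, a volume-mounted run might look like this sketch. The container-side paths here are assumptions; check the Dockerfile's VOLUME entries for the real ones.

```shell
# Host directories that will hold the persisted data (example paths).
DATA_DIR="${TMPDIR:-/tmp}/monitoring-data"
mkdir -p "$DATA_DIR/whisper" "$DATA_DIR/grafana"

# Assemble the docker run command rather than executing it here, so the
# volume flags are easy to see. The container paths after each colon are
# assumptions; verify them against the Dockerfile.
CMD="docker run -p 80:80 -p 2003:2003 -p 25826:25826/udp \
  -v $DATA_DIR/whisper:/opt/graphite/storage/whisper \
  -v $DATA_DIR/grafana:/var/lib/grafana \
  -t bkuschel/graphite-grafana-collectd"
echo "$CMD"
```

With the mounts in place, recreating the container picks up the existing metric database from the host directories.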
I am not going to cover the UrbanCode component processes as they are essentially the same as the ones I described in Monitoring UrbanCode Deployments with Docker, Graphite, Grafana, collectd and Chef! (Part 2: The UCD Process) and Interlude #2: UCD Monitoring with Windows Performance Monitor and JMXTrans. The processes have been reworked to be executable as application/component processes rather than solely as generic resource processes. I also added some steps that fix typical problems in OpenStack images, such as fixing the repository and a workaround for a host name issue causing JMX not to bind properly.

The blueprint is also rudimentary and may need to be tweaked to conform to the specific cloud set up in your environment. I created three virtual machines from operating system images I happened to have available on my OpenStack, hooked them together on the private network and gave them external IPs so that I can access them. They all have the Monitoring component added to them and should be deployed into the Monitored application.

Once you've fixed everything up, make sure you select a cloud and then click Provision...:

It will now ask for launch configuration parameters. Again, many of these will be specific to your environment, but you should be able to leave everything as is.

If you bound the Docker container to different ports, you'll have to change the port numbers for graphite (2003) and collectd (25826).
You will need to set the Admin Pass to something recognizable; it's the Windows Administrator password. You may or may not need this depending on how your Windows image is set up. I needed it.
The Monitoring / Server is the public IP address of your Docker host running the bkuschel/graphite-grafana-collectd image. The Monitoring / Password is the one that is built into the Docker image. You will need to modify the Docker image to either not hard-code this value or build a new image with a different password.

Once Provision is clicked, something like this should happen:

  1. The Monitoring.yaml (originating from Git) in UrbanCode Deploy Blueprint is passed to the heat engine on provisioning, with all parameters bound.
  2. The heat engine creates an UrbanCode Deploy Environment in the application specified in yaml (this can be changed)
  3. The UrbanCode Deploy Environment is mapped to the UrbanCode Deploy Component as specified in the yaml resource
  4. It also creates UrbanCode Deploy resources that will be used to represent the UrbanCode Deploy agents once they come online
  5. The agent resources are mapped to the environment.
  6. Heat interacts with the cloud provider (OpenStack in this case) to deploy the virtual machines specified in the yaml.
  7. The virtual machines are created and the agents installed as part of virtual machine initialization ("user data").
  8. Once the agents come online the component process is run
  9. The component process will be run for each resource mapped to the environment
  10. The component process runs the generic process Install_collectd_process (or Install_perfmon_process for Windows) on each agent.
  11. The agent installs collectd or GraphitePowershellFunctions via chef and performs other process steps as required to get the monitoring solution deployed.
The progress can be monitored in UrbanCode Deploy Blueprint Designer:


Once the process is finished the new topology should look something like this:


That should be it; give it a shot. Once you get it working, the results are quite impressive. Here are some Grafana performance dashboards for CPU and heap based on the environment I deployed using this method. The three Monitoring_Monitoring_<ip> entries correspond to <UrbanCode Deploy Application>_<UrbanCode Deploy Component>_<UrbanCode Agent Host Name>:


Here are some related topics that I am considering for future posts:
  • More on HOT with UrbanCode Blueprint Designer
  • More on OpenStack
  • More Grafana Dashboards

Sunday, 16 August 2015

Interlude #2: UCD Monitoring with Windows Performance Monitor and JMXTrans

I vaguely mentioned in Part 1 of this blog series on monitoring that there is a collectd alternative for Windows. Indeed there is: it's called SSC Serv and can be found here. It works with the collectd protocol and has the same collection module architecture, so it is a de facto port of the Unix collectd for Windows. Here's the catch: it's not free. It's not that expensive, so it may be worth the money for enterprise customers, but there is another catch: Windows has its own performance monitoring suite that Windows administrators are usually already comfortable with, the Windows Performance Monitor built into every Windows platform.

Windows Performance Monitor doesn't do Java performance metrics, whereas collectd does, so we'll need to augment it with another solution for collecting those; we are going to use JMXTrans for that. JMXTrans is a JMX monitoring solution that converts JMX to the graphite metrics protocol. Windows Performance Monitor does not "speak" the graphite metrics protocol either, so we also need something to translate that; we will use custom PowerShell functions and run them as a Windows service. We'll use a powerful Windows service manager called the Non-Sucking Service Manager (nssm) for that.
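For context, the graphite metrics protocol that both JMXTrans and the PowerShell functions emit is just a line-oriented plaintext format: metric path, value, Unix timestamp. A quick sketch, with a made-up hostname and metric name:

```shell
HOST="winbox01"   # hypothetical Windows node
METRIC="perfmon.$HOST.processor._total.percent_processor_time"
LINE="$METRIC 12.5 $(date +%s)"
echo "$LINE"
# Delivery is a plain TCP write to the Graphite host on port 2003, e.g.:
#   echo "$LINE" | nc graphitehost 2003
```

Anything that can produce lines in this shape can feed Graphite, which is why a handful of PowerShell functions are enough to bridge Windows Performance Monitor to it.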

Ok, to recap, this is our solution:
And we can't forget:
When it's all said and done it will look something like this:
Fig 1: Windows Performance Monitor and JMXTrans

Import the generic process in the same way as described for the collectd generic process in Part 1. The process parameters are also similar, though as our process only supports agents and there is no client/server architecture, there is a much smaller subset.
  • Component Name: Should be set to the name of the component we imported earlier
  • Version Name (Optional): You can specify the name of a specific version of the component to use, otherwise it will use the latest
  • Graphite Server: The Graphite server we are connecting to.
The process used is also similar to Part 1:
Fig 2: Windows Performance Monitor and JMXTrans Deployment

The process involves fetching a cookbook from GIT, installing Chef on the Windows node, installing the Graphite PowerShell cookbook, downloading the JMXTrans jar and installing them both as windows services:

  1. Set Defaults: This sets up some in-process defaults; well, just one for now: the nullProperty. This is a property used in conditional forks to represent an empty string, in other words, an unset property.
  2. component: This gets the extended information of the component; in particular, the component id is required for steps further on down. This fetches it given a component name (passed in).
  3. IsVersionNameSet: This checks if the user passed in a specific version to deploy. If so, it will skip to step 6.
  4. LatestVersion: Get the latest version from the component. This is a custom step from the ComponentPlus plugin I created. It takes a component name and returns a version name.
  5. versionName: This sets the request level version name (the one usually passed in) from step 4. Step 6 expects the request level parameter.
  6. version: Given the component name and version name get the version id. This is also a custom step I created using the ComponentPlus plugin.
  7. DownloadArtifacts: Download the artifacts from the component and version passed in. This fetches the Chef cookbook.
  8. Install Chef: A simple PowerShell script that downloads and installs the latest chef:

    powershell.exe -NoLogo -NonInteractive -command "(New-Object System.Net.WebClient).DownloadFile(\"https://opscode-omnibus-packages.s3.amazonaws.com/windows/2008r2/x86_64/chef-client-12.4.1-1.msi\",\"chef.msi\")"
    msiexec /log msiexec.log /qn /i chef.msi ADDLOCAL="ChefClientFeature"
    type msiexec.log
    
    
  9. hostname: Retrieves the fully qualified domain name using Groovy (required for JMXTrans configuration)
  10. Create Graphite Powershell Node: This works similar to the collectd install described in Part 3 except that the configuration files generated in this case are used by the PowerShell functions.

    {
       "run_list": [ "recipe[graphite_powershell_functions::default]" ],
       "graphite_powershell_functions": {
         "CarbonServer" : "${p:collectd_server}",
         "CarbonServerPort" : 2003,
         "MetricPath" : "perfmon.",
         "MetricSendIntervalSeconds" : 5,
         "TimeZoneOfGraphiteServer" : "UTC",
         "hostname" : "${p:hostname/hostname}",
         "PerformanceCounters" : [
           "Network Interface(*)\\Bytes Received/sec",
           "Network Interface(*)\\Bytes Sent/sec",
           "Network Interface(*)\\Packets Received Unicast/sec",
           "Network Interface(*)\\Packets Sent Unicast/sec",
           "Network Interface(*)\\Packets Received Non-Unicast/sec",
           "Network Interface(*)\\Packets Sent Non-Unicast/sec",
           "Processor(_Total)\\% Processor Time",
           "Processor(_Total)\\% User Time",
           "Processor(_Total)\\% Idle Time",
           "Memory\\Available MBytes",
           "Memory\\Pages/sec",
           "Memory\\Pages Input/sec",
           "System\\Processor Queue Length",
           "System\\Threads",
           "System\\File Write Operations/sec",
           "System\\File Read Operations/sec",
           "PhysicalDisk(*)\\Avg. Disk Write Queue Length",
           "PhysicalDisk(*)\\Avg. Disk Read Queue Length",
           "TCPv4\\Segments Received/sec",
           "TCPv4\\Segments Sent/sec",
           "TCPv4\\Segments Retransmitted/sec"
         ],
         "MetricFilter" : [
           "isatap",
           "teredo tunneling"
         ],
         "nssm_archive" : "http://nssm.cc/release/nssm-2.24.zip",
         "nssm_archive_checksum" : "727d1e42275c605e0f04aba98095c38a8e1e46def453cdffce42869428aa6743"
       }
     }
    
    
    The thing of note in this configuration is the "PerformanceCounters" array. These are all the Windows Performance Counters that will be transferred to Graphite. To get a list of these, go to a Windows command prompt and type:
     typeperf -qx
     Any of those can be included in that list; note the double backslash used in the configuration. More information can be found at: https://github.com/MattHodge/Graphite-PowerShell-Functions
  11. Install Chef Node: Install the chef node configuration for Graphite PowerShell Functions 
  12. Stop JMXTrans Server: Stop any prior installation of the JMXTrans service so that the JMXTrans jar can be downloaded and copied over 
  13. Download JMXTrans: Download the latest jmxtrans jar file
    cd C:\GraphitePowershellFunctions
    powershell.exe -NoLogo -NonInteractive -command "(New-Object System.Net.WebClient).DownloadFile(\"http://central.maven.org/maven2/org/jmxtrans/jmxtrans/251/jmxtrans-251-all.jar\",\"jmxtrans-all.jar\")" 
  14. Fix JMXTrans Logging: This is a groovy step that adds a logback.xml file to the downloaded jar. This is needed, as without it, the JMXTrans logging is stuck on debug. (ie. very verbose)
  15. Get Java Home: Gets the Java home of the agent used to launch JMXTrans. 
  16. Create JMXTrans Config: This creates the JMXTrans configuration file used by the service:
      {
       "servers" : [ {
         "port" : "9010",
         "host" : "localhost",
         "alias" : "${p:hostname/hostname}",
         "queries" : [ 
         {
           "obj" : "java.lang:type=ClassLoading",
           "attr" : [ "LoadedClassCount" ],
           "resultAlias": "JMXTrans-agent-",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         },
         {
           "obj" : "java.lang:type=Compilation",
           "attr" : [ "TotalCompilationTime" ],
           "resultAlias": "JMXTrans-agent-",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         },
         {
           "obj" : "java.lang:type=GarbageCollector,name=*",
           "attr" : [ "CollectionCount", "CollectionTime" ],
           "resultAlias": "JMXTrans-agent-gc",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         },
         {
           "obj" : "java.lang:type=Memory",
           "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
           "resultAlias": "JMXTrans-agent-memory",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         }, 
         {
           "obj" : "java.lang:type=MemoryPool,name=*",
           "attr" : [ "Usage"],
           "resultAlias": "JMXTrans-agent-memory_pool",
           "outputWriters" : [ {
             "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
             "settings" : {
               "port" : 2003,
               "host" : "${p:collectd_server}",
               "rootPrefix": "perfmon"
             }
           } ]
         } ]
       } ]
     }
    
     This can be left as-is for monitoring; it is similar to the collectd JMX configuration. It is possible to add sections for Tomcat; examples can be found here: https://code.google.com/p/jmxtrans/wiki/MoreExamples

     If new queries are added, make sure to copy the outputWriters attribute over, and also use a resultAlias that starts with JMXTrans-agent- (or JMXTrans-server-) to keep metrics consistent.

  17. Get Agent Home: Find the Agent's home directory.
  18. Update Agent Worker JMX Settings: Update the UrbanCode Deploy agent worker JVM so that remote JMX is enabled:

    @ECHO OFF
    SETLOCAL ENABLEDELAYEDEXPANSION
    
    set inputFile=${p:AGENT_HOME}\bin\worker-args.conf
    set outputFile=${p:AGENT_HOME}\bin\worker-args.jmx
    set _strFind=java.security.properties
    set _strFound=com.sun.management.jmxremote
    set i=0
    
    IF EXIST %outputFile% del /F %outputFile%
     
    >nul findstr /c:"%_strFound%" "%inputFile%" && (
      echo "File already enabled";
    ) || (
    FOR /F "usebackq tokens=1 delims=[]" %%A IN (`FIND /N "%_strFind%" "%inputFile%"`) DO (set _strNum=%%A)
    FOR /F "usebackq delims=" %%A IN ("%inputFile%") DO (
      set /a i = !i! + 1
      ECHO %%A>>"%outputFile%"
      IF [!i!] == [!_strNum!] (
        ECHO -Dcom.sun.management.jmxremote>>"%outputFile%"
        ECHO -Dcom.sun.management.jmxremote.port=9010>>"%outputFile%"
        ECHO -Dcom.sun.management.jmxremote.local.only=true>>"%outputFile%"
        ECHO -Dcom.sun.management.jmxremote.authenticate=false>>"%outputFile%"
        ECHO -Dcom.sun.management.jmxremote.ssl=false>>"%outputFile%"
      )
    )
    MOVE /Y "${p:AGENT_HOME}\bin\worker-args.conf" "${p:AGENT_HOME}\bin\worker-args.bak"
    MOVE /Y "${p:AGENT_HOME}\bin\worker-args.jmx" "${p:AGENT_HOME}\bin\worker-args.conf"
    )
    
    
  19. Install JMXTrans as a Service: This uses the nssm service manager to install JMXTrans as a service:

     C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe remove JMXTrans confirm
     C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe install JMXTrans "${p:JAVA_HOME}\bin\java.exe" "-Djmxtrans.log.level=ERROR -Djmxtrans.log.dir=C:/GraphitePowershellFunctions -jar C:\GraphitePowershellFunctions\jmxtrans-all.jar -e -f C:\GraphitePowershellFunctions\jmxtrans.json -s 5"
     C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe set JMXTrans AppDirectory C:\GraphitePowershellFunctions
     C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe set JMXTrans AppStdout C:\GraphitePowershellFunctions\jmxtransout.log
     C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe set JMXTrans AppStderr C:\GraphitePowershellFunctions\jmxtranserr.log
     C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe set JMXTrans AppExit Default Restart
     C:\GraphitePowershellFunctions\nssm\current\win64\nssm.exe start JMXTrans
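For reference, adding another query to the JMXTrans configuration above means repeating the outputWriters block inside each new query. A hypothetical extra query for thread counts might look like the following; the java.lang:type=Threading MBean and its ThreadCount/PeakThreadCount attributes are standard JVM ones, but verify them against your JVM before relying on them:

```json
{
  "obj" : "java.lang:type=Threading",
  "attr" : [ "ThreadCount", "PeakThreadCount" ],
  "resultAlias": "JMXTrans-agent-threading",
  "outputWriters" : [ {
    "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
    "settings" : {
      "port" : 2003,
      "host" : "${p:collectd_server}",
      "rootPrefix": "perfmon"
    }
  } ]
}
```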

After this is completed, restart the agent, and you should see a perfmon section with both the Windows Performance Monitor metrics and JMXTrans:

Wednesday, 12 August 2015

Interlude: Monitoring UrbanCode Deployments with nmon

In Part 1, Part 2 and Part 3 of my blog series called "Monitoring UrbanCode Deployments with Docker, Graphite, Grafana, collectd and Chef!" I have been using collectd as the engine for collecting metrics on nodes and feeding them to Graphite. There is another option to collect metrics and feed them into graphite: nmon (short for Nigel's performance Monitor; one could say we're Making Plans for Nigel's performance monitor).

There is really no great reason to use nmon over collectd in most cases; collectd can collect metrics from far more sources and has a secure client/server architecture for data transmission. So why use nmon at all? Even though it's supported on Linux, there isn't much reason to use it there, but its greatest benefit is on AIX, especially if LPARs are being used. LPARs are a type of virtualization where multiple AIX OS instances can be run on one physical machine. nmon provides per-LPAR metrics for things like memory and CPU consumption. This can be useful.

In fact, you can run collectd AND nmon together on the same node and get the best of both worlds. What would an nmon topology with UrbanCode look like?


I haven't created any UrbanCode processes for deploying this nmon solution, but it's definitely feasible, if not easier, as nmon is usually packaged as part of AIX and readily available for Linux. There is also a Chef recipe for it.

How do we get nmon to feed graphite? There is a script called nmon2graphite that runs as a daemon, connects via a pipe to an nmon daemon, and converts the nmon language to the graphite language.

You can download this script for AIX here:
https://github.com/chmod666org/nmon2graphite/raw/master/nmon2graphite

Note that this is part of a broader project hosted on the nmon2graphite home page, which also augments the graphite engine to add a custom page. This page is optional. I have already augmented the bkuschel/graphite-grafana-collectd image with the suggested tweaks so it will be able to collect from nmon2graphite. You are free to modify the image to add the custom page, if desired.

The version of this script that I modified for Linux can be downloaded here:
https://hub.jazz.net/git/kuschel/monitorucd/contents/master/nmon2graphite

(This Linux version of this script does not work with the custom page found on the nmon2graphite home page without heavy modification.)

To get this script to automatically start and connect to graphite on port 2003, you need to start it using a cron job. This is outlined in the "Client Side" topic on the nmon2graphite home page. I'll outline the basic steps here. This will need to be done on every agent machine, so it would be a great candidate for an UrbanCode Deploy script step in a process that deploys this solution.

As root, make a directory to save the nmon2graphite script into. Make it root executable. I put mine in /opt/nmon2graphite/

mkdir /opt/nmon2graphite
chmod u+x /opt/nmon2graphite/nmon2graphite

Create another directory, called "nmon", inside the directory in which you saved the script:

mkdir /opt/nmon2graphite/nmon

As the root user, edit the crontab

crontab -e

You can also sudo it:


sudo crontab -u root -e

Add these lines to the end of it, changing graphitehost to the docker host; you can leave port 2003 unless you bound the docker container's 2003 to a different host port.

0 0 * * * /usr/bin/mkfifo /opt/nmon2graphite/nmon/$(date +\%Y-\%m-\%d-\%H-\%M).$
0 0 * * *  sleep 10 ; /opt/nmon2graphite/nmon2graphite -i graphitehost -p 2003 -l $
0 1 * * * find /opt/nmon2graphite/nmon -type f -mtime +30 | xargs rm -f >/dev/n$


That's it; after this cron job executes, nmon should now be feeding the graphite container. (You can also execute these commands manually.) There should be a section for nmon:
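The cron entries above hinge on a named pipe: nmon writes into the FIFO and nmon2graphite reads from the other end. Here is a tiny standalone demonstration of that pattern; the directory and the sample record are made up, and real nmon output is far richer:

```shell
dir="${TMPDIR:-/tmp}/nmon-fifo-demo"
mkdir -p "$dir"
fifo="$dir/demo.fifo"
[ -p "$fifo" ] || mkfifo "$fifo"

# Stand-in for nmon2graphite: read from the FIFO in the background.
cat "$fifo" > "$dir/received.txt" &

# Stand-in for nmon: write one (fabricated) nmon-style record.
echo "ZZZZ,T0001,00:00:01,01-JAN-2015" > "$fifo"
wait
cat "$dir/received.txt"
```

The FIFO lets the two daemons run independently: nmon never knows it is talking to a translator, and nmon2graphite just reads lines as they arrive.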



Tuesday, 11 August 2015

Monitoring UrbanCode Deployments with Docker, Graphite, Grafana, collectd and Chef! (Part 3: The Chef Cookbook)

In Part 2 we examined the UCD processes that provisioned the collectd cookbook onto the UrbanCode Deploy Agent hosts. In this blog post, we'll take a closer look at this cookbook.

When we use Chef to install collectd, we are using Chef "Solo" or local Chef. Local Chef assumes everything (ie. the Cookbooks) that it needs is available locally so it won't need to contact a Chef server to pull down Cookbook dependencies. You will see that the git repository contains not only the collectd Chef Cookbook but also the dependencies. The collectd cookbook I am using is a slightly modified version of this one: https://github.com/hectcastro/chef-collectd

When you execute Chef in local mode, you supply a configuration file: a node file. This node file contains all the recipes we want to execute as well as all the properties needed by those recipes.
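In practice, a local-mode run boils down to writing that node file and pointing Chef at it. A minimal sketch follows; the node file is a toy, the paths are hypothetical, and the final command is only printed rather than executed (it assumes chef-solo is installed and the cookbooks are present under the given cookbook_path):

```shell
workdir="${TMPDIR:-/tmp}/chef-solo-demo"
mkdir -p "$workdir/cookbooks"

# A toy node file: the run_list plus attributes the recipes will read.
cat > "$workdir/node.json" <<'EOF'
{
  "run_list": [ "recipe[collectd::default]" ],
  "collectd": { "graphite_ipaddress": "graphite" }
}
EOF

# Tell chef-solo where the local cookbooks live.
cat > "$workdir/solo.rb" <<EOF
cookbook_path "$workdir/cookbooks"
EOF

# The actual invocation (shown, not run):
echo "chef-solo -c $workdir/solo.rb -j $workdir/node.json"
```

Because everything is local, no Chef server is contacted; the run_list and attributes come entirely from the JSON file.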

We have two configuration files: one for collectd as a server and the other for when collectd is installed as a client. The configuration files are created dynamically by UrbanCode Deploy as part of the generic process and are modified on-the-fly with some UrbanCode process request properties.

Recall in the Generic Process in Part 2 that there were two steps we glossed over:
Let's look at the collectd server step first, it creates the Chef node configuration file with the following contents:


{
  "run_list": [ "recipe[collectd::default]","recipe[collectd::attribute_driven]" ],
  "collectd": {
    "dir": "${p:collectd_dir}",
    "graphite_ipaddress": "graphite",
    "plugins": {
      "aggregation" : {
         "template" : "aggregation.conf.erb"
      },
      "cpu" : {
      },
      "disk" : {
      },
      "df" : {
        "config" : {
         "FSType" : [ "proc", "sysfs", "fusectl", "debugfs", "devtmpfs", "devpts", "tmpfs", "cgroup" ],
         "IgnoreSelected" : true
        }
      },
      "entropy" : {
      },
      "interface" : {
        "config" : { "Interface" : "lo", "IgnoreSelected" : true }
      },
      "irq" : {
      },
      "java" : {
        "template" : "${p:java_monitoring_template}"
      },
      "load" : {
      },
      "memory" : {
      },
      "network" : {
        "template" : "network.conf.erb",
        "config" : {
          "host" : "0.0.0.0",
          "listen" : {
              "SecurityLevel" : "Encrypt",
              "AuthFile" : "${p:collectd_dir}/etc/auth_file"
          }
        }
      },
      "processes" : {
        "config" : {
         "ProcessMatch" : [ "UrbanCode Deploy Server\", \".*java.*UDeployServer",
                            "UrbanCode Deploy Agent Monitor\" \".*java.*air-monitor.jar.*",
                            "UrbanCode Deploy Agent Worker\" \".*java.*com.urbancode.air.agent.AgentWorker"]
        }
      },
      "swap" : {
      },
      "syslog" : {
        "config" : {
          "LogLevel" : "info"
        }
      },
      "tcpconns" : {
        "config" : {
          "ListeningPorts" : false,
          "LocalPort" : [ 7918, 8443, 8080, 43, 80]
        }
      },
      "users" : {
      },
      "write_graphite" : {
        "config" : {
      "Host" : "${p:collectd_server}",
      "Protocol" : "tcp",
      "LogSendErrors" : true,
          "Prefix" : "collectd.",
      "StoreRates" : true,
          "AlwaysAppendDS" : false,
          "EscapeCharacter": "_"
        }
      }
    }
  }
} 

Most of this JSON content is actually the collectd configuration, in particular the contents of the "collectd" object. The "run_list" array contains the recipes we want to execute. One of them is the default collectd installation, collectd::default, which pulls down the collectd source code and all the packages needed to build and run collectd. The other recipe, collectd::attribute_driven, reads the "collectd" object and constructs the series of collectd configuration files. Notice that we substitute UrbanCode Deploy properties into the file, the ${p:} tokens.

Once the collectd::attribute_driven recipe is complete the configuration looks something like this:

${p:collectd_dir}/etc
├── auth_file (generated by a previous step)
├── collectd.conf
└── conf.d
    ├── aggregation.conf
    ├── cpu.conf
    ├── df.conf
    ├── disk.conf
    ├── entropy.conf
    ├── interface.conf
    ├── irq.conf
    ├── java.conf
    ├── load.conf
    ├── memory.conf
    ├── network.conf
    ├── processes.conf
    ├── swap.conf
    ├── syslog.conf
    ├── tcpconns.conf
    ├── users.conf
    └── write_graphite.conf

Each object within the "collectd" object in the JSON file corresponding to a configuration file which is included as part of the global collectd configuration using an include attribute in the collectd.conf file. This file that is read directly by the collectd process.

Include "${p:collectd_dir}/etc/conf.d/*.conf"

Each configuration file pertains to the collectd plugin configuration specified by the object name. So, for instance, the "cpu" JSON object generated the cpu.conf file which contains the options for the collectd cpu plugin.

Each collectd plugin has its own configuration specification. The simple ones that are only one level deep are handled generically by the recipe using JSON attributes in the "config" object that are translated from JSON to the collectd configuration file. So taking the syslog plugin example:

      "syslog" : {
        "config" : {
          "LogLevel" : "info"
        }
      }

This creates a syslog.conf file with the following contents:

LoadPlugin "syslog"
<Plugin "syslog">  
  LogLevel "info
</Plugin>


The default configuration template also supports the write_graphite plugin,which is more then one level deep, but for anything more complicated then one level deep one has to supply a configuration template,in the "template" attribute.

One interesting template is the one for Java JMX. If you remember in Part 1, one of the attributes we pass into the Generic Process, is the Java template. This value gets passed into the configuration via ${p:java_monitoring_template} property binding.
You can examine the two templates now, the one for Java is here:

https://hub.jazz.net/git/kuschel/monitorucd/contents/master/cookbooks/collectd/templates/default/java.conf.erb

and the one for Tomcat here:

https://hub.jazz.net/git/kuschel/monitorucd/contents/master/cookbooks/collectd/templates/default/tomcat.conf.erb

The difference is that in one case, we monitor Tomcat JMX mbeans and in the other case we do not. Notice that there are also bind variables passed into the template:

Host "<%= node["collectd"]["name"] %>"

This way we can actually pass in UrbanCode Deploy properties directly into the templates.

There are a few other examples, aggregation.conf.erb, network.conf.erb and we also have one for mysql.conf.erb. If we wanted to include mysql monitoring for the server using the mysql plugin we could customize this template and add:

      "mysql" : {
        "template" : "mysql.conf.emb"
      }


Be aware that mysql monitoring requires that mysql be set to binary logging. In your my.cnf you'll need to add:

log-bin=/var/lib/mysql/log-bin.log
binlog_format=row

Templates can be very powerful, look at the network.conf.erb template as an example. It is able to generate the network configuration for both the collectd server and client based on attributes passed alone.

Definitely check other the collectd plugins, there is one for Oracle, hypervisors and a load of others. New templates can be created as needed and added to the cookbook.

Let's take a look at the client Chef node configuration file generated by the client step:

{
  "run_list": [ "recipe[collectd::default]","recipe[collectd::attribute_driven]" ],
  "collectd": {
    "dir": "${p:collectd_dir}", 
    "plugins": {
      "aggregation" : {
    "template" : "aggregation.conf.erb"
      },
      "cpu" : {
      },
      "disk" : {
      },
      "df" : {
        "config" : {
         "FSType" : [ "proc", "sysfs", "fusectl", "debugfs", "devtmpfs", "devpts", "tmpfs", "cgroup" ],
         "IgnoreSelected" : true
        }
      },
      "entropy" : {
      },
      "interface" : {
        "config" : { "Interface" : "lo", "IgnoreSelected" : true }
      },
      "irq" : {
      },
      "java" : {
    "template" : "${p:java_monitoring_template}"
      },
      "load" : {
      },
      "memory" : {
      },
      "network" : {
    "template" : "network.conf.erb",
     "config" : {
          "host" : "${p:collectd_server}",
      "server" : {
          "SecurityLevel" : "Encrypt",
              "Username" : "${p:collectd_username}",
              "Password" :"${p:collectd_password}"
      }
        }
      },
      "ping" : {
        "config" : {
          "Host" : "${p:collectd_server}"
        }
      },
      "processes" : {
        "config" : {
         "ProcessMatch" : [ "UrbanCode Deploy Server\", \".*java.*UDeployServer",
                            "UrbanCode Deploy Agent Monitor\" \".*java.*air-monitor.jar.*",
                            "UrbanCode Deploy Agent Worker\" \".*java.*com.urbancode.air.agent.AgentWorker"]
        }
      },
      "swap" : {
      },
      "syslog" : {
        "config" : {
          "LogLevel" : "info"
        }
      },
      "users" : {
      }
    }
  }
}
 
It's very similar to the server except we have a different network configuration, ,the collectd client connects to the collectd server, and we include the ping plugin and remove the tcpconns plugin. As the client does not write to graphite directly the write_graphite plugin section is also removed.

That's about it. Some other small modifications were made to the cookbook to upgrade the collectd version and install some JVM specific libraries into the system path. Feel free to check out the cookbook, pull it and modify it for your topology.

Next is Part 4, the fun stuff, metrics and visualization.

Monitoring UrbanCode Deployments with Docker, Graphite, Grafana, collectd and Chef! (Part 2: The UCD Process)

Following up to Part 1, in Part 2 I'll cover the Urban Code Deploy Process of deploying collectd. Actually, I'm going to throw in another process as well. If you were able to get Part 1 working, you'll notice that you can only execute the process on one agent at a time. Not cool, especially if you have thousands of agents. There has to be a better way! There is, but first, let's go over the per agent process. The input parameters for this process were described in Part 1.


The best way to go about this is to describe each step.


  1. Set Defaults: This sets up some in process defaults, well, just one for now the nullProperty. This is a property that is used in conditional forks to present an empty string. In other words, an unset property.
  2. component: This get the extended information of the component, in particular, the component id is required for steps further on down. This fetches it given an component name (passed in).
  3. IsVersionNameSet: This checks if the user passed in a specific version to deploy. If so it will skip to 6
  4. LatestVersion: Get the latest version from the component. This is a custom step from the ComponentPlus plugin I created. It takes a component name and returns a version name.
  5. versionName: This sets the request level version name (the one usually passed in) from step 4. Step 6 expects the request level parameter.
  6. version: Given the component name and version name get the version id. This is also a custom step I created using the ComponentPlus plugin.
  7. DownloadArtifacts: Download the artifacts from the component and version passed in. This fetches the Chef cookbook.
  8. Install Chef: A simple bash script that downloads and installs chef:

    echo $JAVA_HOME
    curl -L https://www.opscode.com/chef/install.sh | bash
    
    
  9. Server or Client: Are we installing a collectd Server or Client
  10. Create collectd Server Node: This creates a configuration file for the Chef recipe that is specific collectd server configuration. We will cover the contents of this in a later post.
  11. Create collectd config Directory: We create a directory to store the file created in the next step.
  12. Create auth file: A collectd server authentication file that contains the username password used by the server for collected clients to authenticate against. This file is in the htpasswd format. The username and password passed in as request properties are used to construct this file.
  13. Create collectd Client Node: This creates a configuration file for the Chef recipe that is specific collectd client configuration. We will cover the contents of this in a later post.
  14. Clean old Collectd: Clean up any old collectd configurations.
  15. Install Collectd: Execute the Chef recipe using the configuration file created in earlier steps.
  16. Deploy Server Set: Is the UCD Server request property set to something?
  17. Update UrbanCode Deploy Server JMX Settings: Update the UrbanCode Deploy server JVM so that remote JMX is enabled:

    if ! grep -q com.sun.management.jmxremote.port ${p:deploy_server_dir}/bin/set_env ; then sed -i 's/\(java.awt.headless=true\)/\1 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false/' ${p:deploy_server_dir}/bin/set_env
    fi
    
    
  18. Manual Task: The JMX settings required restarting of the UrbanCode Server. Restarting of an UrbanCode Server should be done using independent of the process, so it's a manual task.
  19. Get Agent Home: Fine the Agents home directory.
  20. Update Agent Worker JMX Settings: Update the UrbanCode Deploy agent worker jvm so that remote JMX is enabled.

    if ! grep -q com.sun.management.jmxremote ${p:AGENT_HOME}/bin/worker-args.conf ; then sed -i '/java.security.properties/a-Dcom.sun.management.jmxremote\n-Dcom.sun.management.jmxremote.port=9010\n-Dcom.sun.management.jmxremote.local.only=true\n-Dcom.sun.management.jmxremote.authenticate=false\n-Dcom.sun.management.jmxremote.ssl=false' ${p:AGENT_HOME}/bin/worker-args.conf
    fi
    
    
  21. Restart Agent: he JMX settings required restarting of the UrbanCode Agent.

    ${p:AGENT_HOME}/bin/agent stop ; ${p:AGENT_HOME}/bin/agent start
That's it! Most of the heavy lifting of installing and configuring chefd is done by Chef which we will cover in a later blog post.
As mentioned earlier, this only executed on an per agent basis, perhaps it would be good idea to get it execute for multiple agents. To do this we need to set up an UrbanCode application with an application process.

This is self explanatory, for each agent assigned to the application, run the generic process to install the collectd client. You can download the application with this process called Install Collected Client On Agents from:
https://hub.jazz.net/git/kuschel/monitorucd/contents/master/Install_collectd_app.json

There are a few gotchas with this application, there is an environment created called Collectd, this environment will need to be bound to the Resource group that contains the agents to be provisioned with collectd. By default, this is the /Agents/Collectd resource group. I have configured this resource group to automatically include agents that have the deploy_collectd property set to true. This allows the inclusion and exclusion of agents into the deploy process.


One thing I also ran into is when there are duplicate properties in agents. I had two agent properties differing only in case: agent.HOSTNAME and agent.Hostname. This caused problems with the agent loop step in the process. You will see the process fail and the link to the child process non-existant in the request history and the UrbanCode Server deployserver.out log file with something like.

2015-08-11 08:52:30,860 ERROR WorkflowRuntime-7: (wfid=1b143582-e0a3-4a20-9150-d4a7fcc82420) org.hibernate.util.JDBCExceptionReporter - Duplicate entry 'iteration/agent/Hostname-611b38dc-cc5d-4342-828c-6a94cc23b881' for key 'ps_prop_val_uci'

Delete the duplicate property to get this unstuck.

Update: I have created a custom plugin that allows a property to be deleted from all agents. I use this to delete the HOSTNAME property.  I created another generic process called "Remove HOSTNAME Property" with this one step configured with the HOSTNAME property. I supply any resource as the default required Resource request parameter, as this is not used, you can put any valid resource here.
I then added a "Run Generic Process" step into the application process before the "For Every Agent..." loop that points to the Remove HOSTNAME Property, if you supplied a default in the generic process, set the Resource Path parameter to blank for this step. That will fix the  "Duplicate entry" exception.

If you run and it fails complaining about a missing resource id, then try putting in the Resource Path that the environment is bound to in the Install collectd step in the application process. For example:
/Collectd Enabled/${p:iteration/agent.name}

Once the resource group is bound to the Collectd environment and there are agents in it, execute the environment's application process:

This will bring up a dialog containing similar properties as the generic process, in this case we are installing clients, so properties pertaining to a collectd server install are omitted.

To recap:
  • Only Changed Versions: This is ignored as we have our own version logic in the process.
  • Snapshot: Leave Blank
  • Component Name: Should be set to the name of the component.
  • Version Name (Optional): You can specify the name of a specific version of the component to use, otherwise it will use the latest
  • Collectd Install Directory: The default is good
  • Collectd Username: You can leave this as default. This username is the one used to encrypt traffice between collectd clients and servers.
  • Collectd Password: Set any password. This password is the one used to encrypt traffic between collectd clients and servers. It's a good idea  to encrypt this password with htpasswd utility before pasting it here. For example to set the admin password, the first parameter is the username, the second is the password. The output contains the username, a colon, then the encrypted password. Paste that value in this property:

    > htpasswd -bnm admin admin
    admin:$apr1$qSfx7.W2$xf/2k1mDHnksPXZlrU.b90
    
    
  • Collectd Server: The collectd server host.
  • Schedule Deployment: Leave this unchecked if you want to install now
  • Description (Optional): Describe the execution of this install
That's it!

Part 3 is next where I examine the Chef recipe in more detail.

Monday, 10 August 2015

Monitoring UrbanCode Deployments with Docker, Graphite, Grafana, collectd and Chef! (Part 1)

Monitoring an UrbanCode Deploy server (sometimes more in HA setups) and it's agents requires keeping track of resource utilization multiple environments, the UrbanCode deployment server(s) itself and the linkages between (ie. the network).

Typical resources include:

  • CPU
  • Memory
  • Java Heap
  • Threads
  • Disk
  • Network
  • Virtual environment (hypervisor)

In addition to resource utilization, log files should also be monitored for abnormal activity and traffic. There are commercial offerings which do these types of things but since UrbanCode Deploy itself is a deployment solution, it can be used to deliver monitoring to nodes. All that's is needed is monitoring agents and a collector and a means to configure and connect it all together.

In this post I'll demonstrate a quick bootstrap solution for system and JVM resource monitoring using UrbanCode Deploy. It will provide an "out of the box" monitoring dashboard solution, Grafana from data stored in Graphite (running in a Docker container) of metrics collected by collectd that installed on nodes using a Chef recipe that's deployed through UrbanCode Deploy. The end result looking something like this:

Fig. 1 Monitoring Topology
Fig. 1 Monitoring Topology

For the time being this solution is solely for a Linux environments (RHEL, Ubuntu and variants) but this solution can be adapted to other OS's as many of the components have counterparts for Windows, AIX and other OS's.

So how do we get there? Well, one approach is to set it up manually, quite an operation if you have 1000s of agents, so we'll need to do better.

First, the assets need to be installed.

Fig. 2 Installing the Solution
We will need:

  1. An UrbanCode server with a few agents. You'll also need to install the chef plugin from here: https://developer.ibm.com/urbancode/plugin/chef
  2. I also created a plugin with groovy that adds 2 additional steps for components. One step gets the latest version for a component, and the other step gets an ID for a version in the component. You can see the source code here, it's a good example of how to create a custom plugin. It's quite simple.
    Plugin:
    http://www.boriskuschel.com/downloads/ComponentPlus.zip
    Source
  3. Import a component from IBM BlueMix DevOps Service Git found here:
    https://hub.jazz.net/git/kuschel/monitorucd/contents/master/Collectd+Chef+Cookbook.json
    Import it from the Components tab:

    You should now see it listed:


    The component is preconfigured to connect to 
    IBM BlueMix DevOps Service Git and pull the recipe periodically and create a new version, you may change this behaviour in Basic Settings by unchecking the Import Versions Automatically setting.
    All you need to do now is supply a BlueMix username and password in the component properties page. You may need to enter a jazz.net username, if applicable (without a domain).

  4. Now you need to import a generic process (the top level Processes tab. Not the component!) that will be used to deploy the latest version of the component deployment package onto agent nodes. This process is kept in IBM BlueMix DevOps Services's Git and can be found here: https://hub.jazz.net/git/kuschel/monitorucd/contents/master/Install_collectd.json


    Or, you can quickly import this into UrbanCode Deploy by using curl:
    curl -k -X POST -F file=@Install_collectd.json https://<user>:<pass>@<ucd host>/rest/process/import
    
    NOTE: I noticed that after importing the generic template the versionName step in the Generic Import_collectd process design (design tab) had three bullets "•", this needs to be updated to ensure that the Secure Property Value field is blank. If it's not, the fetching of the latest version will fail when version is not specified.
    
  5. We need a metrics collector to store the metrics and a graphing engine to visualize them. We'll be using a Docker image of Graphite/Grafana I put together. You will need to ability to build run a docker container either using boot2docker or the native support available in Linux
    I have put the image up on the public docker registry as bkuschel/graphite-grafana but you can also build it from the Dockerfile in IBM BlueMix DebOps Services's Git at https://hub.jazz.net/git/kuschel/monitorucd/contents/master/Dockerfile
  6. To get the image run:

    docker pull bkuschel/graphite-grafana

    Now run the image and bind the ports 80 and 2003 from the docker container to the hosts ports.

    docker run -p 80:80 -p 2003:2003 -t bkuschel/graphite-grafana

    You can also mount file volumes to the container that contains the collector's database, if you wish that to be persisted. Each time you restart the container, it contains a fresh database. This has its advantages for testing. You can also specify other configurations beyond what are provided as defaults. Look at the Dockerfile for the volumes.

Once the solution is installed all that needs to be done is to execute the process on UrbanCode. Yes, it's that easy.

Go to the Process Tab in UrbanCode Deploy Server, Click on Run Next to the "Install_collectd" process.

A dialog will popup asking for a series of parameters. These will be explained in more depth in a later post regarding the Chef recipe I created. (You can find it here)
  • Component Name: Should be set to the name of the component we imported earlier
  • Version Name (Optional): You can specify the name of a specific version of the component to use, otherwise it will use the latest
  • Is this a collectd Server?: If you look at Fig. 1, you'll see that many collectd clients connect to a central collectd server. If this node is the central collectd, this should be checked. Generally, this should be the main agent in the UrbanCode Server, usually co-located with the server.
  • Collectd Install Directory: The default is good
  • Collectd Username: You can leave this as default. This username is the one used to encrypt traffice between collectd clients and servers.
  • Collectd Password: Set any password. This password is the one used to encrypt traffic between collectd clients and servers. It's a good idea  to encrypt this password with htpasswd utility before pasting it here. For example to set the admin password, the first parameter is the username, the second is the password. The output contains the username, a colon, then the encrypted password. Paste that value in this property:

    > htpasswd -bnm admin admin
    admin:$apr1$qSfx7.W2$xf/2k1mDHnksPXZlrU.b90
    
    
  • Collectd Server (client)/Graphite host (server): if "Is this a collectd Server?" is checked then this is the graphite server host, the host that is running the docker container. Otherwise this the collectd server host.
  • UCD Server: The installation directory of the UrbanCode server (ex. /opt/ibm_ucd/server) if this collectd is to be installed on a node with a server.
  • Java Monitoring Template: If UCD Server is set and is installed on tomcat, select tomcat.conf.erb, otherwise select java.conf.erb.
  • Resource: Select the agent that this process should be executed on. (the host.

Once you click Submit, this should happen:
Deploy Process
Fig 3. Deploy Process
At this point, all the collectd daemons should be started and collecting. Navigate to the docker host at http://<Docker host>/. You should see a tree with metrics, like this:

You can also navigate to Grafana at http://<Docker Host>/grafana. Note that the username and password for both Graphite and Grafana are admin/admin.

This is quite a mouthful for one blog post and there are so many aspects to cover such as:
  • The UrbanCode Deploy Process, how does it work?
  • The Chef Recipe.
  • Collectd Collection Options (and the nmon option!).
  • How to create useful graphs in Graphite and dashboards in Grafana.
I will cover these in subsequent postings. In the meantime, try to set it up and see how it goes. If you're lucky, you end up playing with some cool metrics and graphs in Graphite/Grafana.