Tracking stats in your application

Inspired by Etsy's post on keeping track of everything, I set out to achieve similar functionality on a Rails application. Essentially, you run Graphite somewhere and send UDP packets from your application to StatsD, which aggregates them and forwards the results to Graphite. With both running you can get some awesome stats from your application, and since it's using UDP, the overhead is minimal.

Getting your graphite on

I'm using Ubuntu 10.10 in this example, because it has a whisper package available, but you should be able to install this on any distro as all the sources are available.

# apt-get install -y bzr python-cairo-dev python-django python-twisted python-whisper libapache2-mod-wsgi

Once you have the necessary packages installed, you can easily grab the graphite source from launchpad, verify you have all the dependencies and finally install the software.

# bzr branch lp:graphite
# cd graphite
# ./check-dependencies.py
...
# python setup.py install

Graphite installs itself to /opt/graphite, with plenty of example configurations. You will need to copy a few in place to get started.

# cd /opt/graphite/conf
# cp carbon.conf.example carbon.conf
# cp storage-schemas.conf.example storage-schemas.conf
# cp graphite.wsgi.example graphite.wsgi

I added the following to my storage-schemas.conf which provides some sane defaults for use with StatsD, provided by Etsy:

[stats]
priority = 110
pattern = ^stats\..*
retentions = 10:2160,60:10080,600:262974
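
Each retentions entry is seconds-per-point:number-of-points, so the line above keeps 10-second data for 6 hours, 1-minute data for 7 days, and 10-minute data for roughly 5 years. A quick sketch to check the arithmetic (the helper name retention_spans is mine, not part of Graphite):

```ruby
# Each retention entry is "seconds_per_point:number_of_points".
RETENTIONS = "10:2160,60:10080,600:262974"

# Parse the retentions string and work out how long each archive covers.
def retention_spans(spec)
  spec.split(",").map do |entry|
    seconds_per_point, points = entry.split(":").map { |n| Integer(n) }
    { resolution: seconds_per_point, points: points, seconds: seconds_per_point * points }
  end
end

retention_spans(RETENTIONS).each do |r|
  days = r[:seconds] / 86_400.0
  puts format("%4ds resolution kept for %.2f days", r[:resolution], days)
end
```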

The Graphite source comes with an example virtual host definition for Apache under the examples directory. I simply copied this over to /etc/apache2/sites-available/graphite, but had to comment out the WSGISocketPrefix line. After the virtual host definition is in place, you will need to set up the initial database, then change some permissions so Apache can write to the necessary directories.

# cd /opt/graphite/webapp/graphite
# python manage.py syncdb
# chown -R www-data:www-data /opt/graphite/storage/
# a2ensite graphite
# service apache2 restart

The final step is to start up carbon, the daemon that receives metrics and writes them to graphite's whisper files.

# cd /opt/graphite/
# ./bin/carbon-cache.py start

If all went well, you should now have graphite up and running at http://stats.example.com. At some point I really need to look into writing a Chef cookbook for this, but it looks like dje has written one already. I haven't checked it out yet, but I'm sure it's at the very least a good starting point.
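
One way to confirm that carbon is accepting data is to push a single data point at it by hand. The sketch below assumes carbon's plaintext listener is on its default port 2003, where each line is a metric path, a value, and a Unix timestamp. The metric name test.smoke and both helper names are made up for illustration.

```ruby
require "socket"

# Build one line of carbon's plaintext protocol:
# "<metric path> <value> <unix timestamp>\n"
def carbon_line(path, value, timestamp = Time.now.to_i)
  "#{path} #{value} #{timestamp}\n"
end

# Send a single data point to carbon over TCP.
# host and port are assumptions; adjust them to your setup.
def send_to_carbon(path, value, host: "localhost", port: 2003, timestamp: Time.now.to_i)
  TCPSocket.open(host, port) do |socket|
    socket.write(carbon_line(path, value, timestamp))
  end
end

# Example (requires carbon-cache to be running):
# send_to_carbon("test.smoke", 1)
```

If the data point went through, the metric should show up in the Graphite tree shortly afterwards.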

A simple node.js daemon: StatsD

You can either compile node.js from source, or grab it from a PPA. The version included in the Ubuntu repositories is too old to run StatsD.

# add-apt-repository ppa:richarvey/nodester
# apt-get update
# apt-get install nodejs

With the required dependencies in place, clone the repo and move it to a more suitable location.

# git clone https://github.com/etsy/statsd.git
# mv statsd /opt/
# mkdir /etc/statsd

Create your configuration file. In this case graphite is running on the same machine as statsd:

# cat /etc/statsd/config.js
{
  graphitePort: 2003
, graphiteHost: "localhost"
, port: 8125
}

So now you will probably want to use upstart to integrate the service properly in Ubuntu. Create a simple upstart service definition as follows:

# cat /etc/init/statsd.conf
description "statsd"
author      "rdio"

start on startup
stop on shutdown

script  
    export HOME="/root"
    exec sudo -u nobody nodejs /opt/statsd/stats.js /etc/statsd/config.js
end script

Now you can easily start statsd and verify it is running:

# start statsd
# status statsd
statsd start/running, process 22307
# netstat -nulp | grep nodejs
udp        0      0 0.0.0.0:8125            0.0.0.0:*                           22307/nodejs 

At this point you will be able to configure your application to send data to your StatsD instance. There's plenty of code out there for doing this in pretty much every language. In fact, when you cloned the StatsD repository, it came with PHP and Python examples. Below I'll show you how to implement a statsd client in a Rails application.
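
If you're curious what those clients actually put on the wire, here's a minimal sketch in Ruby. It assumes the usual StatsD line format of name:value|type, with c for counters and ms for timers. TinyStatsd is a hypothetical class name of mine, not a real gem.

```ruby
require "socket"

# Minimal StatsD client sketch. TinyStatsd is a hypothetical name.
class TinyStatsd
  def initialize(host = "localhost", port = 8125)
    @host = host
    @port = port
    @socket = UDPSocket.new
  end

  # Counters are sent as "<name>:<delta>|c".
  def increment(name, delta = 1)
    deliver("#{name}:#{delta}|c")
  end

  # Timers are sent as "<name>:<milliseconds>|ms".
  def timing(name, milliseconds)
    deliver("#{name}:#{milliseconds}|ms")
  end

  private

  # UDP is fire-and-forget: nothing is read back, and socket errors are
  # swallowed so a stats outage never breaks the application.
  def deliver(payload)
    @socket.send(payload, 0, @host, @port)
    payload
  rescue SocketError, SystemCallError
    nil
  end
end

# Usage:
#   stats = TinyStatsd.new("localhost", 8125)
#   stats.increment("user.logins")
#   stats.timing("user.upload", 320)
```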

Some basic stats in a Rails application

To start out, you need to tell Rails you need the dawanda-statsd-client gem, so add the following to your config/environment.rb:

config.gem 'dawanda-statsd-client', :lib => 'statsd/client'

The default methods are pretty good, but getting your connection information in needs a bit of work, so we override the config method in config/initializers/statsd.rb:

class Statsd
  class << self
    def config
      environments = YAML.load_file("#{Rails.root}/config/statsd-client.yml") || {}
      environments[Rails.env]
    end
  end
end

Next you'll want to keep your StatsD connection information in a separate YAML file. You can define different settings for each Rails environment. Set up config/statsd-client.yml:

production:
  host: stats.example.com
  port: 8125
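
One thing to watch for: the config override above returns nil for any environment that is missing from the file. Here's a rough sketch of a more forgiving lookup, written in plain Ruby so it runs outside Rails. The default host and port and the helper name statsd_config_for are my assumptions, not something the gem provides.

```ruby
require "yaml"

# Hypothetical defaults used when an environment has no entry in the file.
DEFAULT_STATSD_CONFIG = { "host" => "localhost", "port" => 8125 }.freeze

# Pick the settings for one environment out of the YAML text,
# falling back to the defaults for anything that is missing.
def statsd_config_for(yaml_text, environment)
  environments = YAML.safe_load(yaml_text) || {}
  DEFAULT_STATSD_CONFIG.merge(environments[environment] || {})
end

EXAMPLE_YAML = <<~CONFIG
  production:
    host: stats.example.com
    port: 8125
CONFIG

p statsd_config_for(EXAMPLE_YAML, "production")  # settings from the file
p statsd_config_for(EXAMPLE_YAML, "development") # falls back to the defaults
```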

Now you're ready to increment or time any stat anywhere in your application:

Statsd.increment('user.logins')
Statsd.timing('user.upload', end_time - start_time) # StatsD expects the duration in ms
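
Since StatsD timers are in milliseconds, a small helper can take care of the measuring. This is a sketch under my own naming: measure_ms and timed are hypothetical helpers, not part of the gem, and the reporter is any callable that takes a stat name and a duration.

```ruby
# Run a block and return [result, elapsed_ms]. A monotonic clock is used so
# the measurement is not affected by system clock adjustments.
def measure_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  result = yield
  finished = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  [result, (finished - started).round]
end

# Time a block and hand the duration to a reporter. In the Rails app the
# reporter could be Statsd.method(:timing).
def timed(stat_name, reporter)
  result, elapsed_ms = measure_ms { yield }
  reporter.call(stat_name, elapsed_ms)
  result
end

# Usage with a stand-in reporter that just prints the measurement:
printer = ->(name, ms) { puts "#{name}: #{ms}ms" }
timed("user.upload", printer) { sleep 0.05 }
```

Passing the reporter in keeps the helper independent of any particular client, which also makes it easy to test.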

What stats to track?

Well, that's entirely up to you, but ideally you would be tracking as much as humanly possible. Things you don't expect to matter may matter at some point, so having a history of stats will help you troubleshoot issues later. You could track every CRUD operation in your models, specific actions in your controllers, or even user page load times. You will definitely want to keep track of timings of any external API calls. If you're using Capistrano to do deployments, it's easy to add a simple increment every time you deploy, which lets you overlay your other stats with code deployments. This is awesome for seeing the effects of your modifications and can be invaluable for showing progress to your management types.
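
If you track this much, a consistent naming scheme helps, because Graphite turns each dot-separated segment of a stat name into a level in its tree. Here's a small sketch of one possible convention. The stat_name helper and the models/deploys prefixes are my own hypothetical choices.

```ruby
# Build a dotted stat name such as "models.user.create" from its parts.
# Each part is lowercased, and anything other than letters, digits, and
# underscores is replaced, so stray dots, colons, or pipes cannot add
# extra tree levels or break the StatsD line format.
def stat_name(*parts)
  parts.map { |part| part.to_s.downcase.gsub(/[^a-z0-9_]+/, "_") }.join(".")
end

puts stat_name("models", "User", :create)
puts stat_name("external", "Payment API", "charge")
puts stat_name("deploys", "myapp", "production")
```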
