Vidar Hokstad V2.0


2008-03-29 11:50 UTC Latest referrers using Rack and Ruby

Like most bloggers I like to keep an eye on where my traffic is coming from, especially when there are surges in traffic. I'm using both Google Analytics and Feedburner for stats, and they work great for trends, but not for seeing what's happening right now.

This morning I needed a distraction and figured I'd just throw together a quick and dirty Rack middleware class to keep track of the latest referrers.

What I ended up doing was keeping a rolling buffer in an array that holds the last N referrers, and generating a histogram from that as needed. I'm not interested in accuracy, since I have the logs + Google Analytics + Feedburner to get the daily totals, so I didn't bother persisting the buffer to disk or anything - if I restart my app the stats will reset. This is just to get a live image of what's going on right now.

The downside is that this approach does not scale beyond a single process: each process keeps its own buffer in memory. If you want it to, you really do want to persist the data to a database or something, though that adds a lot of overhead. Maybe I'll do that next - it's easy, but until my blog has a lot more traffic I don't really have the motivation.
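To give an idea of what sharing the buffer between processes could look like, here is a minimal sketch that keeps it in a lock-protected file instead of an array. It is not what this site runs: the class name FileReferers, its methods and the one-JSON-array-per-line file format are all made up for illustration, and rewriting the file on every request is exactly the kind of overhead mentioned above.

```ruby
require "json"

# Hypothetical file-backed replacement for the in-memory @referers array.
# Every process that opens the same path sees the same rolling buffer.
class FileReferers
  def initialize path, limit = 100
    @path = path
    @limit = limit
  end

  # Add one [referrer, page] pair and trim the buffer to the last @limit
  # entries, holding an exclusive lock for the whole read-modify-write.
  def add ref, req
    File.open(@path, File::RDWR | File::CREAT, 0644) do |f|
      f.flock(File::LOCK_EX)
      rows = f.read.lines.map(&:chomp)
      rows << JSON.generate([ref, req])
      rows = rows.last(@limit)
      f.rewind
      f.truncate(0)
      f.write(rows.join("\n") + "\n")
    end
  end

  # Return the buffer as an array of [referrer, page] pairs, oldest first.
  def all
    return [] unless File.exist?(@path)
    File.open(@path, "r") do |f|
      f.flock(File::LOCK_SH)
      f.read.lines.map { |line| JSON.parse(line) }
    end
  end
end
```

The lock is released when the block closes the file, so writers in different processes take turns rather than clobbering each other.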

Here's the class (yes, I know "referer" is a misspelling of "referrer", but it matches the HTTP header):

module LatestReferers
  class Gather
    def initialize app, opts = {}
      @app = app
      @referers = []
      @limit = 100
      @exclude = []
      opts.each do |k,v|
        @limit = v.to_i if k == :limit
        @exclude = v if k == :exclude
      end
    end

    def call env
      ref = env["HTTP_REFERER"] || "-"
      req = env["REQUEST_URI"]

      if !@exclude.detect{|pat| req =~ pat || ref =~ pat }
        @referers << [ref,req]
        @referers.shift if @referers.size > @limit
      end

      env["hokstad.latestreferers"] = self
      @app.call(env)
    end

    def histogram
      h = {}
      @referers.each do |ref,req|
        h[ref] ||= {:total => 0}
        h[ref][req] ||= 0
        h[ref][req] += 1
        h[ref][:total] += 1
      end
      h.sort_by{|ref,pages| -pages[:total]}
    end
  end
end

In turn:

  • #initialize takes the next app and a hash of options. Currently it recognizes :limit, which controls how many referrers it will track, and :exclude, which takes an array of regexps to check against both the request URI and the referrer for patterns to reject - I'm not interested in local referrals internally on my site, or referrals to the page I use to view the referral stats.
  • #call just gets the fields, checks them against the patterns, and if none of them match, adds the referrer and page to the end of the array and removes the first entry if the array exceeds the limit, to create a FIFO queue.
  • #histogram builds a hash of hashes mapping each referrer to page names and the number of times each page has been accessed, plus a total. It returns that sorted by total, highest first, so what you get back is an array of [referrer, pages] pairs.
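To make the shape of the #histogram result concrete, here is the same counting logic run against a small hand-written buffer. The referrers and page names are made-up sample data, and the local variables stand in for the instance state of the class.

```ruby
# Sample buffer of [referrer, requested page] pairs (made-up data).
referers = [
  ["http://example.com/a", "/post/1"],
  ["http://example.com/a", "/post/1"],
  ["http://example.com/a", "/post/2"],
  ["-",                    "/post/1"],
]

# Same counting logic as #histogram, applied to the local array.
h = {}
referers.each do |ref, req|
  h[ref] ||= {:total => 0}
  h[ref][req] ||= 0
  h[ref][req] += 1
  h[ref][:total] += 1
end

# sort_by on a Hash returns an Array of [key, value] pairs.
result = h.sort_by { |ref, pages| -pages[:total] }

result.each do |ref, pages|
  puts "#{ref}: #{pages[:total]} hits"
end
# prints:
# http://example.com/a: 3 hits
# -: 1 hits
```

Note that the :total key lives in the same hash as the page names, which is why it is a symbol: it can't collide with a request URI string.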

#call passes the object on in the environment. I do this to reduce coupling - you can then render the page in the framework of your choice, as long as it has a Rack adapter and allows you access to the environment, either by using a simple Rack middleware adapter such as the one I'll show below or by writing your own. Since it depends only on Rack, you can put this in front of most Ruby frameworks, including Rails if you so choose.
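As a rough illustration of that decoupling, the sketch below uses stand-ins instead of the real classes: FakeStats and Provide are hypothetical names invented for this example, and the only thing taken from the code above is the "hokstad.latestreferers" environment key and the #histogram method. Anything further down the stack that gets the env hash can pick the object up the same way.

```ruby
# Stand-in for the Gather instance; consumers only rely on #histogram.
FakeStats = Struct.new(:rows) do
  def histogram
    rows
  end
end

# Stand-in middleware: stores an object in env, then calls the next app.
class Provide
  def initialize app, stats
    @app = app
    @stats = stats
  end

  def call env
    env["hokstad.latestreferers"] = @stats
    @app.call(env)
  end
end

# Any downstream Rack app (here a plain lambda) reads the object from env
# and renders it however it likes; this one emits plain text.
app = lambda do |env|
  stats = env["hokstad.latestreferers"]
  body = stats.histogram.map { |ref, pages| "#{ref} #{pages[:total]}\n" }
  [200, {"content-type" => "text/plain"}, body]
end

stack = Provide.new(app, FakeStats.new([["-", {:total => 2, "/" => 2}]]))
status, headers, body = stack.call({})
```

The consumer never references the gathering class by name, so either side can be swapped out independently.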

The class above can be plugged in by requiring the file you put it in, and adding something like this to your config.ru file if you use Rackup, or by adding the class to whatever Rack setup you use:

  use LatestReferers::Gather, {:exclude => [ /\/referers/, /http:\/\/www\.hokstad\.com/, /\.xml/, /\/feed/, /\.rdf/ ]}

The above is the config I use for this site.

If you just want a simple table of the results, you can use something like this. I just want the numbers; I don't care how the page looks:

require 'rack'

module LatestReferers
  class View
    def initialize app, page
      @app = app
      @page = page
    end

    def show(ref)
      return Rack::Response.new("Missing 'latestreferers' object",500).finish if !ref
      r = Rack::Response.new
      r.write("<html><head/><body>")
      r.write("<table border='1'><tr><th>Referer</th><th>Pages</th></tr>\n")
      ref.histogram.each do |k,v|
        # Referrers and URIs come straight from the client, so escape them.
        r.write("<tr><td>#{Rack::Utils.escape_html(k)}</td> <td><table>")
        # v also holds the :total entry, which is why page may be a symbol.
        v.sort_by{|page,count| -count}.each do |page,count|
          r.write("<tr><td>#{count}</td><td>#{Rack::Utils.escape_html(page.to_s)}</td></tr>")
        end
        r.write("</table></td></tr>\n")
      end
      r.write("</table></body></html>")
      r.finish
    end

    def call env
      if env["REQUEST_URI"] == @page
        show(env["hokstad.latestreferers"])
      else
        @app.call(env)
      end
    end
  end
end

That serves as a simple example of using Rack::Response too - it's completely optional, and you can stream out any template from your favorite templating system instead of hardcoding the HTML, but for this I just wanted something with no other external dependencies than Rack.

There are probably a lot of things I could do to the view code, but it's a throwaway hack - I just want to be able to see at a glance if anything interesting is happening. If you want a pretty page, it's easy enough to use the above as a starting point.

You can see the live result of using the above classes here with this config (expect it to be reset quite often, and I only track the last 100, so don't expect it to show a huge list):

use LatestReferers::Gather, {:exclude => [ /\/referers/, /http:\/\/www\.hokstad\.com/, /\.xml/, /\/feed/, /\.rdf/ ]}
use LatestReferers::View, "/referers"
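For completeness, a whole config.ru built around those two lines might look roughly like this. It is a sketch under a few assumptions: the file name latest_referers.rb is hypothetical (use whatever file you put the two classes in), the exclude list is shortened, and the lambda at the end is a placeholder for your real application.

```ruby
# config.ru - a sketch; "latest_referers.rb" is a hypothetical file name
# holding the LatestReferers::Gather and LatestReferers::View classes.
require_relative "latest_referers"

# Gather has to come first so that the env key is set before View runs.
use LatestReferers::Gather, {:limit => 100, :exclude => [ /\/referers/ ]}
use LatestReferers::View, "/referers"

# Placeholder app; replace it with your own application.
run lambda { |env| [200, {"content-type" => "text/plain"}, ["Hello\n"]] }
```

The order of the use lines matters: middleware runs top to bottom, and View only works if Gather has already put itself into the environment.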

