Vidar Hokstad V2.0


Tag: rack

2008-03-29 11:50 UTC Latest referrers using Rack and Ruby

Like most bloggers, I like to keep an eye on where my traffic is coming from, especially when there are surges in traffic. I'm using both Google Analytics and Feedburner for stats, and they work great for trends, but not for seeing what's happening right now.

This morning I needed a distraction and figured I'd just throw together a quick and dirty Rack middleware class to keep track of the latest referrers.

What I ended up doing was keeping a rolling buffer in an array that holds the last N referrers, and generating a histogram from that as needed. I'm not interested in accuracy, since I have the logs + Google Analytics + Feedburner to get the daily totals, so I didn't bother persisting the buffer to disk or anything - if I restart my app the stats will reset. This is just to get a live image of what's going on right now.

The downside is that this approach does not scale beyond a single process. If you want it to, you really do want to persist the data to a database or something, though that adds a lot of overhead. Maybe I'll do that next - it's easy, but until my blog has a lot more traffic I don't really have the motivation.
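As a rough illustration of what persisting the buffer could look like, here is a minimal sketch that uses PStore from Ruby's standard library instead of a real database. The class name SharedRefererBuffer and the file name are made up for this example; it is not what this blog runs, and every request now pays for a file lock and a write.

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical sketch: a rolling referrer buffer stored in a PStore file,
# so several processes pointing at the same file see the same data.
class SharedRefererBuffer
  def initialize path, limit = 100
    @store = PStore.new(path)
    @limit = limit
  end

  # Append one [referrer, request] pair and trim the oldest entries.
  def add ref, req
    @store.transaction do
      buf = (@store[:referers] ||= [])
      buf << [ref, req]
      buf.shift while buf.size > @limit
    end
  end

  # Read-only transaction; returns the current buffer contents.
  def entries
    @store.transaction(true) { @store[:referers] || [] }
  end
end

Dir.mktmpdir do |dir|
  buf = SharedRefererBuffer.new(File.join(dir, "referers.pstore"), 2)
  buf.add("http://a.example/", "/post1")
  buf.add("-", "/post2")
  buf.add("http://b.example/", "/post1")
  p buf.entries   # only the two most recent pairs remain
end
```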

Here's the class (yes, I know referrer is misspelled, but it matches the HTTP header):

module LatestReferers
  class Gather
    def initialize app, opts = {}
      @app = app
      @referers = []
      @limit = 100
      @exclude = []
      opts.each do |k,v|
        @limit = v.to_i if k == :limit
        @exclude = v if k == :exclude
      end
    end

    def call env
      ref = env["HTTP_REFERER"] || "-"
      req = env["REQUEST_URI"]

      if !@exclude.detect{|pat| req =~ pat || ref =~ pat }
        @referers << [ref,req]
        @referers.shift if @referers.size > @limit
      end

      env["hokstad.latestreferers"] = self
      @app.call(env)
    end

    def histogram
      h = {}
      @referers.each do |ref,req|
        h[ref] ||= {:total => 0}
        h[ref][req] ||= 0
        h[ref][req] += 1
        h[ref][:total] += 1
      end
      h.sort_by{|ref,pages| -pages[:total]}
    end
  end
end

In turn:

  • #initialize takes the next app and a hash of options. Currently it recognizes :limit, which controls how many referrers it will track, and :exclude, which takes an array of regexps to check against both the request URI and the referrer for patterns to reject - I'm not interested in local referrals internally on my site, or referrals to the page I use to view the referral stats.
  • #call just gets the fields, checks them against the patterns, and if they don't match, it adds the referrer and page to the end of the array and removes the first entry if the array exceeds the limit, to create a FIFO queue.
  • #histogram builds a hash of hashes mapping each referrer to page names and the number of times each page has been accessed, plus a total, and returns it as an array of [referrer, counts] pairs sorted by total, highest first.
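The buffer and histogram logic can be tried outside of Rack. The following is a stand-alone sketch of the same steps; rolling_histogram is a made-up helper name and the example.com style hosts are placeholders.

```ruby
# Stand-alone sketch of the rolling buffer + histogram, outside of Rack.
# rolling_histogram is a hypothetical helper, not part of the class above.
def rolling_histogram hits, limit
  buffer = []
  hits.each do |ref, req|
    buffer << [ref, req]
    buffer.shift if buffer.size > limit   # drop the oldest entry (FIFO)
  end

  h = {}
  buffer.each do |ref, req|
    h[ref] ||= {:total => 0}
    h[ref][req] ||= 0
    h[ref][req] += 1
    h[ref][:total] += 1
  end
  h.sort_by { |ref, pages| -pages[:total] }   # Array of [referrer, counts] pairs
end

hits = [
  ["http://a.example/", "/post1"],
  ["-",                 "/post1"],
  ["http://a.example/", "/post2"],
  ["http://a.example/", "/post1"],
]

# With a limit of 3 the first hit has already been pushed out of the buffer.
rolling_histogram(hits, 3).each do |ref, pages|
  puts "#{pages[:total]} #{ref}"
end
```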

#call passes the object on in the environment. I do this to reduce coupling - you can then render the page in the framework of your choice, provided it has a Rack adapter and allows you access to the environment, using a simple Rack middleware adapter such as the one I'll show below, or by writing your own. Since it depends only on Rack, you can put this in front of most Ruby frameworks, including Rails if you so choose.
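To make the "pass it on in the environment" idea concrete, here is a small sketch of a downstream consumer that only knows about the env key and the #histogram method. FakeStats and StatsText are hypothetical names used for illustration; FakeStats stands in for the gathering object so the example runs without Rack installed.

```ruby
# Hypothetical stand-in for the gathering object; only #histogram is needed.
FakeStats = Struct.new(:rows) do
  def histogram
    rows
  end
end

# Anything that responds to #call(env) and returns [status, headers, body]
# can sit at the end of a Rack chain, so a lambda is enough here.
StatsText = lambda do |env|
  stats = env["hokstad.latestreferers"]
  unless stats
    return [500, {"Content-Type" => "text/plain"}, ["no stats object in env\n"]]
  end
  lines = stats.histogram.map { |ref, pages| "#{pages[:total]} #{ref}\n" }
  [200, {"Content-Type" => "text/plain"}, lines]
end

env = {"hokstad.latestreferers" => FakeStats.new([["http://a.example/", {:total => 2}]])}
status, headers, body = StatsText.call(env)
puts status
body.each { |line| print line }
```

The consumer never sees how the data was collected, which is the point of handing over the object rather than the raw array.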

The class above can be plugged in by requiring the file you put it in, and adding something like this to your config.ru file if you use Rackup, or by adding the class to whatever Rack setup you use:

  use LatestReferers::Gather, {:exclude => [ /\/referers/, /http:\/\/www\.hokstad\.com/, /\.xml/, /\/feed/, /\.rdf/ ]}

The above is the config I use for this site.

If you just want a simple table of the results, you can use something like this. I just want the numbers, I don't care how the page looks:

module LatestReferers
  class View
    def initialize app, page
      @app = app
      @page = page
    end

    def show(ref)
      return Rack::Response.new("Missing 'latestreferers' object",500).finish if !ref
      r = Rack::Response.new
      r.write("<html><head/><body>")
      r.write("<table border='1'><tr><th>Referer</th><th>Pages</th></tr>\n")
      ref.histogram.each do |k,v|
        r.write("<tr><td>#{k}</td> <td><table>")
        total = 0
        v.sort_by{|page,count| -count}.each do |page,count|
          r.write("<tr><td>#{count}</td><td>#{page.to_s}</td></tr>")
          total += count
        end
        r.write("</table></td></tr>\n")
      end
      r.write("</table></body></html>")
      r.finish
    end

    def call env
      if env["REQUEST_URI"] == @page
        show(env["hokstad.latestreferers"])
      else
        @app.call(env)
      end
    end
  end
end

That serves as a simple example of using Rack::Response too - it's completely optional, and you can stream out any template from your favorite templating system instead of hardcoding the HTML, but for this I just wanted something with no other external dependencies than Rack.

There are probably a lot of things I could do to the view code, but it's a throwaway hack - I just want to be able to see at a glance if anything interesting is happening. If you want a pretty page, it's easy enough to use the above as a starting point.
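One thing worth doing if you reuse the view: the Referer header is supplied by the client, so it is safer to escape it before writing it into HTML. A minimal sketch, assuming ERB::Util.html_escape from the standard library is acceptable as the escaping function; referer_row is a made-up helper name.

```ruby
require 'erb'

# Hypothetical helper: render one table row with the referrer escaped,
# so a hostile Referer value cannot inject markup into the stats page.
def referer_row ref, total
  "<tr><td>#{ERB::Util.html_escape(ref)}</td><td>#{total}</td></tr>"
end

puts referer_row("http://a.example/?q=<script>", 3)
```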

You can see the live result of using the above classes here with this config (expect it to be reset quite often, and I only track the last 100, so don't expect it to show a huge list):

use LatestReferers::Gather, {:exclude => [ /\/referers/, /http:\/\/www\.hokstad\.com/, /\.xml/, /\/feed/, /\.rdf/ ]}
use LatestReferers::View, "/referers"

2008-03-28 13:46 UTC Why coupling is always bad / Cohesion vs. coupling

In the discussion following my entry "Why Rails is total overkill and why I love Rack" several comments raised the issue of whether high coupling is always bad. My answer was that I believe it is, but at the same time it can be worth it sometimes.

It seems like a point that is worth further discussion. I'm not going to go into a terrible amount of detail, as I enjoy the discussion more than expounding on a subject that should be relatively uncontroversial.

What do I mean by coupling and cohesion

My earlier entry linked to the Wikipedia articles for these terms, because I was sure some people would misunderstand, and sure enough, some did. So let's go into some more detail:

Two components are loosely coupled when changes in one never or rarely necessitate a change in the other.

Changes that affect external interfaces will of course require changes, and so you can't completely safeguard against changes causing ripples. You can protect against it by narrowing the interface. This is why coupling and cohesion are so tightly related:

A component exhibits high cohesion when all its functions/methods are strongly related in terms of function.

The higher cohesion and lower coupling a system has, in general the more its components exhibit strong data hiding, narrow but general interfaces and a high degree of flexibility.

Why coupling is always bad

Surely increasing dependencies on implementation details of other components isn't a good thing?

The objections I've seen typically don't imply that coupling is good, though, but rather that coupling isn't always bad because it's necessary to achieve high cohesion.

Some evils are necessary, but that doesn't make them good. I will not try to argue that increasing coupling isn't sometimes worth it - see below.

Coupling is always bad because it prevents the replacement or changes of components independently of the whole. It's hard seeing a defense against this, and indeed hard to argue for it because it appears so self evident.

What are some of the consequences of high coupling?

  • Developers / maintenance programmers need to understand potentially the whole system to be able to safely modify a single component.
  • Changing requirements that affect the suitability of some component will potentially require wide ranging changes in order to accommodate a more suitable replacement component.
  • More thought needs to go into choices at the beginning of the lifetime of a software system in order to attempt to predict the long term requirements of the system, because changes are more expensive.

I can't think of a single benefit of high coupling in and of itself. If anyone thinks they can actually defend why high coupling might sometimes be good (as opposed to just occasionally a necessary evil), I'd love you to post your comments to this post...

Cohesion vs. coupling, and why coupling is sometimes worth the cost

Cohesion is about making sure each component does one thing and does it well. The lines get blurry in a language like Ruby, where one "component" could be a library that reopens a class like Object and in effect extends every object in the system. The specifics don't really matter. What matters is whether the code is self contained.

It's generally easier to reduce coupling in a highly cohesive system.

It is easier, because a highly cohesive system will group the related functionality together, so that the need to communicate across component boundaries (whether those "components" are classes, separate processes, or methods injected into reopened classes by a library) is reduced.

The key point is that related code often shares state. Sharing state across component boundaries increases dependencies. Increased dependencies increase coupling.

Cohesion and coupling are thus not at odds - high cohesion and low coupling are both good, and achieving one tends to make achieving the other easier, not harder. When some people think that high coupling is sometimes excusable, it is often because they confuse cohesion with consistency and ease of use.

I am sure there are many different ideas of what the appropriate tradeoff is. I put the bar pretty high (that is not to say that I don't sometimes violate my own ideals out of laziness, but then again I've been bitten by that several times too).

What can make increased coupling worth the cost

Sometimes a system is simply so large and complex that even if most of your components are highly cohesive you need to break the components into pieces, and possibly need to be able to plug other code into some of those pieces, to make the system maintainable.

In those cases, there may not be a choice. You may need to scale a system across server boundaries and have to break it into server specific components. Each processing step may need access to and knowledge of the full state to be able to continue processing no matter how you try to slice and dice the tasks.

Another case where increased coupling may be worth the cost is ease of use. A few days ago I wrote a post titled "URLs do not belong in the views". One of the approaches I was pondering was to put the routing/dispatch mechanism (the front controller) in charge of generating the URLs. At the same time I wanted to tie the URL generation to model instances, not to named routes as Rails for example does (Rails also supports generating routes from model class names, but that's also not what I wanted).

Part of the motivation is an observation that there are many ways to generate URLs from the model objects - my posts for example, have a "slug" used to generate SEO friendly URLs, but that isn't guaranteed to be stable, and certainly isn't until the post is published, so while that is the right URL for a published, public view of the post, it's not appropriate for the admin interface, where one of the operations is to change the slug - I want the admin URLs to stay static. In this case the appropriate URL to use requires knowledge of the contents of the model. It's perfectly appropriate for the view to request data from the model, but I don't want it to make assumptions about formatting of that data.

And wouldn't it be nice if the front controller could instantiate the proper model objects too?

The point isn't what Rails can or cannot do - in this case Rails certainly can do more than a lot of frameworks I've seen, and gets part way there. If you are willing to sacrifice low coupling, allowing the front controller to create a mapping is pretty straightforward, and it certainly would be trivial to make Rails support a model like that (if it doesn't already - I don't know).

Doing those things without causing a scenario where the front controller knows about the way specific models are built (i.e. how to instantiate objects with a specific ORM), or where the views depend on a specific API of a specific front controller implementation is more work.

There are many cases where lower coupling means more work

If you, for your specific use, couldn't care less about the extra coupling because you know you'll never need to exchange a specific component (do you really know? Think long and hard about that), and the savings in terms of work start being significant, then lowering the bar and accepting higher coupling may be worth it. It's a tradeoff between the increased cost of replacing a component vs. the potentially lower cost of using the component in the first place.

My goal isn't to convince people to always strive for minimal coupling, but to make at least a few people at the very least think twice and make sure they really need to before they start adding extra dependencies to their code.

To relate this to my previous post, has Rails gotten the balance right? In my opinion it hasn't. That's not to say everything in Rails could be cut into independent reusable components without sacrificing usability.

Some thoughts on avoiding coupling

Rack is a good example. Read the Rack specification. Seriously. It's short.

There are two good things about it:

First of all, it's easy to implement Rack again, or parts of it, if you really have to. If for whatever reason the current implementation doesn't meet your needs, it's easy to satisfy the requirements of the specification.

Secondly, it's even easier for other components to plug into the Rack infrastructure. Really, a minimal piece of Rack middleware doesn't need to do much more than this (it doesn't technically need even this, as long as it responds to #call, but doing it this way lets you chain them trivially using Rackup config files):

class RackMiddlewareExample
  def initialize app
    @app = app
  end

  def call env
    @app.call(env)
  end
end

Of course your middleware can (and likely will) access the environment provided to #call, but that interface is not doing much more than passing on data passed in with the request and some information about the server you're running in, just like the CGI environment. As long as you comply with the very simple Rack specification, you can build up complex behavior by layering a number of tiny classes that can be ripped out, reordered, replaced, rewritten etc. as you please.
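A sketch of that layering, done by hand instead of through a Rackup file so it runs without Rack installed. AddHeader and UpcaseBody are hypothetical toy classes; the only thing they share is the #call convention and the [status, headers, body] result.

```ruby
# Toy middleware: adds one response header.
class AddHeader
  def initialize app, name, value
    @app, @name, @value = app, name, value
  end

  def call env
    status, headers, body = @app.call(env)
    [status, headers.merge(@name => @value), body]
  end
end

# Toy middleware: upcases the body (assumes the body is an Array of Strings).
class UpcaseBody
  def initialize app
    @app = app
  end

  def call env
    status, headers, body = @app.call(env)
    [status, headers, body.map { |chunk| chunk.upcase }]
  end
end

app = lambda { |env| [200, {"Content-Type" => "text/plain"}, ["hello"]] }

# Either order works, and either layer can be removed without touching the other.
stack = AddHeader.new(UpcaseBody.new(app), "X-Demo", "1")
p stack.call({})
```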

It's an incredibly powerful model because of the low coupling. Of course, it's easy to break that by adding lots of data to the environment you pass on. It's not an automatic truth that Rack middleware components will not have high coupling, but high coupling would kind of defeat the purpose.

A few general rules to avoid high coupling:

  • Make your components as cohesive as possible. If they have more than one responsibility, try to break them in two. Identify what their responsibilities actually are.
  • Don't leak state when you don't have to. WHY are specific attributes exposed? Do they have to? Do you need to tell the world which state an object is in, or is it enough to tell the world that the object is or is not in a specific state? The more you hide data, the harder it is to accidentally increase coupling.
  • Simplify your interfaces. Can you easily reimplement a class from scratch that satisfies the interfaces? What ARE the interfaces that other components are allowed to depend on? (AND note that interfaces can be complex even if the number of methods is low, if the data passed as arguments is complex)
  • Pick interfaces that are already satisfied by components consumers of your interfaces might use. An example of this is again Rack, where the choice of using #call means that a Proc can be used to satisfy the interface requirements. It's a tiny thing in this case, but it does increase flexibility, and makes reimplementing or replacing components, or providing a facade or decorator around an existing component that much easier.
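The last rule is easy to show in a few lines. Because the interface is just #call, a Proc, a lambda, or a bound Method object all qualify, and a decorator can be written as a lambda too. The names below (Greeter, ProcApp, MethodApp, WithMarker) are invented for this sketch.

```ruby
class Greeter
  def handle env
    [200, {"Content-Type" => "text/plain"}, ["hello from a method\n"]]
  end
end

# Three different kinds of object, one interface: #call(env).
ProcApp   = Proc.new { |env| [200, {"Content-Type" => "text/plain"}, ["hello from a proc\n"]] }
MethodApp = Greeter.new.method(:handle)

# A decorator written as a lambda: takes any callable, returns a new callable.
WithMarker = lambda do |app|
  lambda do |env|
    status, headers, body = app.call(env)
    [status, headers.merge("X-Wrapped" => "yes"), body]
  end
end

[ProcApp, MethodApp, WithMarker.call(ProcApp)].each do |app|
  status, headers, body = app.call({})
  puts "#{status} #{headers.inspect} #{body.inspect}"
end
```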

Why do you hate Rails?

Judging by some of the feedback I got, some people clearly think I hate Rails. I don't, which I hope my answers to comments etc. reflected. Rails has done a lot of really great things for Ruby and web development, and it deserves full credit for that.

I do stand by my assessment that I believe Rails is overkill, though. That doesn't mean none of the code in Rails is worthwhile - lots of it clearly is. But I do also strongly believe that Rails would be far better if it was more loosely coupled, making it easier both for alternative implementations of core components to be used, and for bits and pieces of Rails to be used by themselves. The success of ActiveRecord is a testament to the value of being able to reuse chunks of code originating in Rails, and I'm sure there's lots of other code that would benefit a wider community.

A lot of my reluctance to use Rails boils down to the fact that I prefer to pick components that fit with what I want to do rather than adapting what I want to do to how it'd be easy to do it with a specific framework. I want the flexibility to throw out components when they don't suit me without affecting other parts of my applications.

For other people that's less of a concern, and so they are happy with Rails and want to keep using it, and that's of course their right. Choice is great. Some people are happy with PHP or even ASP too. If it works for them, then that's fine. Switching from something that works perfectly fine for you just to switch is rarely a good idea, and I'd never advocate it.


2008-03-23 11:03 UTC Why Rails is total overkill and why I love Rack

Rails is total overkill. It tries to do "everything" in a massive framework where major components are tightly interwoven. Smaller frameworks like Merb and Camping have already shown you don't NEED this. I argue you don't need a framework at all - you need highly cohesive, loosely coupled components. That is why I love Rack - it does one thing and does it well, and leaves me to write applications, not learn frameworks.

Why I love Rack

Rack is beautiful in its simplicity. I've posted a couple of Rack middleware classes that show how simple it is to extend (check out my Rack tag for more).

It struck a chord with me because what it tries to do is very limited; it's simple; its model encourages layering and composition of cohesive individual components, as opposed to what "frameworks" usually do.

Rack is not a framework. If you want a framework you can plug it into Rack. If you don't, you can write to the bare Rack environment.

"Framework" is to me a euphemism for "Gordian knot of interdependencies". It doesn't have to be that way, but it ends up being that way more often than not. If a framework is engineered explicitly to be just a collection of libraries that share minimal interdependencies and are useful by themselves, then I have no beef with it.

While there are many common needs for various web apps, they are however largely orthogonal, and can easily be supported via independent, highly cohesive libraries. This is the case for most domains where people promote frameworks. Let me give some examples related to web apps:

  • Sessions: The basic requirement for session handling is the ability to persist objects or pieces of data, and to tie that persistence to a key extracted from a cookie or CGI parameter. There's no reason for this to not consist of a self-contained module to handle the persistence, and a tiny adapter from the CGI environment (whether provided by Rack, the CGI class, or something else) to the persistence code. Why intermingle it with "framework" code?
  • Database handling: Ok, so Rails too does the right thing here (mostly) and ActiveRecord can indeed easily be used outside of Rails. There's just no reason for the ORM or other database code to be tied to a framework. But Rails ties you pretty intimately to ActiveRecord. If you buy into Rails, the assumption is you'll use ActiveRecord. Which sucks if you don't like ActiveRecord or it's not suitable for your application.
  • Request processing: Exactly what is it people want here? For me request processing involves parsing the CGI environment and presenting it nicely packaged up. Possibly handling uploads. Some escaping/unescaping. Rack::Request handles all I need there.
  • Rendering/views: The plethora of different rendering libraries generating HTML output demonstrates quite clearly this can be done separately.
  • Routing/dispatching: It's at most a Hash mapping regexp's to classes and/or method calls. Hardly an earth shattering invention. The routing class I use for this blog is about 40 lines, and it's only that big because it's full of bells and whistles. It also has four and only four requirements of the outside world: You must pass it path_info, request_method and an object that will be passed on to the classes it dispatches to; the classes you want to route to must know how to respond to methods corresponding to the values of request_method OR call (so a proc will work); the classes must know how to deal with the request object you pass in. Minimize coupling. Always. Even for trivial code.
  • Form handling: I've not seen many good approaches here. At Edgeio we put together a generic forms class that did go some way in reducing the typing by extracting the values needed from the request and generating the data needed by the view automatically. It's tens of lines of code at the most, and can be done with very few - if any - dependencies.
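To give a feel for how little the routing item needs, here is a rough sketch of a dispatcher along the lines described: a Hash mapping regexps to handlers, where a handler either responds to a method named after the request method or to #call. TinyRouter and the handlers are hypothetical; this is not the router this blog uses.

```ruby
# Hypothetical sketch of a Hash-of-regexps dispatcher.
class TinyRouter
  def initialize routes
    @routes = routes   # { /regexp/ => handler }
  end

  # Takes the path, the request method and an opaque request object that is
  # handed on untouched to whichever handler matches first.
  def call path_info, request_method, request
    meth = request_method.to_s.downcase
    @routes.each do |pattern, handler|
      next unless path_info =~ pattern
      return handler.public_send(meth, request) if handler.respond_to?(meth)
      return handler.call(request) if handler.respond_to?(:call)
    end
    [404, {"Content-Type" => "text/plain"}, ["Not found\n"]]
  end
end

class PostHandler
  def get request
    [200, {"Content-Type" => "text/plain"}, ["showing a post\n"]]
  end
end

router = TinyRouter.new(
  /\A\/posts\// => PostHandler.new,
  /\A\/ping\z/  => lambda { |request| [200, {"Content-Type" => "text/plain"}, ["pong\n"]] }
)

p router.call("/posts/42", "GET", nil)
p router.call("/ping", "GET", nil)
p router.call("/missing", "GET", nil)
```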

There are more bits and pieces, but common to them all is that no specific dependencies are needed - all of these components can be, and have been, implemented in ways that make it easy to plug them into whatever environment you have.

Why Rails is total overkill

I can piece together all I need easily from libraries I like. I don't need (or want) Rails to dictate what I use:

If I need sessions, there are a lot of libraries handling persistence, and tying it into Rails is less than 10 lines of code, and it's less than 10 lines of code I write once and will reuse elsewhere (but generally I'm getting more and more negative to sessions - if you need them so often you can't just persist them in your database without worrying about performance, you have a code smell that makes scaling hard).

For ORMs I've picked Sequel for my blog, but there are a number of alternatives.

For request processing, just plain Rack::Request meets all my needs.

Rendering views? I rolled my own in about 30 lines of code that's sufficient for this blog - for something larger I'd pick one of the huge number of templating languages (up to and including XSL, which I've used in the past because it has some nice properties such as being able to feed the XML + XSL straight to a browser to let you see the raw data passed to the view in the browser with "show source" during debugging).

For routing/dispatching I needed, as mentioned, a whopping 40 lines or so, and now I have a reusable component that will plug happily in if I ever need to replace Rack, or even if I replace everything else.

I'm sure any Rails fans reading this (all two of you, judging by my number of subscribers at this stage) will be fuming and be aching to complain about how many extra things Rails gives them. The problem is I don't need more. I've written web apps in many languages (including C++ - I kid you not) over the years, and the above are all the web specific components I've needed.

Beyond that there are certainly a large number of libraries that may be useful for specific types of apps. But none of them are specific to writing web applications. None of them need interdependencies with a web framework.

To sum it up: A difference in philosophy

Applications should be composed of components that show:

  • High cohesion: Each component should do one or a small number of things that are tightly related and do them well, rather than trying to do "everything".
  • Low coupling: You should be able to replace any component with another one without having to reimplement a ton of complex interfaces.

Rails fails miserably there, and that makes Rails overkill to me. This blog is as small as typical "demo" Rails apps, and yet it doesn't use any framework, just independent libraries such as Sequel for the models, Hpricot for assorted HTML mangling etc.

Its only coupling to Rack is that it expects something to call the "call" method of the routing class with something resembling a CGI environment. Changing it to use the CGI class is about 10 lines of code. Changing it to bypass both Rack and the CGI class and parse the bits it needs of its own environment is perhaps another 20.

Half the code is a collection of tiny reusable components - some are Rack middleware whose only dependency on Rack is the calling convention (one method call and the expected format of the result), and some are things like the 40 line dispatcher.

That's why Rails is overkill: You can easily build web applications without the "magic", and without the interdependencies and all the rest that comes with it.

Could some things in Rails be useful? Yes, there are a lot of useful things in Rails, but they could practically all have been done as independent components, and we'd be all the better for it, being able to pick and choose the pieces we need rather than dragging in a huge framework and tons of dependencies and all kinds of other baggage that I for one do not want.


2008-03-22 12:02 UTC Rack middleware: Adding cache headers

I'm playing with a small web based RSS reader, and one of the things it does is cache a lot of data to reduce the impact on the sites I follow feeds from. However, I realized a couple of days ago that I'd completely forgotten to set cache headers, and I kept hammering my own site.

That wouldn't do, and since my app is using Rack to interface to the webserver, there's a simple solution: Write a tiny Rack middleware class, just like I described in "Rewriting content types with Rack".

The great thing is that because of Rack this class can be used to add cache headers for anything from pure Rack based apps to apps using Rails, Merb, or any number of other Ruby frameworks that have Rack adapters. If you haven't looked into Rack already, do.

I added the following in a file named "cachesettings.rb":

require 'time' # needed for Time#httpdate

class CacheSettings
  def initialize app, pat
    @app = app
    @pat = pat
  end

  def call env
    res = @app.call(env)
    path = env["REQUEST_PATH"]
    @pat.each do |pattern,data|
      if path =~ pattern
        res[1]["Cache-Control"] = data[:cache_control] if data.has_key?(:cache_control)
        # HTTP dates end in "GMT"; Time#httpdate produces that form
        res[1]["Expires"] = (Time.now + data[:expires]).httpdate if data.has_key?(:expires)
        return res
      end
    end
    res
  end
end

It's pretty straightforward:

  • On setup it stores a hash of config information
  • On each call it first executes the next step in the chain.
  • Then it iterates over the hash and if there's a match it optionally adds Cache-Control: and Expires: headers. For the Expires: header the configuration info is the number of seconds from "now".

I then added it to my config.ru (the Rack config file for my app) like this:

require 'cachesettings'

use CacheSettings, {
  /\/static\// => {
    :cache_control => "max-age=86400, public",
    :expires => 86400
  }
}
use Rack::Static, :urls => ["/static"]

That's all there is to it - it can add cache headers to any arbitrary results. It should be obvious how to extend this to add any arbitrary headers too, or rewrite existing headers.

Useful exercises for the reader: Extend it to allow adding a Last-Modified header or ETags.
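For the ETag half of the exercise, one possible starting point is sketched below. It is only a sketch under stated assumptions: AddETag is a made-up class name, it buffers the whole body in memory to hash it, and it does not handle If-None-Match. Newer versions of Rack ship a Rack::ETag middleware that is the better choice in real use.

```ruby
require 'digest/md5'

# Hypothetical sketch: add an ETag header derived from the response body.
class AddETag
  def initialize app
    @app = app
  end

  def call env
    status, headers, body = @app.call(env)
    return [status, headers, body] if headers.key?("ETag")

    # Rack bodies only promise #each, so collect the chunks before hashing.
    chunks = []
    body.each { |chunk| chunks << chunk }
    etag = %("#{Digest::MD5.hexdigest(chunks.join)}")
    [status, headers.merge("ETag" => etag), chunks]
  end
end

app = lambda { |env| [200, {"Content-Type" => "text/plain"}, ["hello"]] }
status, headers, body = AddETag.new(app).call({})
puts headers["ETag"]
```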


2008-03-19 06:45 UTC Rewriting content types with Rack

UPDATE: I have just added another Rack entry: Rack Middleware: Adding cache headers

Rack is a common API between webservers and frameworks for Ruby. It allows all kinds of nice stuff, like chaining filters that each do one small and self-contained part of the processing and are easy to reuse.

Here's a trivial one I wrote to automatically set Content-Type based on the extension of a file:

# Rewrite content types based on file extensions
class RewriteContentType
  def initialize app, opts
    @app = app
    @map = opts
  end

  def call env
    res = @app.call(env)
    ext = env["PATH_INFO"].split(".")[-1]
    res[1]["Content-Type"] = @map[ext] if @map.has_key?(ext)
    res
  end
end

All it does is split off anything after the last "." and add a matching Content-Type to the headers if it exists in the map that's passed in. In my Rack config file, I do this:

use RewriteContentType, {"js" => "text/javascript"}
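One thing to be aware of with the split approach: a path without a "." yields the whole path as the "extension", and a "." in a directory name is picked up too. That is harmless as long as such strings are not keys in the map. The sketch below compares it with File.extname, which only looks at the last path component; both helper names are made up for the comparison.

```ruby
# Hypothetical helpers comparing two ways of pulling out an extension.
def ext_by_split path
  path.split(".")[-1]
end

def ext_by_extname path
  File.extname(path).sub(/\A\./, "")
end

["/static/app.js", "/about", "/v1.2/readme"].each do |path|
  puts "#{path}: split=#{ext_by_split(path).inspect} extname=#{ext_by_extname(path).inspect}"
end
```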



About me

E-mail: vidar@hokstad.com
Skype: vhokstad
View my LinkedIn profile

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.
