Tag: howto

2009-05-23 16:04 UTC Family tree using Graphviz and Ruby

My dad spent a lot of time putting together a family database, currently containing about 12000 people, covering both my parents' ancestors as well as tracking forward to include a lot of living descendants. Unfortunately, since he started this 16-17 years ago it's been managed as a custom dBase III+ app, and the code grew by accretion over at least 7 years (until my father died). Spurred on by an e-mail from a possible distant relative (who it turns out I've even met) I finally dumped the dbf files into an SQLite database and put together a few scripts to generate diagrams from it.

Here's an example (click to enlarge). The birth/death dates are in Norwegian format (day.month.year)

Ancestors of Ole Martin Hokstad

The full SVG diagram (which is zoomable in Safari and Firefox) is here.

This tree shows the known ancestors of Ole Martin Hokstad - the first person amongst my direct ancestors to be born to the Hokstad name (there are one or two other families where the name Hokstad was taken at different times). In our case the name stems from the Hogstad farms in Frosta, near Trondheim, Norway. The farms kept being divided as children inherited parts of them.
At one point the farm Lille-Hogstad was bought by Ola Viktil, and one of his grandsons, Peter Magnus Hokstad Johansen, combined two smaller properties into Hogstad Lille Vestre in 1854, which was then renamed Hokstad (presumably he didn't like the thought of the name Peter Magnus Hogstad Lille Vestre Johansen). His children, including my great-grandfather Ole Martin Hokstad, got the name by birth.

The tree above doesn't show any siblings, and leaves out a few people we don't have any certain information about. I did render one of all my known ancestors as well, but it's too huge to be practical to reproduce here (about 20 times the size of the tree above).

To produce this I put together a very quick and dirty little Ruby script:

require 'model'
require 'set'

id = ARGV[0].to_i

# Prevent double inclusion of a node
$memo = Set.new

# Skip people whose name is recorded as unknown ("?")
def filter_node(per)
  return nil if !per
  return nil if per.firstname.strip == "?" ||
                per.lastname.strip == "?" ||
                per.maidenname.strip == "?"
  return per
end

# Emit a box for a person; returns false if it was already emitted
def node(per, color)
  return false if $memo.member?(per.pk)
  $memo << per.pk
  name = [per.firstname, per.middlename, per.lastname, per.maidenname]
  name = name.collect { |n| n && n != "" ? n : nil }.compact.join(" ")
  label = "#{name}\n#{per.birthdate} - #{per.deathdate}"
  puts "  p#{per.pk} [ shape = box, style=\"filled\"," +
       " fillcolor=\"#{color.to_s}\", label=\"#{label}\" ];"
  return true
end

def ancestors(per)
  father = filter_node(per.father)
  mother = filter_node(per.mother)

  pk = per.pk
  arrowhead = "normal"
  if mother and father
    # Join both parents in a small point node that links to the child
    merge = "m#{mother.pk}and#{father.pk}"
    if !$memo.member?(merge)
      puts "  p#{merge} [ shape = point ]"
      puts "  p#{merge} -> p#{pk} [ arrowtail=none ]"
      arrowhead = "none"
    else
      $memo << merge
    end
    pk = merge
  end

  if father
    if node(father, :green)
      puts "  p#{father.pk} -> p#{pk} [ arrowhead=#{arrowhead} ]"
      ancestors(father)
    end
  end

  if mother
    if node(mother, :gold)
      puts "  p#{mother.pk} -> p#{pk} [ arrowhead=#{arrowhead} ]"
      ancestors(mother)
    end
  end
end

def graph(per)
  puts "digraph ancestors {"
  node(per, :red)
  ancestors(per)
  puts "}"
end

per = Person[:id => id]
if !per
  puts "Unable to find #{id}"
  exit
end

graph(per)

I'm not going to spend a lot of time going through the script, other than to point out the dependencies if you want to try this for yourself:

You need to create a class with a method #pk that returns a unique key suitable for use in a Graphviz dot-file node name, #father and #mother that return an equivalent object for the father and mother respectively (or nil if not known), and #firstname, #middlename, #lastname and #maidenname that return the names as strings. The script also calls #birthdate and #deathdate to build the label. Whether the data comes from a database or not is irrelevant - you can load it all into memory first if you like. In my case it's all from a Sequel model: as you can see, at the end of the script I retrieve a Person object for the id provided, which becomes the root of the tree.
I don't think I'll put in much effort to make this a generic package, but it should be easy enough to adapt if you know some Ruby. I will probably post a couple of variations to add output of siblings and also to generate an equivalent one for descendants instead of ancestors though.
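If you don't have a database at hand, something along these lines should be enough to try the script. This is only a sketch of an in-memory stand-in, not my actual Sequel model: the register helper, the keyword arguments and the sample people are all made up for illustration. The only parts taken from the script are the method names it calls and the Person[:id => id] lookup.

```ruby
# Hypothetical in-memory replacement for the Sequel-backed Person model.
# Only the method names come from the ancestors script; everything else
# (register, the keyword arguments, the sample data) is illustrative.
class Person
  attr_reader :pk, :firstname, :middlename, :lastname, :maidenname,
              :birthdate, :deathdate
  attr_accessor :father, :mother

  @people = {}

  class << self
    # Mimics the Person[:id => id] lookup used at the end of the script.
    def [](query)
      @people[query[:id]]
    end

    def register(person)
      @people[person.pk] = person
    end
  end

  # Names default to "" because the script calls #strip on them.
  def initialize(pk, firstname:, lastname:, middlename: "", maidenname: "",
                 birthdate: "", deathdate: "")
    @pk, @firstname, @middlename = pk, firstname, middlename
    @lastname, @maidenname = lastname, maidenname
    @birthdate, @deathdate = birthdate, deathdate
    @father = nil
    @mother = nil
    self.class.register(self)
  end
end

# Made-up sample family: one child with two known parents.
child  = Person.new(1, firstname: "Child",  lastname: "Example", birthdate: "01.02.1900")
father = Person.new(2, firstname: "Father", lastname: "Example")
mother = Person.new(3, firstname: "Mother", lastname: "Example", maidenname: "Sample")
child.father = father
child.mother = mother
```

Saved as model.rb somewhere on the load path, this should let the script run against the three sample people; unknown parents are simply left as nil.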

I then use this little shell script to generate the SVG file (requires xsltproc):

#!/bin/sh
# $1 is the person id, $2 is the SVG file to write

ruby ancestors.rb "$1" >"/tmp/$1.dot"
dot -Tsvg "/tmp/$1.dot" >"/tmp/$1.svg"
xsltproc /opt/diagram-tools/notugly.xsl "/tmp/$1.svg" >"$2"

This assumes my diagram-tools Git repository has been cloned into /opt/diagram-tools (git clone git://github.com/vidarh/diagram-tools.git /opt/diagram-tools), to pretty up the Graphviz output.





2009-04-20 22:30 UTC Updated Graphviz tools on Github

I just added a new repository on GitHub containing the tools from my Graphviz / diagram related posts.
The repository can be found at: http://github.com/vidarh/diagram-tools/tree/master

These are the ones included and the appropriate articles:

The only new thing so far is that notugly.xsl is updated to work with Graphviz 2.22.2





2009-02-04 02:46 UTC Simple charts in Ruby using SVG::Graph

One thing that comes up time and time again when I mess around with a system is quickly looking at frequencies of various things - for example disk usage by sub-directory, or referrer entries in my Apache access log. Like this:
# cat /var/log/httpd/access_log | cut -d' ' -f11 | grep -v '"-"' | grep -v hokstad.com | sort | uniq -c | sort -rn | head -n 10
 74 "http://www.google.com/reader/view/"
 42 "http://www.rubyflow.com/items/1606"
 35 "http://www.dzone.com/links/creating_graphviz_graphs_from_ruby_arrays.html"
 27 "http://www.dzone.com/links/rss/creating_graphviz_graphs_from_ruby_arrays.html"
 20 "http://www.reddit.com/r/ruby/"
 12 "http://www.reddit.com/r/ruby/comments/7tw1a/creating_graphviz_graphs_from_ruby_arrays/"
 9 "http://www.graphviz.org/Resources.php"
 8 "http://www.netvibes.com/"
 8 "http://www.google.com/reader/view/#overview-page"
 8 "http://www.google.com/notebook/fullpage"

Especially the "sort | uniq -c | sort -rn | head -n something" is a very frequently recurring pattern, in order to get a list of something in descending order of frequency.
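For comparison, the same counting can be done in a few lines of Ruby. This is just a sketch: top_frequencies is a name I made up for the example, the sample input is invented, and unlike the shell pipeline it breaks ties alphabetically.

```ruby
# Count how often each line occurs and return [count, line] pairs,
# most frequent first, limited to the top `limit` entries.
# Ties are broken alphabetically (an arbitrary choice for this sketch).
def top_frequencies(lines, limit = 10)
  counts = Hash.new(0)
  lines.each { |line| counts[line.chomp] += 1 }
  counts.sort_by { |line, count| [-count, line] }
        .first(limit)
        .map { |line, count| [count, line] }
end

sample = ["a", "b", "a", "c", "a", "b"]   # made-up input
top_frequencies(sample, 2).each do |count, line|
  # Right-align the count, roughly like "uniq -c" does
  puts "#{count.to_s.rjust(7)} #{line}"
end
```

For real use you would pass it ARGF or the lines of a file instead of the sample array.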

But I'm difficult. I want something visual; a chart. Something like this:


Incidentally, there's a nice Ruby package called SVG::Graph that allows you to generate SVGs from this. The above is a PNG for best compatibility, but here is the SVG version - at least Firefox renders it better than rsvg (http://librsvg.sourceforge.net/), which I used to generate the PNG.
Assuming you install SVG::Graph from the above page, it's pretty simple to generate the SVGs - just pipe the output from the above command straight into this script:

require 'SVG/Graph/BarHorizontal'

data = []
fields = []
ARGF.each do |line|
  line = line.chomp.split
  data << line[0].to_i
  fields << line[1]
end

graph = SVG::Graph::BarHorizontal.new(:height => 20 * data.size,
                                      :width => 800,
                                      :fields => fields.reverse)

graph.add_data(:data => data.reverse)
graph.rotate_y_labels = false
graph.scale_integers = true
graph.key = false
print graph.burn


It tries to be reasonably intelligent about adjusting the height of the graph, but you might want to adjust the width and other parameters. My goal was to get something simple that'll "just work" when I pipe the output of "uniq -c" into it, but any space separated data with a number first and the label afterwards should work as long as the data set is small enough that the graph it produces doesn't become ridiculously large. I'll probably extend it with some command line options, and I want to change the styles used to set colors etc., but the above is enough for 90% of what I want. Note that SVG::Graph also supports pie charts, line charts etc.
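One thing to be aware of is that the script splits each line on whitespace and keeps only the first word after the number, so a label containing spaces gets cut short. If that matters for your data, the parsing step could look something like the sketch below. The name parse_counts and the decision to skip lines that don't start with a number are my own choices for this example; the SVG::Graph part would stay as it is.

```ruby
# Parse "count label" lines (as produced by "uniq -c") into two
# parallel arrays. Splitting into at most two parts keeps labels that
# contain spaces intact. Lines not starting with a number are skipped.
def parse_counts(lines)
  data = []
  fields = []
  lines.each do |line|
    count, label = line.strip.split(/\s+/, 2)
    next unless count && label && count =~ /\A\d+\z/
    data << count.to_i
    fields << label
  end
  [data, fields]
end

# Made-up sample input
data, fields = parse_counts(["  12 /var/log", "   3 My Documents", "garbage"])
p data
p fields
```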


2009-02-01 02:07 UTC Creating Graphviz graphs from Ruby arrays

As part of my compiler project I wanted a way to visualize the programs, and since the syntax tree (so far at least) is represented with plain Ruby arrays I decided to throw together a script to use Graphviz to generate some graphs. I've written about using Graphviz previously here.
The code does not make any assumptions tied to my compiler, but it's NOT an attempt at visualizing arbitrary object structures. You need to pass it an Array object, which can contain other arrays or objects that respond to #to_s.
NOTE: The code makes NO attempt to deal with structures that have loops - you'll run out of stack space soon enough if you try that. Feel free to post your fixes to do that in the comments (easy enough - just need to keep track of visited objects).
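To give an idea of what "keep track of visited objects" could look like, here is a minimal sketch that is separate from the actual code. The helper each_array_once is hypothetical; it only shows the bookkeeping, using a Set of object ids, and would still need to be worked into the dot-generating methods.

```ruby
require 'set'

# Walk a nested array, yielding each distinct array exactly once.
# Arrays that were already visited (loops or shared sub-arrays) are
# skipped, so the recursion always terminates.
def each_array_once(ary, visited = Set.new, &block)
  return unless ary.is_a?(Array)
  return if visited.include?(ary.object_id)
  visited << ary.object_id
  block.call(ary)
  ary.each { |o| each_array_once(o, visited, &block) }
end

tree = [:a, [:b, [:c]]]
tree[1] << tree            # introduce a loop on purpose
count = 0
each_array_once(tree) { count += 1 }
puts count                 # prints 3: tree, tree[1] and tree[1][1]
```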

An example. Given this:

[:defun, :parse_quoted, [:c],
  [:while, [:and, [:ne, [:assign, :c, [:getchar]], -1], [:ne, :c, 34]],
    [:do,
      [:putchar, :c]
    ]
  ]
]


I generate this image (the gradients and shadows are thanks to my previously described XSL transform to pretty up the Graphviz SVG output):



The code is quite straightforward, though I'm not quite happy with the amount of monkey-patching. There were two easy choices: monkey-patching or lots of #is_a? calls, which made it horribly messy. I wouldn't advocate including this into a larger app without cleaning it up first, but as a quick hack it works well.

module ToDot
  # Backslash-escape characters that are special in record labels
  def self.escape str
    str.gsub(/([<>{} |\\])/) { "\\" + $1 }
  end
end

class String
  def to_dot_label; '\"' + ToDot::escape(self) + '\"'; end
end

class Array
  def to_dot_label; "..."; end

  def to_dot_edge src, shorten
    "  #{src}" + (shorten ? "" : ":#{object_id}") + " -> #{object_id};\n"
  end

  def to_dot_subgraph
    # Nothing to draw for an empty array (self[1..-1] would be nil)
    return "" if empty?
    ary = self[0].is_a?(Array)
    # Shorten to a single box labelled with the first element when
    # every other element is an array
    shorten = !ary && self[1..-1].detect{|o| !o.is_a?(Array)} == nil
    s = "  #{object_id} [label=\""
    if shorten
      s += self[0].to_dot_label + "\", shape=rect];\n"
    else
      s += collect { |o| "<#{o.object_id}> " + o.to_dot_label }.join("|")
      s += "\"];\n"
    end
    s += collect {|o| o.to_dot_edge(object_id,shorten) }.join
    s += collect {|o| o.to_dot_subgraph }.join
    s
  end
end

class Object
  def to_dot_subgraph; end
  def to_dot_edge src, shorten; end
  def to_dot_label; ToDot::escape(to_s); end

  def to_dot
    s = "digraph G {\n"
    s += "  node [shape=record style=filled fillcolor=lightblue "
    s += "fontname=Verdana height=0.05 fontsize=10.0 ];\n"
    s += to_a.to_dot_subgraph
    s += "}\n"
  end
end


As for how to use it:

require 'arytodot'

puts someArray.to_dot

Then pipe the output to "dot -Tsvg >output file" and use your favorite SVG renderer to render an image from it. I used "rsvg file.svg file.png". If you want to use my XSL transform to pretty it up, follow the instructions in the article linked to above.

Here's the full "parser" example from my compiler series (click for the full size version):




2008-06-11 00:09 UTC 5 simple ways to troubleshoot using Strace

I keep being surprised how few people are aware of all the things they can use strace for. It's always one of the first debug tools I pull out, because it's usually available on the Linux systems I run, and it can be used to troubleshoot such a wide variety of problems.

What is strace?

Strace is quite simply a tool that traces the execution of system calls. In its simplest form it can trace the execution of a binary from start to end, and output a line of text with the name of the system call, the arguments and the return value for every system call over the lifetime of the process.

But it can do a lot more:

  • It can filter based on the specific system call or groups of system calls
  • It can profile the use of system calls by tallying up the number of times a specific system call is used, and the time taken, and the number of successes and errors.
  • It traces signals sent to the process.
  • It can attach to any running process by pid.

If you've used other Unix systems, this is similar to "truss". Another (much more comprehensive) tool is Sun's DTrace.

How to use it

This is just scratching the surface, and in no particular order of importance:

1) Find out which config files a program reads on startup

Ever tried figuring out why some program doesn't read the config file you thought it should? Had to wrestle with custom compiled or distro-specific binaries that read their config from what you consider the "wrong" location?

The naive approach:

$ strace php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
lstat64("/usr/local/lib/php.ini", {st_mode=S_IFLNK|0777, st_size=27, ...}) = 0
readlink("/usr/local/lib/php.ini", "/usr/local/Zend/etc/php.ini", 4096) = 27
lstat64("/usr/local/Zend/etc/php.ini", {st_mode=S_IFREG|0664, st_size=40971, ...}) = 0

So this version of PHP reads php.ini from /usr/local/lib/php.ini (but it tries /usr/local/bin first).

The more sophisticated approach if I only care about a specific syscall:

$ strace -e open php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4

The same approach works for a lot of other things, for example if you have multiple versions of a library installed at different paths and wonder exactly which one actually gets loaded.

2) Why does this program not open my file?

Ever run into a program that silently refuses to read a file it doesn't have read access to, but you only figured that out after swearing for ages because you thought it didn't actually find the file? Well, you already know what to do (replace your-program with the command you want to trace):

$ strace -e open,access your-program 2>&1 | grep your-filename

Look for an open() or access() syscall that fails.
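If the trace is long, a few lines of Ruby can pick out the failures for you. This is only a sketch: the regular expression assumes the "= -1 ERRNO (description)" line format seen in the traces above, and failed_calls is a name made up for this example.

```ruby
# Pick out failed syscalls from strace output. A failed call ends in
# "= -1 ERRNO (description)", as in the php.ini trace shown earlier.
FAILED_CALL = /\A(\w+)\((.*)\)\s+= -1 (\w+) \((.*)\)\s*\z/

# Returns one hash per failed call whose arguments mention `filename`.
def failed_calls(lines, filename)
  lines.map { |line| FAILED_CALL.match(line.chomp) }
       .compact
       .select { |m| m[2].include?(filename) }
       .map { |m| { syscall: m[1], errno: m[3], reason: m[4] } }
end

trace = [
  'open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)',
  'open("/usr/local/lib/php.ini", O_RDONLY) = 4',
]
failed_calls(trace, "php.ini").each do |call|
  puts "#{call[:syscall]} failed: #{call[:errno]} (#{call[:reason]})"
end
```

An EACCES here instead of ENOENT would tell you the file was found but could not be read.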

3) What is that process doing RIGHT NOW?

Ever had a process suddenly hog lots of CPU? Or had a process seem to be hanging?

Then you find the pid, and do this:

root@dev:~# strace -p 15427
Process 15427 attached - interrupt to quit
futex(0x402f4900, FUTEX_WAIT, 2, NULL 
Process 15427 detached

Ah. So in this case it's hanging in a call to futex(). Incidentally in this case it doesn't tell us all that much - hanging on a futex can be caused by a lot of things (a futex is a locking mechanism in the Linux kernel). The above is from a normally working but idle Apache child process that's just waiting to be handed a request.

But "strace -p" is highly useful because it removes a lot of guesswork, and often removes the need for restarting an app with more extensive logging (or even recompiling it).

4) What is taking time?

You can always recompile an app with profiling turned on, and for accurate information, especially about which parts of your own code are taking time, that is what you should do. But often it is tremendously useful to be able to just quickly attach strace to a process to see what it's currently spending time on, especially to diagnose problems. Is that 90% CPU use because it's actually doing real work, or is something spinning out of control?

Here's what you do:

root@dev:~# strace -c -p 11084
Process 11084 attached - interrupt to quit
Process 11084 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.59    0.001014          48        21           select
  2.89    0.000031           1        21           getppid
  2.52    0.000027           1        21           time
------ ----------- ----------- --------- --------- ----------------
100.00    0.001072                    63           total
root@dev:~# 

After you've started strace with -c -p you just wait for as long as you care to, and then exit with ctrl-c. Strace will spit out profiling data as above.

In this case, it's an idle Postgres "postmaster" process that's spending most of its time quietly waiting in select(). It's calling getppid() and time() in between each select() call, which is a fairly standard event loop.

You can also run this "start to finish", here with "ls":

root@dev:~# strace -c >/dev/null ls
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 23.62    0.000205         103         2           getdents64
 18.78    0.000163          15        11         1 open
 15.09    0.000131          19         7           read
 12.79    0.000111           7        16           old_mmap
  7.03    0.000061           6        11           close
  4.84    0.000042          11         4           munmap
  4.84    0.000042          11         4           mmap2
  4.03    0.000035           6         6         6 access
  3.80    0.000033           3        11           fstat64
  1.38    0.000012           3         4           brk
  0.92    0.000008           3         3         3 ioctl
  0.69    0.000006           6         1           uname
  0.58    0.000005           5         1           set_thread_area
  0.35    0.000003           3         1           write
  0.35    0.000003           3         1           rt_sigaction
  0.35    0.000003           3         1           fcntl64
  0.23    0.000002           2         1           getrlimit
  0.23    0.000002           2         1           set_tid_address
  0.12    0.000001           1         1           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00    0.000868                    87        10 total

Pretty much what you'd expect: it spends most of its time in two calls to read the directory entries (only two since it was run on a small directory).
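The summary table is also easy to post-process if you want to compare runs or feed the numbers into something else. The following is a rough sketch, assuming the column layout shown above; parse_strace_summary is a made-up name, and the two sample rows are copied from the "ls" run.

```ruby
# Parse the summary table printed by "strace -c" into hashes.
# The errors column is blank when a syscall never failed, so rows have
# either five or six whitespace-separated fields. The header, the
# separator lines and the "total" row are skipped.
def parse_strace_summary(lines)
  rows = lines.map(&:split).select do |f|
    f.first.to_s =~ /\A\d+\.\d+\z/ && f.last != "total"
  end
  rows.map do |f|
    { percent: f[0].to_f, seconds: f[1].to_f, usecs_per_call: f[2].to_i,
      calls: f[3].to_i, errors: f.size == 6 ? f[4].to_i : 0,
      syscall: f.last }
  end
end

summary = [
  "% time     seconds  usecs/call     calls    errors syscall",
  "------ ----------- ----------- --------- --------- ----------------",
  " 23.62    0.000205         103         2           getdents64",
  " 18.78    0.000163          15        11         1 open",
  "------ ----------- ----------- --------- --------- ----------------",
  "100.00    0.000868                    87        10 total",
]
parse_strace_summary(summary).each do |row|
  puts "#{row[:syscall]}: #{row[:calls]} calls, #{row[:errors]} errors"
end
```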

5) Why the **** can't I connect to that server?

Debugging why some process isn't connecting to a remote server can be exceedingly frustrating. DNS can fail, connect can hang, the server might send something unexpected back etc. You can use tcpdump to analyze a lot of that, and that too is a very nice tool, but a lot of the time strace will give you less chatter, simply because it will only ever return data related to the syscalls generated by "your" process. If you're trying to figure out what one of hundreds of running processes connecting to the same database server does for example (where picking out the right connection with tcpdump is a nightmare), strace makes life a lot easier.

This is an example of a trace of "nc" connecting to www.news.com on port 80 without any problems:

$ strace -e poll,select,connect,recvfrom,sendto nc www.news.com 80
sendto(3, "\24\0\0\0\26\0\1\3\255\373NH\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\213\321\1\0\0\1\0\0\0\0\0\0\3www\4news\3com\0\0\34\0\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\213\321\201\200\0\1\0\1\0\1\0\0\3www\4news\3com\0\0\34\0\1\300\f"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 153
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "k\374\1\0\0\1\0\0\0\0\0\0\3www\4news\3com\0\0\1\0\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "k\374\201\200\0\1\0\2\0\0\0\0\3www\4news\3com\0\0\1\0\1\300\f"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\\2\1\0\0\1\0\0\0\0\0\0\3www\4news\3com\0\0\1\0\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\\2\201\200\0\1\0\2\0\0\0\0\3www\4news\3com\0\0\1\0\1\300\f"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("216.239.122.102")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, NULL, [3], NULL, NULL)        = 1 (out [3])

So what happens here?

Notice the connection attempts to /var/run/nscd/socket? They mean nc first tries to connect to NSCD - the Name Service Cache Daemon - which is usually used in setups that rely on NIS, YP, LDAP or similar directory protocols for name lookups. In this case the connects fail.

It then moves on to DNS (DNS is port 53, hence the "sin_port=htons(53)" in the following connect). You can see it then does a "sendto()" call, sending a DNS packet that contains www.news.com. It then reads back a packet. For whatever reason it tries three times, the last with a slightly different request. My best guess why in this case is that www.news.com is a CNAME (an "alias"), and the multiple requests may just be an artifact of how nc deals with that.

Then in the end, it finally issues a connect() to the IP it found. Notice it returns EINPROGRESS. That means the connect was non-blocking - nc wants to go on processing. It then calls select(), which succeeds when the connection was successful.
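If you haven't come across the non-blocking connect pattern before, this is roughly what it looks like from Ruby. It is a sketch of the general technique using the standard socket library, not what nc does internally: the function name connect_nonblocking and the timeout are my own additions (the trace above shows nc calling select() without one).

```ruby
require 'socket'

# Start a non-blocking connect and wait with select() until the socket
# becomes writable, mirroring the connect()/select() pair in the trace.
# The timeout is an assumption added for this sketch.
def connect_nonblocking(host, port, timeout = 5)
  addr = Socket.sockaddr_in(port, host)
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  begin
    sock.connect_nonblock(addr)
  rescue IO::WaitWritable
    # EINPROGRESS: the connection attempt continues in the background.
    unless IO.select(nil, [sock], nil, timeout)
      sock.close
      raise "connect timed out"
    end
    begin
      sock.connect_nonblock(addr)   # ask again to learn the outcome
    rescue Errno::EISCONN
      # Already connected, which is the success case here.
    end
  end
  sock
end
```

Running a script that calls this under strace with "-e connect,select" should show the same EINPROGRESS followed by a select() on the socket.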

Try adding "read" and "write" to the list of syscalls given to strace and enter a string when connected, and you'll get something like this:

read(0, "test\n", 1024)                 = 5
write(3, "test\n", 5)                   = 5
poll([{fd=3, events=POLLIN, revents=POLLIN}, {fd=0, events=POLLIN}], 2, -1) = 1
read(3, "

This shows it reading "test" + linefeed from standard in, and writing it back out to the network connection, then calling poll() to wait for a reply, reading the reply from the network connection and writing it to standard out. Everything seems to be working right.

Other ideas?

I'd love to hear from you if you use strace in particularly creative ways. E-mail me (vidar@hokstad.com) or post comments.



About me

E-mail: vidar@hokstad.com Skype: vhokstad
Twitter: vhokstad
View my LinkedIn profile.

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married and we just had our first child, Tristan Ikemefuna Hokstad.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.
