Customizing the Ruby syntax highlighter for x86 assembler 2008-04-30


I wrote about syntax highlighting in Ruby earlier. The Ruby Syntax library supports Ruby, YAML and XML out of the box, but it's also pretty easy to extend to handle other languages. Since I've been including a lot of x86 assembler in my series on writing a compiler in Ruby, I figured I'd see how much (or little) work adding a syntax highlighter for assembler would take. It's by no means perfect, since I've only spent half an hour or so throwing this together, but it's reasonable, and easy enough to keep adjusting. Here it is:
require 'rubygems'
require 'syntax'

class AsmTokenizer < Syntax::Tokenizer
  def setup
    @state = :newline
  end

  def step
    @state = :newline if bol?
    if @state == :newline
      # Handle labels, directives and operators (instruction mnemonics)
      if label = scan(/[a-zA-Z.][a-zA-Z0-9_]*:/) then start_group :label, label
      elsif words = scan(/\.[a-zA-Z0-9_]*/)
        start_group :directive, words
        @state = :operands
      elsif words = scan(/[a-zA-Z]+/)
        start_group :operator, words
        @state = :operands
      else start_group(:normal, getch)
      end
    else
      # Handle operands and assorted punctuation

      if words = scan(/,/)                then start_group :comma, words
      elsif words = scan(/-?[0-9]+/)       then start_group :number, words
      elsif words = scan(/%[a-zA-Z]+/)    then start_group :register, words
      elsif words = scan(/[\.a-zA-Z][a-zA-Z0-9_]*/) then start_group :label, words
      elsif words = scan(/\$/)     then start_group :value, words
      elsif words = scan(/[\(\)]/) then start_group :paren, words
      elsif words = scan(/\"[^\"]*\"/) then start_group :quoted, words
      else start_group(:normal, getch)
      end
    end
  end
end

# Register the custom highlighter
Syntax::SYNTAX['asm'] = AsmTokenizer
I don't have time to write a lot of explanation, but the Syntax manual does a reasonable job of describing it. The one pitfall to be aware of is that you must make sure your #step method advances at least one character no matter what, or you'll get stuck in an infinite loop. That is why both branches above end with an else that falls back to getch. Here's an example of how to use the new highlighter:
require 'syntax/convertors/html'

convertor = Syntax::Convertors::HTML.for_syntax "asm"
puts convertor.convert( File.read("/tmp/step5.s"))
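On the infinite-loop pitfall mentioned above: the effect is easy to see outside the Syntax library. The following is only a rough sketch using Ruby's standard StringScanner, on the assumption that the tokenizer's scan and getch behave like the StringScanner methods of the same name. The tokenize_operands helper is made up for illustration and is not part of Syntax.

```ruby
require 'strscan'

# Hypothetical helper, not part of the Syntax library: tokenizes a few
# operand types with the same "always consume something" fallback that
# AsmTokenizer#step uses.
def tokenize_operands(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    if    tok = scanner.scan(/%[a-zA-Z]+/) then tokens << [:register, tok]
    elsif tok = scanner.scan(/-?[0-9]+/)   then tokens << [:number, tok]
    elsif tok = scanner.scan(/,/)          then tokens << [:comma, tok]
    else
      # Nothing matched: consume one character anyway so the loop always
      # makes progress. Without this branch, an unmatched character such
      # as "$" would leave the scan position unchanged forever.
      tokens << [:normal, scanner.getch]
    end
  end
  tokens
end

p tokenize_operands("$-16, %esp")
# prints [[:normal, "$"], [:number, "-16"], [:comma, ","], [:normal, " "], [:register, "%esp"]]
```

If you remove the final else branch, the loop never reaches the end of the string, which is the same hang you would get from a #step method that matches nothing.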
And here's some CSS to color the output:
body { background: black; }
.directive { color: purple; }
.comma     { color: white; }
.paren     { color: white; }
.value     { color: white; }
.number    { color: yellow; }
.label     { color: blue; }
.register  { color: brown; }
.operator  { color: lightgrey; }
.quoted    { color: green; }
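To look at the result in a browser, the highlighted markup and the CSS need to end up in the same page. Here is a minimal sketch of one way to do that, assuming the convertor output is an HTML fragment whose span class names match the token groups (which is what the CSS above relies on). The wrap_page helper and the stand-in fragment are hypothetical, not part of the Syntax library.

```ruby
# Hypothetical helper, not part of the Syntax library: wraps an
# already-highlighted HTML fragment and a stylesheet in a minimal page.
def wrap_page(fragment, css)
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <style>
    #{css}
    </style>
    </head>
    <body>
    #{fragment}
    </body>
    </html>
  HTML
end

# Stand-in for the string returned by convertor.convert; the real
# fragment would come from the highlighter shown earlier.
fragment = '<pre><span class="operator">ret</span></pre>'
css = ".operator { color: lightgrey; }"
puts wrap_page(fragment, css)
```

Redirect the output to a file and open it in a browser to check the colors.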
And some example output from the compiler series:
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $4,%esp
        movl    $.LC0,(%esp)
        call    puts
        addl    $4, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .section        .rodata
.LC0:
        .string "Hello World"
