Language Agnostic: November 2009

Monday, November 16, 2009

Litexte

Litexte is a Textile parser which I wrote to see how far I can go with Regex patterns and to exercise my lazy right brain. In this blogpost I'll be illustrating how to create a parser in Ruby for Textile.

Textile is a lightweight markup language like Markdown. RedCloth is a well-known library for parsing Textile markup in Ruby. But what's the fun in using an existing library? Let's weave our very own Textile parser with lots of Regex awesomeness just for fun.

Check out the Textile quick reference written by _why. A sample Textile input will be used to build the parser.

Textile markup can be categorized and parsed in the following order:

1. Headers and Blockquotes

Headers and blockquotes are represented with a begin marker (h1, h2, h3 or bq), followed by a . and the content.


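# Anchor each marker to the start of a line and keep the newline for the later steps
input.gsub!(/^h1\.[ \t]*(.*)$/,'<h1>\1</h1>')
input.gsub!(/^h2\.[ \t]*(.*)$/,'<h2>\1</h2>')
input.gsub!(/^h3\.[ \t]*(.*)$/,'<h3>\1</h3>')
input.gsub!(/^bq\.[ \t]*(.*)$/,'<blockquote>\1</blockquote>')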

2. Delimited tags

Delimited markups have a begin and an end marker with the content in the middle. Most Textile markup falls into this category.


input.gsub!(/\_{2}(.*?)\_{2}/,'<i>\1</i>')
input.gsub!(/\*{2}(.*?)\*{2}/,'<b>\1</b>')    
input.gsub!(/\?{2}(.*?)\?{2}/,'<cite>\1</cite>')    
input.gsub!(/\_{1}(.*?)\_{1}/,'<em>\1</em>')
input.gsub!(/\*{1}(.+?)\*{1}/,'<strong>\1</strong>')
input.gsub!(/\-{1}(.*?)\-{1}/,'<del>\1</del>')
input.gsub!(/\+{1}(.*?)\+{1}/,'<ins>\1</ins>')
input.gsub!(/\^{1}(.*?)\^{1}/,'<sup>\1</sup>')
input.gsub!(/\~{1}(.*?)\~{1}/,'<sub>\1</sub>')
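
The order of these substitutions matters: the two-character markers have to be replaced before their one-character cousins, otherwise _ would eat half of every __. Here's a small check, assuming a hypothetical helper called inline_tags (my name, not part of Litexte) that wraps four of the patterns above:

```ruby
# Hypothetical helper wrapping four of the delimited-tag substitutions.
def inline_tags(text)
  out = text.dup
  out.gsub!(/_{2}(.*?)_{2}/, '<i>\1</i>')              # __italic__ before _emphasis_
  out.gsub!(/\*{2}(.*?)\*{2}/, '<b>\1</b>')            # **bold** before *strong*
  out.gsub!(/_{1}(.*?)_{1}/, '<em>\1</em>')
  out.gsub!(/\*{1}(.+?)\*{1}/, '<strong>\1</strong>')
  out
end

puts inline_tags("__italic__ and _emphasis_, **bold** and *strong*")
# prints: <i>italic</i> and <em>emphasis</em>, <b>bold</b> and <strong>strong</strong>
```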

3. Links and Images

Links and images don't follow the symmetric delimiter pattern, but they can still be parsed easily with simple regexes. (Superscript and subscript are symmetric, so they were already handled with the delimited tags above.)


input.gsub!(/"([^"]+)":(\S+)/,'<a href="\2">\1</a>')   # link text may contain spaces
input.gsub!(/\!{1}(.*?)\!{1}/,'<img src="\1"/>')

4. Span and P tags

Span and P tags are a little tricky because there are several variants with classes, ids, style etc like:


%Ruby is awesome%
Ruby is awesome

%{color:blue}Regex is awesome%
Regex is awesome

Ruby's gsub can take a block instead of a replacement string, which is a very powerful feature for this kind of conditional substitution: the block runs once per match and its return value becomes the replacement. It suits both a simple span tag and a p tag with numerous variants. Check out the regex substitutions with blocks for span and p below:


input.gsub!(/\%{1}(\{(.*?)\})?(.*?)\%{1}/) do
  style = $1 ? " style=\"#{$2}\"" : ''
  "<span#{style}>#{$3}</span>"
end


input.gsub!(/^p([<>=]+)?(\((.*?)\))?(\{(.*?)\})?(\[(.*?)\])?\.(.*)$/) do
  aligns = {'<' => 'left', '>' => 'right', '=' => 'center', '<>' => 'justify'}
  align = $1 ? "text-align: #{aligns[$1]};" : ""
  styles = $5 || ""
  style = (align + styles).empty? ? "" : " style=\"#{align}#{styles}\""
  lang = $7 ? " lang=\"#{$7}\"" : ""
  text = $8.strip
  # Read $1..$8 before this point: the match call below resets them
  mdata = $3 ? $3.match(/(\w+)?#?(\w+)?/) : []
  _class = mdata[1] ? " class=\"#{mdata[1]}\"" : ""
  _id = mdata[2] ? " id=\"#{mdata[2]}\"" : ""
  "<p#{_id}#{_class}#{style}#{lang}>#{text}</p>"
end
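
To see the block form of gsub on its own, here's the span substitution wrapped in a hypothetical spans helper (the name is mine) and run against the two span variants shown earlier. Treat it as a sketch for experimenting, not as the final parser code:

```ruby
# Hypothetical wrapper around the span substitution.
def spans(text)
  text.gsub(/%(\{(.*?)\})?(.*?)%/) do
    # $1 is only set when a {style} prefix was present in this match
    style = $1 ? " style=\"#{$2}\"" : ''
    "<span#{style}>#{$3}</span>"
  end
end

puts spans("%Ruby is awesome%")
# prints: <span>Ruby is awesome</span>
puts spans("%{color:blue}Regex is awesome%")
# prints: <span style="color:blue">Regex is awesome</span>
```

The block is evaluated once per match, so every span in the input gets its own style decision.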

5. Tables

|_. Name |_. Age |
|John    |20     |
|Bill    |25     |

For tables you need a way to block-substitute each table in the markup, in case there are multiple tables. Otherwise it's very straightforward:


def parse_table(table)
  header = /^\_\./
  out = "<table>"
  table.each_line do |row|
    out += "<tr>"
    row.split('|').reject {|t| t.chomp.empty?}.each do |cell|
      if cell =~ header
        out += "<th>#{cell.sub(header,'')}</th>"
      else
        out += "<td>#{cell}</td>"
      end
    end
    out += "</tr>"
  end
  out += "</table>"
end
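
The block substitution itself isn't shown above, so here's one way it could look. This is a sketch under my own assumptions: parse_tables is a hypothetical helper, a table is taken to be any run of consecutive lines starting with |, and parse_table is restated in condensed form (with cell whitespace stripped) so the snippet runs by itself.

```ruby
# Condensed restatement of parse_table so this sketch is self-contained.
def parse_table(table)
  rows = table.each_line.map do |row|
    cells = row.split('|').reject { |c| c.strip.empty? }.map do |cell|
      cell =~ /^_\./ ? "<th>#{cell.sub(/^_\./, '').strip}</th>" : "<td>#{cell.strip}</td>"
    end
    "<tr>#{cells.join}</tr>"
  end
  "<table>#{rows.join}</table>"
end

# Hypothetical helper: substitute every run of consecutive "|" lines.
def parse_tables(input)
  input.gsub(/(?:^\|.*\n)+/) { |table| parse_table(table) + "\n" }
end

input = "Ages\n|_. Name |_. Age |\n|John    |20     |\nDone\n|Bill |25 |\n"
puts parse_tables(input)
```

Because gsub hands each matched run to the block separately, the two tables in the sample input are converted independently and the text between them is left alone.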

6. Ordered and Unordered Lists

# Languages
## Ruby
## Python
# Frameworks
## Rails

Lists, in addition to being multiple, can also be nested, which requires a recursive solution that either spits out the list content or parses a sublist. The solution does not follow Ruby idioms, because each and sublisting don't work very well together.


def parse_list(list)
  items = list.scan(/^#+.*?\n/).map(&:chomp).collect {|item| item =~ /(#+)(.*)/; [$1,$2]}
  parse_list_items(items)
end

def parse_list_items(items, start = 0)
  list_out = "<ol>"
  i = 0
  while(i < items.length)
    level, item = items[i]
    if level.length-start == 1
      list_out += "<li>#{item}</li>"
      i += 1
    else
      # Size of the run of deeper items, up to the next item at this level
      count = items[i..-1].find_index {|e| e[0].length <= start+1} || (items.length - i)
      list_out += parse_list_items(items[i,count], start+1)
      i += count
    end
  end
  list_out += "</ol>"
end
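
The code above only emits ordered lists. Here's a rough sketch of how the same recursion could cover unordered lists too, assuming * marks an unordered item and the last character of the first marker picks the tag. The render_items name and the depth parameter are mine, and this parse_list is a variation on the one above, not a drop-in copy.

```ruby
# Sketch: nested lists for "#" (ordered) and "*" (unordered) markers.
def parse_list(list)
  items = list.scan(/^([#*]+)[ \t]*(.*)$/)   # => [[marker, text], ...]
  render_items(items, 1)
end

def render_items(items, depth)
  return '' if items.empty?
  tag = items.first[0][-1] == '#' ? 'ol' : 'ul'
  out = "<#{tag}>"
  i = 0
  while i < items.length
    marker, text = items[i]
    if marker.length == depth
      out += "<li>#{text}</li>"
      i += 1
    else
      # Gather the run of deeper items and render it as a sublist
      count = items[i..-1].find_index { |m, _| m.length <= depth } || (items.length - i)
      out += render_items(items[i, count], depth + 1)
      i += count
    end
  end
  out + "</#{tag}>"
end

puts parse_list("# Languages\n## Ruby\n## Python\n# Frameworks\n## Rails\n")
puts parse_list("* Fruit\n** Apple\n")
```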

Sunday, November 15, 2009

Building XMLs with the magic of method_missing in Ruby

"Any sufficiently advanced technology is indistinguishable from magic."
- Arthur C Clarke

"Any sufficiently advanced technology which you don't understand is magic."
- Reddit comments

method_missing is an elegant metaprogramming trick. The best-known use of it is the dynamic finders in ActiveRecord. Another simple but awesome library which uses the same trick is the XML builder library. This blogpost illustrates the use of method_missing by building a rudimentary XML builder library. To keep it simple and authentic, I haven't looked at the XML builder source code.

The following snippet illustrates usage of this builder. To keep it simple I'm skipping XML attributes, comments and DTDs:


require 'buildr'
xml = Buildr.new
puts xml.phonebook {
  xml.contact {
    xml.full_name 'John Doe'
    xml.email 'john.doe@gmail.com'
    xml.phone '121-101'
  }
  xml.contact {
    xml.full_name 'William Smith'
    xml.email 'william.smith@gmail.com'
    xml.phone '121-102'
  }
}

And this is the XML output expected from the builder:


<phonebook>
  <contact>
    <full_name>John Doe</full_name>
    <email>john.doe@gmail.com</email>
    <phone>121-101</phone>
  </contact>
  <contact>
    <full_name>William Smith</full_name>
    <email>william.smith@gmail.com</email>
    <phone>121-102</phone>
  </contact>
</phonebook>

Looking at the usage, it's obvious that Buildr uses method_missing to interpret missing methods as valid tags. Another pattern is the usage of blocks for nesting tags. Let's get started with a Buildr class which implements method_missing to dynamically render XML tags:


class Buildr
  def method_missing(tag,*args,&block)
    content = args.empty? ? yield : args.first
    render(tag,content)
  end

  def render(tag, content)
    buffer = ""
    buffer += "<#{tag}>"
    buffer += content
    buffer += "</#{tag}>"
  end
end

The render method creates the opening and closing tags and puts either text or the result of evaluating nested tags between them. That's the essence of what we need in the succinct implementation above. Let's run it:


<phonebook><contact><phone>121-102</phone></contact></phonebook>

Boink! That's predictable for a first cut. But here's what went wrong. yield returns the value of the last statement in the block, but what we need is an aggregate of all the xml outputs in a block. That's why the output contains only the last phone number of the last Contact.
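
The behaviour is easy to reproduce in isolation. In this little sketch, tag and value_of are hypothetical helpers of mine, not part of Buildr. The block builds three strings, but only the last one comes back from yield:

```ruby
# Hypothetical helper that renders one tag as a string.
def tag(name, text)
  "<#{name}>#{text}</#{name}>"
end

# Returns whatever the block evaluates to, i.e. its last expression.
def value_of
  yield
end

result = value_of do
  tag(:full_name, 'John Doe')          # computed, then discarded
  tag(:email, 'john.doe@gmail.com')    # computed, then discarded
  tag(:phone, '121-101')               # this is the block's value
end

puts result
# prints: <phone>121-101</phone>
```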

The fix was elusive, but what we need here is some way to aggregate the outputs of each method_missing call and return that as the output of the block. I fixed it by adding a buffer (an instance variable) to aggregate the XML outputs in a block and resetting the buffer for each block.


class Buildr
  def initialize
    @buffer = ""
  end

  def method_missing(tag,*args,&block)
    render(tag) do
      unless args.empty?
        args.first
      else
        @buffer = ""
        output = yield
        output
      end
    end
  end

  def render(tag, &content)
    @buffer += "<#{tag}>"
    @buffer += yield
    @buffer += "</#{tag}>"
  end
end

The render method now takes a block, which returns text or evaluates nested blocks. The block also takes care of resetting the buffer. Now let's run it:


<phonebook>
<contact><full_name>John Doe</full_name><email>john.doe@gmail.com</email><phone>121-101</phone></contact>
<contact><full_name>William Smith</full_name><email>william.smith@gmail.com</email><phone>121-102</phone></contact>
</phonebook>

That's awesome. It works, but it's not formatted, and it relies on instance variable state which could have been avoided.
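
One way to get formatted output is to track the nesting depth and indent each tag accordingly. Here's a rough sketch under my own assumptions: PrettyBuildr, @lines and @depth are hypothetical names, not part of the Buildr above, and it still keeps its state in instance variables.

```ruby
# Hypothetical variant of Buildr that indents nested tags by depth.
class PrettyBuildr
  def initialize
    @lines = []   # finished output lines, in document order
    @depth = 0    # current nesting level
  end

  def method_missing(tag, *args, &block)
    indent = '  ' * @depth
    if block
      @lines << "#{indent}<#{tag}>"
      @depth += 1
      block.call                      # nested calls append their own lines
      @depth -= 1
      @lines << "#{indent}</#{tag}>"
    else
      @lines << "#{indent}<#{tag}>#{args.first}</#{tag}>"
    end
    @lines.join("\n")                 # the outermost call returns the whole document
  end
end

xml = PrettyBuildr.new
puts xml.phonebook {
  xml.contact {
    xml.full_name 'John Doe'
    xml.phone '121-101'
  }
}
```

Since every call appends to @lines, the return value of the block no longer matters, which sidesteps the last-statement problem altogether.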

PS: This experimental Buildr is hosted on GitHub.

Aspect oriented blocks

Programmers, being lazy, want to avoid redundant code. An example is the boilerplate written over and over again whenever you access a file or a database connection: opening the file, reading from or writing to the file stream, and then doing some housekeeping to make sure the file is properly closed.

Look at the following example of writing a file in Java:


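BufferedWriter out = null;
try {
  out = new BufferedWriter(new FileWriter("out.file"));
  out.write("stuff");
} catch (IOException e) {
  logger.error("Error writing file: " + e);
} finally {
  // close() can itself throw, and out is still null if opening failed
  if (out != null) {
    try {
      out.close();
    } catch (IOException e) {
      logger.error("Error closing file: " + e);
    }
  }
}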

This is why I never feel comfortable writing a one-off file program in Java. Sure, you can abstract this in a function called writeToFile(filename, contents) and re-use it. But every Java programmer in the world has to write this at least once. If I were to write a standard library API, I would never want my users to suffer.

This is a hard problem to abstract, especially because you want to do something before writing to the file and something else after writing. This is called around advice (before + after) in aspect-oriented programming. If you have looked at the usability of AOP in Java, you'd rather repeat code. This is where blocks come to the rescue in Ruby. Look at the same example in Ruby:


File.open('out.file','w') do |f|
  f.write 'stuff'
end

The problem of opening, closing and reading/writing to stream has been abstracted once and for all, and as a programmer you just have to care about reading/writing. This is an elegant solution to the same problem.
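
You can write this kind of method yourself with yield and ensure. Here's a minimal sketch, assuming a made-up Connection class and a with_connection helper (both hypothetical, standing in for a file or database handle). It only shows the before/after shape and is not how File.open is actually implemented.

```ruby
# Hypothetical resource standing in for a file or database connection.
class Connection
  attr_reader :log

  def initialize
    @log = []
  end

  def open
    @log << :open
  end

  def close
    @log << :close
  end

  def write(data)
    @log << [:write, data]
  end
end

# Around advice: set up before the block, clean up after it, even on errors.
def with_connection(conn)
  conn.open
  yield conn
ensure
  conn.close
end

conn = Connection.new
with_connection(conn) { |c| c.write('stuff') }
p conn.log
# prints: [:open, [:write, "stuff"], :close]
```

The ensure clause is what makes this safe: close runs whether the block finishes normally or raises.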

Now, how can you apply the same technique in your day-to-day Ruby programming? Let's say you're writing a cool desktop app in Ruby and it works on all platforms: Windows, Linux and MacOSX. Assume you're storing user preferences in different directories on different platforms and you want to unit test this behaviour. Let's say every time you test a platform, you change the PLATFORM constant, test the behaviour and then reset it to its original value.


describe('user preferences') do
  
  before do
    @app.start
  end

  it 'should be stored in MyDocuments in Windows' do
    original_platform = PLATFORM
    PLATFORM = 'Windows'
    
    @app.pref_file.location.should == "C:\\MyDocuments\\myapp.preferences"

    PLATFORM = original_platform 
  end

  it 'should be stored in ~/.myapp in Linux' do
    original_platform = PLATFORM
    PLATFORM = 'Linux'

    @app.pref_file.location.should == '~/.myapp'

    PLATFORM = original_platform 
  end

  it 'should be stored in Users/john/Preferences/myapp.plist in MacOSX' do
    original_platform = PLATFORM
    PLATFORM = 'MacOSX'

    @app.pref_file.location.should == '/Users/john/Preferences/myapp.plist'

    PLATFORM = original_platform 
  end

end

That's a lot of boilerplate code to switch PLATFORM, not to mention the numerous warnings you get in reassigning CONSTANTs. This can be elegantly solved using blocks.

The blocks provide little sandboxes in which your test code can execute with PLATFORM set to a specific value. Once you come out of the block, PLATFORM is reset back to its original value.


describe('user preferences') do
  before do
    @app.start
  end

  it 'should be stored in MyDocuments in Windows' do
    os('Windows') do
      @app.pref_file.location.should == "C:\\MyDocuments\\myapp.preferences"
    end
  end

  it 'should be stored in ~/.myapp in Linux' do
    os('Linux') do
      @app.pref_file.location.should == '~/.myapp'
    end
  end

  it 'should be stored in Users/john/Preferences/myapp.plist in MacOSX' do
    os('MacOSX') do
      @app.pref_file.location.should == '/Users/john/Preferences/myapp.plist'
    end
  end

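  def os(platform, &block)
    original_platform = PLATFORM
    # `PLATFORM = platform` is a syntax error inside a method body, so use const_set
    Object.const_set('PLATFORM', platform)
    yield
    Object.const_set('PLATFORM', original_platform)
  end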

end

Having the boilerplate code in one place, you can refactor it to remove and reassign Constants to eliminate warnings:


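def os(platform, &block)
  original_platform = Object.send(:remove_const, 'PLATFORM')
  Object.const_set('PLATFORM', platform)
  begin
    yield
  ensure
    # Restore the original value even if an expectation in the block fails
    Object.send(:remove_const, 'PLATFORM')
    Object.const_set('PLATFORM', original_platform)
  end
end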

Tim Toady

Tim Toady / TIMTOWTDI is a programming motto from Perl users. It stands for There Is More Than One Way To Do It. Ruby is inspired by Perl and also follows the same principle.

For example, consider the problem of looping. It may be hard to believe, but there are at least nine different ways to loop in Ruby. Here are three of them:


100.times do
  puts "I will not throw paper airplanes in class"
end

1.upto(100) do |i|
  puts "#{i}. I will not throw paper airplanes in class"
end

for i in 1..100
  puts "#{i}. I will not throw paper airplanes in class"
end

Look at the different solutions for the famous 99 bottles of beer.


i = 99
while i >= 0 do
  bottles = "#{i.zero? ? 'No more' : i} bottles"
  bottles.chop! if i == 1
  puts "Take one down and pass it around, #{bottles} of beer on the wall.\n" unless i == 99
  puts "#{bottles} of beer on the wall, #{bottles} of beer."
  i -= 1
end
puts "Go to the store and buy some more, 99 bottles of beer on the wall."

i = 99
until i < 0 do
  bottles = "#{i.zero? ? 'No more' : i} bottles"
  bottles.chop! if i == 1
  puts "Take one down and pass it around, #{bottles} of beer on the wall.\n" unless i == 99
  puts "#{bottles} of beer on the wall, #{bottles} of beer."
  i -= 1
end
puts "Go to the store and buy some more, 99 bottles of beer on the wall."

i = 99
loop do
  bottles = "#{i.zero? ? 'No more' : i} bottles"
  bottles.chop! if i == 1
  puts "Take one down and pass it around, #{bottles} of beer on the wall.\n" unless i == 99
  puts "#{bottles} of beer on the wall, #{bottles} of beer."
  i -= 1
  break if i < 0
end
puts "Go to the store and buy some more, 99 bottles of beer on the wall."
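
And here's yet another take on the same song, this time with Integer#downto. It's a sketch with a hypothetical bottles_song helper (my name) that returns the lines instead of printing them and takes the starting count as a parameter so it's easy to try with a small number:

```ruby
# Hypothetical helper: builds the song as an array of lines using downto.
def bottles_song(start = 99)
  lines = []
  start.downto(0) do |i|
    bottles = "#{i.zero? ? 'No more' : i} bottle#{'s' unless i == 1}"
    lines << "Take one down and pass it around, #{bottles} of beer on the wall." unless i == start
    lines << "#{bottles} of beer on the wall, #{bottles} of beer."
  end
  lines << "Go to the store and buy some more, #{start} bottles of beer on the wall."
end

puts bottles_song(3)
```

With downto the counter, the stop condition and the decrement all live in one place, so there is no i -= 1 to forget.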



There are several choices even for looping through the elements of an array. Consider dealing a deck of cards in a game.



suits = ['♥','♠','♦','♣']
cards = (2..10).to_a + ['J','Q','K','A']
deck = suits.collect {|suit| cards.collect {|card| "#{card}#{suit}"}}.flatten.shuffle

deck.each do |card|
  deal card
end

for card in deck
  deal card
end

deck.each_with_index do |card,i|
  deal "Player#{i%4+1}", card
end
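
The deal method is left to the imagination above. Here's one possible sketch, assuming four players and a plain hash of hands instead of a deal method (the hands name and the round-robin rule are my assumptions). It also builds the deck with Array#product, which is one more way to do it:

```ruby
suits = ['♥', '♠', '♦', '♣']
cards = (2..10).to_a + ['J', 'Q', 'K', 'A']
# Array#product pairs every suit with every card
deck = suits.product(cards).map { |suit, card| "#{card}#{suit}" }.shuffle

# Hypothetical deal: hand each card to one of four players in turn
hands = Hash.new { |hash, player| hash[player] = [] }
deck.each_with_index do |card, i|
  hands["Player#{i % 4 + 1}"] << card
end

hands.each { |player, hand| puts "#{player}: #{hand.join(' ')}" }
```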

The number of choices gives the programmer the freedom to choose a looping construct based on the expressiveness for a particular problem.
