Monday, November 16, 2009

Litexte



Litexte is a Textile parser which I wrote to see how far I could go with regex patterns and to exercise my lazy right brain. In this blog post I'll illustrate how to create a Textile parser in Ruby.

Textile is a lightweight markup language like Markdown. RedCloth is a well-known library for parsing Textile markup in Ruby. But what's the fun in using an existing library? Let's weave our very own Textile parser with lots of regex awesomeness, just for fun.

Check out the Textile quick reference written by _why. Click here for a sample Textile input, which will be used to build the parser.

Textile markup can be categorized and parsed in the following order:

1. Headers and Blockquotes

Headers and blockquotes are written as a marker (h1, h2, h3 or bq) followed by a period and the content, ending at the newline.

input.gsub!(/h1\.(.*?)\n/,'<h1>\1</h1>')
input.gsub!(/h2\.(.*?)\n/,'<h2>\1</h2>')
input.gsub!(/h3\.(.*?)\n/,'<h3>\1</h3>')
input.gsub!(/bq\.(.*?)\n/,'<blockquote>\1</blockquote>')
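
For example, a sample line of my own (not from the reference input) is rewritten in place; note that the space after the period is captured as part of the content:

input = "h1. A heading\nbq. A quote\n"
# after the substitutions above, input is:
# "<h1> A heading</h1><blockquote> A quote</blockquote>"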

2. Delimited tags

Delimited markup has a begin and an end marker with the content in between. Most Textile markup falls into this category.

input.gsub!(/\_{2}(.*?)\_{2}/,'<i>\1</i>')
input.gsub!(/\*{2}(.*?)\*{2}/,'<b>\1</b>')
input.gsub!(/\?{2}(.*?)\?{2}/,'<cite>\1</cite>')
input.gsub!(/\_{1}(.*?)\_{1}/,'<em>\1</em>')
input.gsub!(/\*{1}(.+?)\*{1}/,'<strong>\1</strong>')
input.gsub!(/\-{1}(.*?)\-{1}/,'<del>\1</del>')
input.gsub!(/\+{1}(.*?)\+{1}/,'<ins>\1</ins>')
input.gsub!(/\^{1}(.*?)\^{1}/,'<sup>\1</sup>')
input.gsub!(/\~{1}(.*?)\~{1}/,'<sub>\1</sub>')
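
As a quick check of these substitutions, here is a small example input of my own and the result after the gsub! calls above:

input = "__cite__ is *strong* and _emphasized_\n"
# after the substitutions above, input is:
# "<i>cite</i> is <strong>strong</strong> and <em>emphasized</em>\n"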

3. Links and Images

Links and images don't follow symmetric begin/end patterns, but they can still be parsed with simple regexes:

input.gsub!(/"([^"]+)":(\S+)/,'<a href="\2">\1</a>')
input.gsub!(/\!{1}(.*?)\!{1}/,'<img src="\1"/>')
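
For example (a sample of my own making):

input = %q{"Google":http://google.com and !logo.png!}
# after the substitutions above, input is:
# <a href="http://google.com">Google</a> and <img src="logo.png"/>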

4. Span and P tags

Span and p tags are a little tricky because there are several variants with classes, ids, styles etc., like:

%Ruby is awesome% (a plain span)
%{color:blue}Regex is awesome% (a span with an inline style)


Ruby's gsub can take a block, which is a very powerful feature for this kind of conditional substitution. It handles both the simple span tag and the p tag with its numerous variants. Check out the regex substitutions with blocks for span and p below:


input.gsub!(/\%{1}(\{(.*)\})?(.*?)\%{1}/) do
  style = $1 ? " style=\"#{$2}\" " : ''
  "<span#{style}>#{$3}</span>"
end



input.gsub!(/p([\<\>\=]+)?(\((.*)\))?(\{(.*)\})?(\[(.*)\])?\.(.*?)\n/) do
  aligns = {'<' => 'left', '>' => 'right', '=' => 'center', '<>' => 'justify'}
  align = $1 ? " text-align: #{aligns[$1]};" : ""
  styles = $5 || ""
  style = (align + styles).empty? ? "" : " style=\"#{align}#{styles}\""
  lang = $7 ? " lang=\"#{$7}\"" : ""
  text = $8
  mdata = $3 ? $3.match(/(\w+)?#?(\w+)?/) : []
  _class = mdata[1] ? " class=\"#{mdata[1]}\"" : ""
  _id = mdata[2] ? " id=\"#{mdata[2]}\"" : ""
  "<p#{_id}#{_class}#{style}#{lang}>#{text}</p>"
end


5. Tables

|_. Name |_. Age |
|John |20 |
|Bill |25 |

For tables you need a way to substitute each table block in the markup, in case there are multiple tables (see the sketch after parse_table below). Otherwise it's very straightforward:


def parse_table(table)
  header = /^\_\./
  out = "<table>"
  table.each_line do |row|
    out += "<tr>"
    row.split('|').reject {|t| t.chomp.empty?}.each do |cell|
      if cell =~ header
        out += "<th>#{cell.sub(header,'')}</th>"
      else
        out += "<td>#{cell}</td>"
      end
    end
    out += "</tr>"
  end
  out += "</table>"
end
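
A minimal sketch of that block substitution, assuming each table is a run of consecutive lines starting with | (the detection regex is my assumption, not from the original post):

input.gsub!(/(?:^\|.*\n)+/) {|table| parse_table(table)}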


6. Ordered and Unordered Lists

# Languages
## Ruby
## Python
# Frameworks
## Rails


Lists, besides appearing multiple times, can also be nested, which calls for a recursive solution that either emits a list item or descends into a sublist. The solution doesn't follow Ruby idioms, because each doesn't deal well with consuming sublists as you go.


def parse_list(list)
  items = list.scan(/^#+.*?\n/).map(&:chomp).collect {|item| item =~ /(#+)(.*)/; [$1, $2]}
  parse_list_items(items)
end

def parse_list_items(items, start = 0)
  list_out = "<ol>"
  i = 0
  while i < items.length
    level, item = items[i]
    if level.length - start == 1
      list_out += "<li>#{item}</li>"
      i += 1
    else
      # count the consecutive run of deeper items and recurse into it
      count = items[i..-1].find_index {|e| e[0].length == start + 1} || (items.length - i)
      list_out += parse_list_items(items[i, count], start + 1)
      i += count
    end
  end
  list_out += "</ol>"
end
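
Hooking the list parser into the main pass could look roughly like this, assuming a list is a run of consecutive lines starting with # (the detection regex is my assumption):

input.gsub!(/(?:^#+.*\n)+/) {|list| parse_list(list)}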

Sunday, November 15, 2009

Building XMLs with the magic of method_missing in Ruby


"Any sufficiently advanced technology is indistinguishable from magic."
- Arthur C Clarke

"Any sufficiently advanced technology which you don't understand is magic."
- Reddit comments

method_missing is an elegant metaprogramming trick. The best-known use of it is the dynamic finders in ActiveRecord. Another simple but awesome library that uses the same trick is the XML Builder library. This blog post illustrates the use of method_missing by building a rudimentary XML builder library. I haven't looked at Builder's source code, to keep this simple and authentic.

The following snippet illustrates usage of this builder. To keep it simple, I'm skipping XML attributes, comments and DTDs:

require 'buildr'
xml = Buildr.new
puts xml.phonebook {
  xml.contact {
    xml.full_name 'John Doe'
    xml.email 'john.doe@gmail.com'
    xml.phone '121-101'
  }
  xml.contact {
    xml.full_name 'William Smith'
    xml.email 'william.smith@gmail.com'
    xml.phone '121-102'
  }
}


And this is the XML output expected from the builder:

<phonebook>
  <contact>
    <full_name>John Doe</full_name>
    <email>john.doe@gmail.com</email>
    <phone>121-101</phone>
  </contact>
  <contact>
    <full_name>William Smith</full_name>
    <email>william.smith@gmail.com</email>
    <phone>121-102</phone>
  </contact>
</phonebook>


Looking at the usage, it's obvious that Buildr uses method_missing to interpret missing methods as tags. Another pattern is the use of blocks for nesting tags. Let's get started with a Builder class that implements method_missing to dynamically render XML tags:

class Buildr
  def method_missing(tag, *args, &block)
    content = args.empty? ? yield : args.first
    render(tag, content)
  end

  def render(tag, content)
    buffer = ""
    buffer += "<#{tag}>"
    buffer += content
    buffer += "</#{tag}>"
  end
end


The render method creates the opening and closing tags and puts either the text or the result of evaluating the nested tags between them. That's the essence of what we need, in the succinct implementation above. Let's run it:

<phonebook><contact><phone>121-102</phone></contact></phonebook>


Boink! That's predictable for a first cut, but here's what went wrong: yield returns the value of the last statement in the block, whereas what we need is an aggregate of all the XML output produced inside the block. That's why the output contains only the last phone number of the last contact.

The fix was elusive, but what we need is some way to aggregate the output of each method_missing call and return that as the output of the block. I fixed it by adding a buffer (an instance variable) to aggregate the XML output in a block and resetting the buffer for each block.

class Buildr
  def initialize
    @buffer = ""
  end

  def method_missing(tag, *args, &block)
    render(tag) do
      unless args.empty?
        args.first
      else
        @buffer = ""
        output = yield
        output
      end
    end
  end

  def render(tag, &content)
    @buffer += "<#{tag}>"
    @buffer += yield
    @buffer += "</#{tag}>"
  end
end


The render method now takes a block, which returns the text or evaluates the nested blocks. The block also takes care of resetting the buffer. Now let's run it:

<phonebook>
<contact><full_name>John Doe</full_name><email>john.doe@gmail.com</email><phone>121-101</phone></contact>
<contact><full_name>William Smith</full_name><email>william.smith@gmail.com</email><phone>121-102</phone></contact>
</phonebook>


That's awesome. It works, but it's not formatted, and it uses instance-variable state which could have been avoided.
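
Just to illustrate what avoiding that state might look like, here is a rough sketch of my own (not the original Buildr, and it changes the block contract: nested calls must be collected explicitly):

class StatelessBuildr
  def method_missing(tag, *args)
    # either take the literal content or whatever the block collected
    content = args.empty? ? yield : args.first
    content = content.join if content.is_a?(Array)
    "<#{tag}>#{content}</#{tag}>"
  end
end

xml = StatelessBuildr.new
puts xml.phonebook {
  [xml.contact { [xml.full_name('John Doe'), xml.phone('121-101')].join }]
}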

PS: This experimental Buildr is hosted on GitHub

Aspect oriented blocks

Programmers, being lazy, want to avoid redundant code. An example is the boilerplate written over and over again whenever you access a file or a database connection: open the file, read from or write to the stream, and then do some housekeeping to make sure the file is properly closed.

Look at the following example of writing a file in Java:


BufferedWriter out = null;
try {
    out = new BufferedWriter(new FileWriter("out.file"));
    out.write("stuff");
} catch (IOException e) {
    logger.error("Error opening file: " + e);
} finally {
    if (out != null) {
        try {
            out.close();
        } catch (IOException e) {
            logger.error("Error closing file: " + e);
        }
    }
}


This is why I never feel comfortable writing a one-off file program in Java. Sure, you can abstract this into a function called writeToFile(filename, contents) and reuse it. But every Java programmer in the world has to write this at least once. If I were writing a standard library API, I would never want my users to suffer like this.

This is a hard problem to abstract, especially because you want to do something before writing to the file and something after. This is called around advice (before + after) in aspect-oriented programming. If you have looked at the usability of AOP in Java, you'd rather just repeat the code. This is where blocks come to the rescue in Ruby. Look at the same example in Ruby:

File.open('out.file','w') do |f|
  f.write 'stuff'
end

The problem of opening, closing and reading from or writing to the stream has been abstracted once and for all, and as a programmer you just have to care about the reading/writing. This is an elegant solution to the same problem.
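
To see how that abstraction works under the hood, here is a minimal sketch of an open-with-block helper of my own (write_to_file is a hypothetical name, not a standard method); the ensure clause is the around advice:

def write_to_file(name)
  f = File.open(name, 'w')
  yield f
ensure
  # runs whether or not the block raised
  f.close if f
end

write_to_file('out.file') {|f| f.write 'stuff'}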

Now, how can you apply the same technique in your day-to-day Ruby programming? Let's say you're writing a cool desktop app in Ruby which works on all platforms: Windows, Linux and MacOSX. Assume you're storing user preferences in different directories on different platforms and you want to unit test this behaviour. Every time you test a platform, you change the PLATFORM constant, test the behavior and then reset it to its original value.

describe('user preferences') do

  before do
    @app.start
  end

  it 'should be stored in MyDocuments in Windows' do
    original_platform = PLATFORM
    PLATFORM = 'Windows'

    @app.pref_file.location.should == "C:\\MyDocuments\\myapp.preferences"

    PLATFORM = original_platform
  end

  it 'should be stored in ~/.myapp in Linux' do
    original_platform = PLATFORM
    PLATFORM = 'Linux'

    @app.pref_file.location.should == '~/.myapp'

    PLATFORM = original_platform
  end

  it 'should be stored in Users/john/Preferences/myapp.plist in MacOSX' do
    original_platform = PLATFORM
    PLATFORM = 'MacOSX'

    @app.pref_file.location.should == '/Users/john/Preferences/myapp.plist'

    PLATFORM = original_platform
  end

end

That's a lot of boilerplate code just to switch PLATFORM, not to mention the numerous warnings you get when reassigning constants. This can be solved elegantly using blocks.

The blocks provide little sandboxes in which your test code can execute with PLATFORM set to a specific value. Once you come out of the block, PLATFORM is reset back to its original value.


describe('user preferences') do
  before do
    @app.start
  end

  it 'should be stored in MyDocuments in Windows' do
    os('Windows') do
      @app.pref_file.location.should == "C:\\MyDocuments\\myapp.preferences"
    end
  end

  it 'should be stored in ~/.myapp in Linux' do
    os('Linux') do
      @app.pref_file.location.should == '~/.myapp'
    end
  end

  it 'should be stored in Users/john/Preferences/myapp.plist in MacOSX' do
    os('MacOSX') do
      @app.pref_file.location.should == '/Users/john/Preferences/myapp.plist'
    end
  end

  def os(platform, &block)
    original_platform = PLATFORM
    Object.const_set('PLATFORM', platform)
    yield
    Object.const_set('PLATFORM', original_platform)
  end
end

With the boilerplate in one place, you can refactor it to remove and re-create the constant, eliminating the warnings:

def os(platform, &block)
  original_platform = Object.send(:remove_const, 'PLATFORM')
  Object.const_set('PLATFORM', platform)
  yield
  Object.send(:remove_const, 'PLATFORM')
  Object.const_set('PLATFORM', original_platform)
end
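
One refinement I'd consider (my own suggestion, not from the original post) is wrapping the restore in ensure, so PLATFORM is put back even when an expectation inside the block fails:

def os(platform)
  original_platform = Object.send(:remove_const, 'PLATFORM')
  Object.const_set('PLATFORM', platform)
  yield
ensure
  # restore the original constant no matter how the block exited
  Object.send(:remove_const, 'PLATFORM') if Object.const_defined?('PLATFORM')
  Object.const_set('PLATFORM', original_platform)
end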

Tim Toady

Tim Toady (TIMTOWTDI) is a programming motto from the Perl community. It stands for There Is More Than One Way To Do It. Ruby is inspired by Perl and follows the same principle.

For example, consider the problem of looping. It may be hard to believe, but there are nine different ways to loop in Ruby. Here are a few of them:



100.times do
  puts "I will not throw paper airplanes in class"
end

1.upto(100) do |i|
  puts "#{i}. I will not throw paper airplanes in class"
end

for i in 1..100
  puts "#{i}. I will not throw paper airplanes in class"
end


Look at the different solutions for the famous 99 bottles of beer.



i = 99
while i >= 0 do
  bottles = "#{i.zero? ? 'No more' : i} bottles"
  bottles.chop! if i == 1
  puts "Take one down and pass it around, #{bottles} of beer on the wall.\n" unless i == 99
  puts "#{bottles} of beer on the wall, #{bottles} of beer."
  i -= 1
end
puts "Go to the store and buy some more, 99 bottles of beer on the wall."

i = 99
until i < 0 do
  bottles = "#{i.zero? ? 'No more' : i} bottles"
  bottles.chop! if i == 1
  puts "Take one down and pass it around, #{bottles} of beer on the wall.\n" unless i == 99
  puts "#{bottles} of beer on the wall, #{bottles} of beer."
  i -= 1
end
puts "Go to the store and buy some more, 99 bottles of beer on the wall."

i = 99
loop do
  bottles = "#{i.zero? ? 'No more' : i} bottles"
  bottles.chop! if i == 1
  puts "Take one down and pass it around, #{bottles} of beer on the wall.\n" unless i == 99
  puts "#{bottles} of beer on the wall, #{bottles} of beer."
  i -= 1
  break if i < 0
end
puts "Go to the store and buy some more, 99 bottles of beer on the wall."




There are even several choices for looping through the elements of an array. Consider dealing a deck of cards in a game.



suits = ['♥','♠','♦','♣']
cards = (2..10).to_a + ['J','Q','K','A']
deck = suits.collect {|suit| cards.collect {|card| "#{card}#{suit}"}}.flatten.shuffle

deck.each do |card|
  deal card
end

for card in deck
  deal card
end

deck.each_with_index do |card, i|
  deal "Player#{i % 4 + 1}", card
end


The number of choices gives the programmer the freedom to pick whichever looping construct expresses a particular problem best.

Tuesday, July 28, 2009

Currying Curry functions in Javascript

Currying is a technique for transforming a function that takes multiple arguments so that it can be applied to only some of them, giving back a partial application.

Here's an example from Wikipedia:
f(x,y) = y / x.
g(y) = f(2,y) = y / 2
g(3) = 3 / 2.


At each partial application we end up with an executable function.

While reading about currying in Functional Programming For The Rest of Us, I came across this example:


pow(x,y) = x**y
square(x) = pow(x,2) = x**2


The idea of calling a function with partial arguments and getting back another function seemed interesting. I set out to implement it in, well, JavaScript, which is the only functional programming language I have ever used.


function pow(x,y){
  return y == 0 ? 1 : x * pow(x,y-1);
}


square can be implemented in an imperative way by reusing pow:


function square(x){
  return pow(x,2);
}


or by currying/partial application of pow:


var square = c(pow,2);
var cube = c(pow,3)
var sixteen = c(pow,2,4);


c is a function which allows currying, by remembering the curried args and returning a partial function which uses those arguments when calling the original function.


function c(f){
  var cargs = Array.prototype.slice.call(arguments,1);
  return function(){
    Array.prototype.push.apply(arguments, cargs);
    return f.apply(this, arguments);
  };
}


The curried functions can be called like regular functions:


console.log(square(5));
console.log(sixteen())


Now if you notice, it works only if you want to partially apply args at the end. Consider another example, where you might want to apply curried args in the beginning:


function greet(message,name){
  return message + ' ' + name;
}

function hello(name){
  return greet('hello',name);
}

function goodbye(name){
  return greet('goodbye',name);
}


The curried implementations of hello and goodbye would look like this:


var hello = c(greet,'hello');
var goodbye = c(greet,'goodbye');


The c function can only apply args at the end, so let's create another curry function which applies arguments at the beginning:


//curry args at the beginning
function _c(f){
  var cargs = Array.prototype.slice.call(arguments,1);
  return function(){
    Array.prototype.push.apply(cargs, arguments);
    return f.apply(this, cargs);
  };
}

//curry args at the end
function c_(f){
  var cargs = Array.prototype.slice.call(arguments,1);
  return function(){
    Array.prototype.push.apply(arguments, cargs);
    return f.apply(this, arguments);
  };
}

var square = c_(pow,2);
var cube = c_(pow,3);
var hello = _c(greet,'hello');
var goodbye = _c(greet,'goodbye');


c_ and _c are rather odd names to denote the order of currying. Other than the order, there is practically no difference between the two functions, so let's refactor the curry functions with a higher-order function:


function c(order){
  return function(f){
    var cargs = Array.prototype.slice.call(arguments,1);
    return function(){
      return f.apply(this, order(arguments,cargs));
    };
  };
}


c_ and _c can now be obtained by currying c, a higher-order function which takes an order function as input and returns a curry function.


var _c = c(currybefore);
var c_ = c(curryafter);


currybefore and curryafter are functions which decide the order of currying:


function currybefore(args,curry_args){
  return concat(curry_args,args);
}

function curryafter(args,curry_args){
  return concat(args,curry_args);
}

function concat(list1,list2){
  var result = [];
  Array.prototype.push.apply(result,list1);
  Array.prototype.push.apply(result,list2);
  return result;
}


If you're tired after currying curry functions, go have some butter chicken curry.

* Array.prototype.push.apply(arguments, args) is a way to do a push-all with push in JavaScript, because apply splats the array of args
* Array.prototype.slice.call(arguments,1) is a common JavaScript idiom, since arguments is not an actual array

Monday, May 25, 2009

Counting with Fingers


Primitive arithmetic operations +, -, *, /, ** are taken for granted in most programming languages. But a language need not support primitive arithmetic operations to be Turing-complete. Increment and decrement are the only basic operations available in certain minimal Turing-complete languages like BF. How would you go about performing addition, multiplication and exponentiation in such a language, with just increment and decrement? It's certainly possible, but first a short digression on the history of counting devices.

Counting was one of the primary reasons the computer was invented. There were numerous counting devices before the computer. The most primitive were the tally marks found in Egyptian cave drawings. Counting with fingers was also a common method, as taught in primary schools. The Chinese invented the abacus to aid counting. Then came the mechanical and electronic calculators and computers.

How would you implement addition without +, multiplication without *, and exponentiation without **? Exponentiation is just repeated multiplication. Multiplication is just repeated addition. And addition is just... well, repeated incrementing.

Think of the way you were taught addition in school. How did you add 2 and 3? You kept 2 fingers on one hand (left) and 3 on the other (right).

You deduct one from the left and add one to the right.

You deduct one from the left and add one to the right again.

Voila! You have no more fingers left in your left hand and the answer in your right hand. And all you needed to do was increment and decrement on alternate hands.

The only limitation is that you have only five fingers on each hand. Even with the five toes on each foot, it isn't much. Counting this way isn't practical for larger numbers unless you have infinitely many fingers. But computers have a large memory (their analog of fingers), and it is possible to add numbers using just increment and decrement, as demonstrated above.

An implementation of this in Scheme would look like the following. (Scheme has the primitives add1 and sub1 for increment and decrement.)


(define (add n x)
  (cond
    [(zero? n) x]
    [else (add1 (add (sub1 n) x))]))


This is what happens if we use this function to add 5 and 8


(add 5 8)
= (1 + (add 4 8))
= (1 + (1 + (add 3 8)))
= (1 + (1 + (1 + (add 2 8))))
= (1 + (1 + (1 + (1 + (add 1 8)))))
= (1 + (1 + (1 + (1 + (1 + (add 0 8))))))
= (1 + (1 + (1 + (1 + (1 + 8)))))
= (1 + (1 + (1 + (1 + 9))))
= (1 + (1 + (1 + 10)))
= (1 + (1 + 11))
= (1 + 12)
= (13)


We have successfully used increment recursively to implement addition; addition is a primitive recursive function. The same applies to subtraction.

Now, to implement multiplication without *, you can define it recursively using + (or the add we just implemented):

(define (multiply n x)
  (cond
    [(zero? n) 0]
    [else (add x (multiply (sub1 n) x))]))

This is what happens if we use this function to multiply 4 and 5


(multiply 4 5)
= (5 + (multiply 3 5))
= (5 + (5 + (multiply 2 5)))
= (5 + (5 + (5 + (multiply 1 5))))
= (5 + (5 + (5 + (5 + (multiply 0 5)))))
= (5 + (5 + (5 + (5 + 0))))


...Before we reduce it further, we have a choice. We can abstract the addition as understood previously or delve deeper into how the addition is performed recursively.

= (5 + (5 + (5 + 5)))

= (5 + (5 + 10))

OR

(5 + (5 + (1 + (1 + (1 + (1 + (1 + 5)))))))

finally evaluating to

= 20

Now, to implement exponentiation without **, you can define it recursively using * (or multiply), which in turn is recursively implemented using + (or add):


(define (exponent x n)
  (cond
    [(zero? n) 1]
    [else (multiply x (exponent x (sub1 n)))]))


This is what happens if we use this function to calculate 2**3


(exponent 2 3)
= (2 * (exponent 2 2))
= (2 * (2 * (exponent 2 1)))
= (2 * (2 * (2 * (exponent 2 0))))
= (2 * (2 * (2 * 1)))


The multiplications are evaluated recursively through addition, and the additions are evaluated recursively through increment.

That arithmetic operations can be computed recursively was a re-revelation to me on reading 'How to Design Programs', long after I had forgotten how to count with my fingers.

Brainf****d

I got brainf****d on a Saturday afternoon. No, I didn't get admitted for psychological reasons. Brainf*** is an esoteric programming language. It's a Turing-complete language with as few as 8 instructions:

> Increment the pointer
< Decrement the pointer
+ Increment the byte at the pointer
- Decrement the byte at the pointer
. Output the byte at the pointer
, Input a byte and store it in the byte at the pointer
[ Jump forward past the matching ] if the byte at the pointer is zero
] Jump backward to the matching [ unless the byte at the pointer is zero

Click here for a brief description of the language. The Wikipedia entry is more detailed, spoiling the fun of discovering the quirks of the language yourself.

I started writing an interpreter for BF in Ruby, which boiled down to a simple switch. The interesting part is actually writing code in BF. It's much more challenging than assembly language because of the tiny instruction set and the tape-like memory without any registers.

Here is the BF interpreter in Ruby:
if ARGV.length != 1
  puts "Usage: ruby brainfuck.rb filename"
  exit
end

mem = Array.new(30000, 0x00)
loopstack = Array.new
p = 0

code = File.open(ARGV[0]).readlines.join("")

i = 0
while i < code.length
  case code[i]
  when '>'
    p += 1
  when '<'
    p -= 1
  when '+'
    mem[p] += 1
  when '-'
    mem[p] -= 1
  when '.'
    print mem[p].chr
  when ','
    mem[p] = STDIN.getc.ord
  when '['
    if mem[p].zero?
      # skip the loop; assumes loops are not nested
      i = code.index(']', i)
    else
      loopstack.push i
    end
  when ']'
    if mem[p].zero?
      loopstack.pop
    else
      i = loopstack.last
    end
  end
  i += 1
end

And here is the traditional HelloWorld program in BF:

#H
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++
.
>
#e
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
+
.
>
#l
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++
.
>
#l
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++
.
>
#o
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++
.
>
#SPACE
++++++++++++++++++++++++++++++++
.
>
#w
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++
.
>
#o
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++
.
>
#r
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++
.
>
#l
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++
.
>
#d
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++
.


All that just for HelloWorld? This would probably be the worst language for beginners!

The only real work in the interpreter was managing loops expressed using [ and ]. I ended up using a stack to manage them, similar to the way stacks are used in procedural languages to transfer control to methods and return back.
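
The simple interpreter above only handles non-nested loops, since it skips forward to the next ] it finds. A refinement I'd consider (my own addition, not part of the original interpreter) is precomputing a jump table with a stack, so nested [ ] pairs match up correctly:

def bracket_map(code)
  map, stack = {}, []
  code.each_char.with_index do |ch, idx|
    stack.push(idx) if ch == '['
    if ch == ']'
      open = stack.pop
      # record the jump targets in both directions
      map[open] = idx
      map[idx] = open
    end
  end
  map
end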

Here is a program to iterate from 0 to 9 in BF:

#ASCII value of 0
++++++++++++++++++++++++++++++++++++++++++++++++
>
#Loop counter variable
++++++++++
[
< . + >
-
]

Self generating code (or Why Recursion is a wonderful idea)

Self-modifying code is the fancy of every assembly language enthusiast. You can write self-modifying code when code and data live in the same accessible area of memory, so your code can modify its own instructions. I have never written self-modifying code myself, but I have always admired the possibilities.

Self-generating code is possible in dynamic languages like Ruby and JavaScript through eval. It's commonly referred to as metaprogramming in Ruby. I recently came across a very common problem that made me think of a metaprogramming solution. Imagine you have the set [A, B, C, D]. How do you find all the different combinations of it: [A], [B], [C], [D], [A,B], [A,C], [A,D], [B,C] ... [A,B,C,D]?

Let's break down the problem. To find all the different combinations, you need to find the combinations selecting 1 element at a time, selecting 2 at a time, and so on:
C = nC1 + nC2 + .... + nCn

To find the combinations nCr, you can generate the permutations and keep only the unique sets (nCr = nPr / r!).

To find the permutations nPr (n=4, r=3), simple pseudocode would look like:

permutations = []
set = ['A','B','C','D']
for i in 0 to n-1
  for j in 0 to n-1
    for k in 0 to n-1
      permutations << [set[i], set[j], set[k]]
    end
  end
end


It has n levels of nested loops, and it's difficult to write n levels of nesting when n is dynamic; you can't just hard-code the loops for a fixed n. The code must be intelligent enough to generate n-level nested loops (or you can recur, but let's hold on to a non-recursive solution for now). This can be expressed using Ruby's each as:


set.each {|i1|
  set.each {|i2|
    set.each {|i3|
      permutations << [i1, i2, i3]
    }
  }
}


Metaprogramming, like recursion, is all about finding patterns, and you can easily spot the code segment repeated three times for r = 3. This can be expressed using eval as:


def permutate(set, r)
  permutations = []
  eval (1..r).collect {|i| "set.each{|i#{i}| "}.join('') +
    'permutations << [' + (1..r).collect {|i| "i#{i}"}.join(',') + ']' + '}' * r
  permutations
end


This code has a flaw: it can pick the same element more than once within a single permutation. It can be fixed by excluding the elements chosen in the enclosing iterations from the set used in the current iteration. The modified code is:


set.each {|i1|
  (set - [i1]).each {|i2|
    (set - [i1] - [i2]).each {|i3|
      permutations << [i1, i2, i3]
    }
  }
}


In eval, this translates to:


def permutate(set, r)
  permutations = []
  eval (1..r).collect {|i| "(set" + (1...i).collect {|j| " - [i#{j}]"}.join('') + ").each{|i#{i}| "}.join('') +
    'permutations << [' + (1..r).collect {|i| "i#{i}"}.join(',') + ']' + '}' * r
  permutations
end


Combination can be expressed using permutation as:


require 'set'

def combinate(set, n)
  permutate(set, n).collect {|s| s.sort}.to_set.to_a
end


Now, the original problem can be solved by computing the combinations from 1 to n:


def combinations(set)
  (1..set.size).collect {|i| combinate(set, i)}
end


It was a tricky use of eval and string building to generate code on the fly. But why bother with tricky syntactic details and code generation when the compiler/interpreter can manage all the grunt work for you? That's why recursion is such a wonderful idea.
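
For comparison, here is a minimal recursive sketch of the same permutation (my own version, not from the original post), which needs no eval at all:

def permutate_rec(set, r)
  return [[]] if r.zero?
  set.flat_map do |e|
    # pick e, then permute the rest of the set with one fewer slot
    permutate_rec(set - [e], r - 1).map {|rest| [e] + rest}
  end
end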

Thursday, May 14, 2009

Schrodinger's Cat of Programming (or My understanding of Godel's Theorem)

The Little Schemer deals with a few mathematical theorems through simple examples, and it has an easily understandable explanation of Godel's incompleteness theorem. Even though the book primarily focuses on recursion, at one point it addresses the problem of infinite recursion. That happens only for partial functions, for which you can't have a definite exit criterion.

Consider the list (6 2 4 jerry 5 7 3) and the atom jerry. Suppose you try to locate jerry by starting at the first position and repeatedly jumping to the position given by the number you find there. You will traverse like this:
[1] -> [6] -> [7] -> [3] -> [4] -> jerry

It's not always guaranteed to end, though. Let's try another. Consider the list (7 1 2 jerry 5 6 3):
[1] -> [7] -> [3] -> [2] -> [1] -> [7] -> [3] -> [2] -> [1] -> [7] ...

Wait, this will never end. Tom will never find Jerry if he uses your function. Such functions are called partial functions; functions which always return a value, like factorial, are called total functions.

Now let's say you want to write a function that takes a function and determines whether that function will stop executing or not. In functional programming languages you can pass functions around as first-class values, so it would look like this:

(define will-stop?
  (lambda (fun)
    ...))

Let's assume that such a function is possible ('assume' is a revealing word from proofs by contradiction, and I guess I might have spoiled the suspense already).

Now we need a function to test it on. Let's call it try:

(define try
  (lambda (x)
    ...))

Now, just for fun, let's write a function which is guaranteed never to stop. Let's call it wont-stop:

(define wont-stop
  (lambda (x)
    (wont-stop x)))

Now, before passing try to will-stop?, let's implement try:

(define try
  (lambda (x)
    (and
      (will-stop? try)
      (wont-stop x))))

Let's assume that will-stop? works for try and returns true, which means try will stop. try then evaluates to

(and true (wont-stop x))

which won't stop. That contradicts the assumed output of will-stop?.

Now let's assume that will-stop? works for try and returns false, which means try will never stop. try then evaluates to

(and false (wont-stop x))

which returns false. Effectively, try has stopped even though will-stop? predicted it would never stop.

The only possible explanation of this anomaly, or paradox, is that will-stop? can never be implemented. (Connect the dots, because I'm still connecting.) Strictly speaking this is the halting problem, and it is the essence of the same self-referential argument behind Godel's incompleteness theorem.

One book which is gathering dust on my bookshelf, and which I hope has a better explanation of Godel's theorem, is Godel, Escher, Bach. Another series of books which, I've heard, always culminates in an explanation of Godel's theorem through unsolvable logic puzzles is Raymond Smullyan's puzzle books. Until I get hold of any of Smullyan's books, I found an explanation of Godel's theorem for dummies here.

Schrodinger's cat is a thought experiment in quantum mechanics which is equally bizarre. Imagine that we have a cat in a box along with a Geiger counter which is triggered by an atom that may or may not decay. The counter controls a hammer which shatters a flask of cyanide, which would kill the cat. Now if you come back after an hour, the cat in the box may be alive or dead or both, and you will never know until you actually open the box. The cat is in a half-dead, half-alive state and continues in a superposition of multiple histories/states until the box is opened, when it becomes either dead or alive.

Back to our previous finding: try won't stop if we assume will-stop? says try will stop, and try stops if we assume will-stop? says it won't. Now that's a paradoxical statement which makes Schrodinger's cat sound like simple probability.

Little Schemer (Review)

I have been reading The Little Schemer recently. Dan Friedman uses the Socratic style of teaching (Q&A style, if you're not interested in Greek history), in which you learn by asking questions. It may sound like school, but it's the simplest way to teach complex language ideas.

It uses Scheme, which is a dialect of Lisp. Scheme has little in the way of syntax to learn; once you get the parentheses, it never gets in your way. Its simplicity aids in expressing complex ideas succinctly. Modern-day languages like Ruby and C# are still trying to catch up with ideas from Lisp and Scheme.

Scheme is a functional language, which means functions in Scheme are like pure mathematical functions that always return a predictable output for a given input (unlike imperative languages, which are state-dependent), and functions are first-class objects, just like primitives.

Programmers love and hate Lisp/Scheme for the same reason (parentheses), which I put in parentheses to emphasize the fact that I like the language for that very reason. Ruby saved us from the stupid semicolons; Scheme gives you more concise and less ambiguous code, provided your editor matches and highlights parentheses automatically. The only syntactic rule is that every opening parenthesis must have a matching closing parenthesis.

'Everything is a list' was my notion of Lisp before I read this book, but I finally understood that a list is a concise way of expressing both program and data in the same form, and a list is made of atoms or lists (which are made of atoms or lists (which are made of atoms or lists (... OK, you get it: a list is made of S-expressions))).

The same love/hate relationship goes for The Little Schemer as well. GOTO and recursion are the most dreaded concepts any programming book will warn you not to use. But if you think of machine/assembly language, which is what your code finally ends up as, it's difficult to implement control structures without jumps, which are essentially GOTOs. I'm not arguing in favor of GOTOs, but unfortunately recursion gets much the same treatment as GOTOs.

Recursion is a beautiful way of solving a problem in terms of itself. Haters of The Little Schemer feel that it's a book about recursion repeated endlessly for different problems. That's the impression you end up with if you just scratch the surface or skim it, but I find myself solving complex programming problems far more easily by employing recursion after reading The Little Schemer.

The first 3 chapters of the book teach you about atoms, S-expressions and lists. They teach you some basic list-processing functions like car, cdr and cons, and make you implement others like lat?, rember, insert and subst using car, cdr and cons. The book walks you step by step through the execution of recursive functions, and once you get a grasp of it, you can easily follow along and explore on your own. That's where the Socratic method of teaching really shines.

The chapter on numbers deals with tuples and makes you implement arithmetic operations like +, -, =, <, >, *, %, ** using just add1, sub1 and zero?, through recursion. It was a lot like counting with fingers in kindergarten, but it helps you solve more complex functions like length, pick and occur by employing the same methods. Along the way it builds up a couple of laws and rules for writing recursive functions.

The later part of the book deals with abstractions, more recursion, sets, lambda, unnatural recursion, total and partial functions, and the Y combinator formed from nested lambdas (I wonder why Paul Graham named his company Y Combinator, because I still haven't fully grasped the concept).

I seldom read a programming book without writing code, but this is one book which improves your thinking about recursive programming without your actually writing any. I can't wait to go through a second sitting of The Little Schemer, writing the code this time, to better understand the concepts.

The Little Schemer has a number of follow-up books in the same style, like The Seasoned Schemer and The Reasoned Schemer. And the book ends with suggestions for other interesting books to read before moving on to those.

Monday, January 12, 2009

This blog will be a chronicle of my journey towards becoming a language-agnostic programmer, and my random thoughts on programming and software engineering in general.

new Thread(new Runnable(){
  public void run() {
    random_thoughts.each do |thought|
      (cons blog thought)
    end
  }
}).start()

This is the most polyglot introduction I could think of after 3 years of marveling at the simplicity of assembly language, endlessly hunting pointers in C, finally understanding why OO matters in Java, drooling over the syntactic sugar of Ruby, and culminating in an enjoyable reading of The Little Schemer (which is why you see Cons the Magnificent in the middle) and journeying towards understanding Lambda the Ultimate.