Thursday, July 06, 2006

Infinite Streams in Groovy

In my last post, I showed how you can use byte-code manipulation and Generics to achieve a modicum of functional programming in Java without overly cumbersome syntax. However, it still feels a little clunky.

But how about in Groovy? Groovy's support for real closure syntax turns out to make functional programming, er, Groovy! I won't bother writing up the basics, since they have already been done elsewhere, such as Functional Programming in Groovy

What I will do is show how to build a classic data structure used in functional programming, streams, using Groovy. And then I'll show a classic application of functional streams: a stream of the Sieve of Eratosthenes, in one line of code.

Streams

A stream is a collection, like a list, only that it doesn't necessarily produce its contents until needed, and it may in fact, be infinitely large. It bears some resemblance to streams used in I/O, such as InputStream in Java. For example, one could create a stream of integers, by implementing a class IntegerInputStream whose read() method returns the integers in montonically increasing order.

But our streams won't subclass InputStream, ours will be a sort of linked-list.

LazyLists

First, we'll create a new data structure, LazyList which represents a pair of values: a concrete value 'car', and a closure 'cdr' which returns another LazyList.

class LazyList
{
private def car;
private Closure cdr;
LazyList(def car, Closure cdr) { this.car=car; this.cdr=cdr; }
public def getCar() { return car; }
public LazyList getCdr() { if(cdr != null){ return cdr(); } else { cdr; } }
}

Notice that the getCdr() function doesn't return the cdr field, it returns the result of executing the closure.

Next we'll need a way of constructing a new LazyList, so we add the following method to class LazyList

public static cons(def val, Closure c) { return new LazyList(val, c); }

This factory method just uses the LazyList constructor (for now). And finally, we need a method for grabbing some values out of the list.

public List take(int n)
{
def r = []; def l = this;
n.times { r.add(l.car); l = l.cdr(); }
return r;
}

We're not done yet, but we're already capable of defining our first infinite stream.

def integers(n) { cons(n, { integers(n+1) }) }
def naturalnumbers = integers(1)

naturalnumbers.take(10).each { print "$it\n"; }

The first line says that the integers starting from N, is the result of constructing a list consisting of the element N and the integers starting from N+1. Note that the recursive call to integers(n+1) is inside a closure. Groovy is an eager language, and all function parameters are evaluated before calling the function. Without the closure, this would result in an infinite loop and overflow the stack.

The second line says that the natural numbers are the integers starting from 1.

The third line says to take the first 10 elements from the natural numbers, iterate over the result, printing them out.

Big deal! What about filters?

Yeah, you could do that easily with [1..10].each { print "$it\n" }. So let's move on. What if we want to filter an infinite LazyList? We can do that too, with the following method added to LazyList.

public LazyList filter(Closure pred)
{
if(getCar() != null)
if(pred(getCar()))
return cons(getCar(), { getCdr().filter(pred) })
else
return getCdr().filter(pred)
return l
}

Which says that to filter a LazyList with closure predicate 'pred', we call pred() on the first element of the list, and if true, we construct a new list using this element, and a closure which will invoke filter on the rest of the elements of the list. Otherwise, we recursively call filter until we find an element that satisfies the predicate.

Now we can define the even numbers...

def evennumbers = naturalnumbers.filter { it % 2 == 0 }


The Sieve of Eratosthenes

One of the classic examples used to demonstrate the power of streams is the Sieve of Eratosthenes, which is to define a stream whose successive list of values are the prime numbers.

This can in fact be done in a single line of code using LazyList:

def sieve(list) { cons(list.car, { sieve(list.cdr.filter { x-> x % list.car != 0 }) }) }

although that is kinda ugly, but formatted this way

def sieve(list)
{
cons(list.car,
{
sieve(list.cdr.filter { x-> x % list.car != 0 })
})
}

is easier to read. If you read the Wikipedia link above, you'll see that the sieve takes a list, whose first element is prime, and returns a new list, with this first element, followed by a sieve of another list with all multiplies of the prime number filtered out.

Now we can print the first hundred prime numbers:

def primes = sieve(integers)
primes.take(100).each { print "$it\n" }

or the first 100 fibonacci numbers

def fibo(a,b) { cons(a, { fibo(b, a+b) }) }
fibo(0,1).take(100).each { print "$it\n" }


And with just two other magic functions

public LazyList compose(Closure c, LazyList l)
{
return cons(c(getCar(), l.car), { getCdr().compose(c, l.cdr) })
}

public LazyList map(Closure c)
{
return cons(c(getCar()), { getCdr().map(c); });
}

we can even print out the first 10 twin primes:

def twinprimes = primes.compose({x,y -> [y-x == 2, x]}, primes.cdr).filter({it[0] == true}).map({ it[1] });
twinprimes.take(10).each { print "($it, ${it+2})\n" }

Caveats

First, don't expect this code to run fast. It generates alot of garbage lazylists, uses recursion, and it also recalculates values. The first problem probably can't be fixed, the second problem can be fixed by rewriting the LazyList functions to be tail-recursive and then hacking Groovy to make sure it optimizes tail-recursion.

The final problem (recalcuation) can be fixed with memoization. Recall the cons() function which appears to do nothing but invoke the LazyList constructor? Let's introduce a new function:

public static delay(Closure c)
{
boolean evaled=false;
def val=null;
return {
Object[] args ->
if(!evaled) { evaled=true; val=c(args); return val }
else return val;
}
}

and modify cons

public static cons(def val, Closure c)
{
return new LazyList(val, delay(c));
}


so that now the Closure passed to cons() is only executed once. Subsequent values are returned from the cached value. Of course, this only helps performance if the closure block is computationally expensive to evaluate. Otherwise, the overhead of creating the memoization closure probably incurs even worse penalties.

That's it for today's hack. Practical? Maybe not. Groovy? Yes. If you want to use these techniques and run them more efficiently on the JVM, you may want to look at some of the Scheme/Lisp dialects available for Java.

-Ray

Labels:

2 Comments:

Blogger Slava Pestov said...

Ray, computing twin primes is cute and all, but there is another really neat of lazy lists: parser combinators. I can't find a "canonical" URL which describes them, but maybe you can google for "parser combinators + [language of your choice]". Basically parser combinators allow you to embed a parser generator DSL in any language that supports currying, closures and lazy lists.

2:10 PM  
Blogger Mert Nuhoglu said...

Ray, this is a great article on using groovy for functional programming. If you have more examples on this topic, can you please publish them too?

2:11 AM  

Post a Comment

<< Home