Thursday, July 06, 2006

Infinite Streams in Groovy

In my last post, I showed how you can use byte-code manipulation and Generics to achieve a modicum of functional programming in Java without overly cumbersome syntax. However, it still feels a little clunky.

But how about in Groovy? Groovy's support for real closure syntax turns out to make functional programming, er, Groovy! I won't bother writing up the basics, since they have already been done elsewhere, such as Functional Programming in Groovy

What I will do is show how to build a classic data structure used in functional programming, streams, using Groovy. And then I'll show a classic application of functional streams: a stream of the Sieve of Eratosthenes, in one line of code.

Streams

A stream is a collection, like a list, only that it doesn't necessarily produce its contents until needed, and it may in fact, be infinitely large. It bears some resemblance to streams used in I/O, such as InputStream in Java. For example, one could create a stream of integers, by implementing a class IntegerInputStream whose read() method returns the integers in montonically increasing order.

But our streams won't subclass InputStream, ours will be a sort of linked-list.

LazyLists

First, we'll create a new data structure, LazyList which represents a pair of values: a concrete value 'car', and a closure 'cdr' which returns another LazyList.

class LazyList
{
private def car;
private Closure cdr;
LazyList(def car, Closure cdr) { this.car=car; this.cdr=cdr; }
public def getCar() { return car; }
public LazyList getCdr() { if(cdr != null){ return cdr(); } else { cdr; } }
}

Notice that the getCdr() function doesn't return the cdr field, it returns the result of executing the closure.

Next we'll need a way of constructing a new LazyList, so we add the following method to class LazyList

public static cons(def val, Closure c) { return new LazyList(val, c); }

This factory method just uses the LazyList constructor (for now). And finally, we need a method for grabbing some values out of the list.

public List take(int n)
{
def r = []; def l = this;
n.times { r.add(l.car); l = l.cdr(); }
return r;
}

We're not done yet, but we're already capable of defining our first infinite stream.

def integers(n) { cons(n, { integers(n+1) }) }
def naturalnumbers = integers(1)

naturalnumbers.take(10).each { print "$it\n"; }

The first line says that the integers starting from N, is the result of constructing a list consisting of the element N and the integers starting from N+1. Note that the recursive call to integers(n+1) is inside a closure. Groovy is an eager language, and all function parameters are evaluated before calling the function. Without the closure, this would result in an infinite loop and overflow the stack.

The second line says that the natural numbers are the integers starting from 1.

The third line says to take the first 10 elements from the natural numbers, iterate over the result, printing them out.

Big deal! What about filters?

Yeah, you could do that easily with [1..10].each { print "$it\n" }. So let's move on. What if we want to filter an infinite LazyList? We can do that too, with the following method added to LazyList.

public LazyList filter(Closure pred)
{
if(getCar() != null)
if(pred(getCar()))
return cons(getCar(), { getCdr().filter(pred) })
else
return getCdr().filter(pred)
return l
}

Which says that to filter a LazyList with closure predicate 'pred', we call pred() on the first element of the list, and if true, we construct a new list using this element, and a closure which will invoke filter on the rest of the elements of the list. Otherwise, we recursively call filter until we find an element that satisfies the predicate.

Now we can define the even numbers...

def evennumbers = naturalnumbers.filter { it % 2 == 0 }


The Sieve of Eratosthenes

One of the classic examples used to demonstrate the power of streams is the Sieve of Eratosthenes, which is to define a stream whose successive list of values are the prime numbers.

This can in fact be done in a single line of code using LazyList:

def sieve(list) { cons(list.car, { sieve(list.cdr.filter { x-> x % list.car != 0 }) }) }

although that is kinda ugly, but formatted this way

def sieve(list)
{
cons(list.car,
{
sieve(list.cdr.filter { x-> x % list.car != 0 })
})
}

is easier to read. If you read the Wikipedia link above, you'll see that the sieve takes a list, whose first element is prime, and returns a new list, with this first element, followed by a sieve of another list with all multiplies of the prime number filtered out.

Now we can print the first hundred prime numbers:

def primes = sieve(integers)
primes.take(100).each { print "$it\n" }

or the first 100 fibonacci numbers

def fibo(a,b) { cons(a, { fibo(b, a+b) }) }
fibo(0,1).take(100).each { print "$it\n" }


And with just two other magic functions

public LazyList compose(Closure c, LazyList l)
{
return cons(c(getCar(), l.car), { getCdr().compose(c, l.cdr) })
}

public LazyList map(Closure c)
{
return cons(c(getCar()), { getCdr().map(c); });
}

we can even print out the first 10 twin primes:

def twinprimes = primes.compose({x,y -> [y-x == 2, x]}, primes.cdr).filter({it[0] == true}).map({ it[1] });
twinprimes.take(10).each { print "($it, ${it+2})\n" }

Caveats

First, don't expect this code to run fast. It generates alot of garbage lazylists, uses recursion, and it also recalculates values. The first problem probably can't be fixed, the second problem can be fixed by rewriting the LazyList functions to be tail-recursive and then hacking Groovy to make sure it optimizes tail-recursion.

The final problem (recalcuation) can be fixed with memoization. Recall the cons() function which appears to do nothing but invoke the LazyList constructor? Let's introduce a new function:

public static delay(Closure c)
{
boolean evaled=false;
def val=null;
return {
Object[] args ->
if(!evaled) { evaled=true; val=c(args); return val }
else return val;
}
}

and modify cons

public static cons(def val, Closure c)
{
return new LazyList(val, delay(c));
}


so that now the Closure passed to cons() is only executed once. Subsequent values are returned from the cached value. Of course, this only helps performance if the closure block is computationally expensive to evaluate. Otherwise, the overhead of creating the memoization closure probably incurs even worse penalties.

That's it for today's hack. Practical? Maybe not. Groovy? Yes. If you want to use these techniques and run them more efficiently on the JVM, you may want to look at some of the Scheme/Lisp dialects available for Java.

-Ray

Labels:

Saturday, July 01, 2006

Fun with Generics and CGLib: Functional Programming in Java with less hassle


Come on, admit it. You've got functional envy. You've seen Quicksort in 3 lines or less of code. You've salivated over map/reduce and function curry looks as good as it tastes.

Up until now, you've had two choices. One, switch to a language which allows functional expressions with economical syntax, such as Scheme, Haskell, Ruby, or Groovy. Or, use laborious Anonymous Inner/Local classes to fake it like commons-collections.

We'll, I'm gonna show you how to do it without much syntactic overhead, and preserve enough type information to even allow your IDE to do some refactoring, and popup fields and methods on return values of "functions"

Preliminaries

Before we get started, you should review Alex Winston's Strongly Typed Java Delegates and my previous blog entry on Refactorable Type-Safe Configuration without XML or Annotations. If you don't know what functional programming is, or why it matters, read Why Functional Programming Matters.


Goal

Our goal is to try and make writing functions as easy as possible, and allow them to be curried, composed, and passed as arguments to other functions. Functions, as used in this article, are not just any old Java method. They are Java instance methods which are pure-functional, that is, without side-effects, and do not reference any object fields or non-static variables. Essentially, they are non-mutating static functions declared without a static qualifier.

This,

public MathFuncs
{
public int add(int x, int y) { return x+y; }
public int mul(int x, int y) { return x*y; }
}

is an example of functions. They are an instance methods and do not mutate state.

Example functional usage


Lambda mul = lambda(nop(MathFuncs.class).mul(0,0));
Lambda triple = mul.curry(3);
Lambda quadruple = mul.curry(4);
Lambda timesTwelve = compose(triple, quadruple);

System.out.println("10 timesTwelve = "+timesTwelve.call(10));


or how about


Lambda sum = reducel(add, 0);
// the numbers 1-10 times twelve, and then summed together
int x=sum.call(map(timesTwelve, list(1,2,3,4,5,6,7,8,9,10)));



Ok, my interest is peaked. What is that funky lambda and nop function?

All Java functions must first be converted to Lambda functions, which is a Java interface with the following signature:

public interface Lambda<T> {
public T call(Object... args);
public Lambda<T> curry(Object arg);
public Method method();
}

Extremely simple as you can see, you can either invoke a function, curry it, or get access to the underlying Java reflection Method, which is used internally. Ordinarily, to make Java functions into Lambda functions, you've have to write classes like this

public class Add implements Lambda<Integer>
{
public T call(Object... arg) { return MathFuncs.instance().add(arg[0], arg[1]); }
...
}

and indeed, you'd have to write a separate implementation for each and every interface you use.

Ahhh...Don't tell me. Delegates!
Yes, if you recall the Strongly Typed Delegates trick, this will be yet another variation. What we'll do is use the nop() function to create an CGLib interceptor for a given class, return an instance of this proxied class, and invoke a method on this instance. This will then tell the proxy which method we want to create a Lambda for.

Here's a snippet of how nop() works, the complete listing will follow later

public static <T> T nop(Class<T> clz)
{
T x = (T) Enhancer.create(clz, new Class[]{Lambda.class},
new LambdaInterceptor(clz));
return x;
}

Now, you'll note that this function does not return a Lambda object! It returns an instance of the type of the class it was invoked on! nop(Foo.class) returns Foo! That's where lambda() comes in. It turns instances of Objects into Lambda interfaces! When you invoke a method like add() on Foo.class, the interceptor takes the return value of the add() function and associates itself with it. Then lambda converts these back into Lambda interfaces.

public static <T> Lambda<T> lambda(T pm) {
Lambda<T> l = (Lambda<T>) interceptors.get(pm);
interceptors.remove(pm);
return l;
}

How does it do it? Well, when you write nop(Foo.class), a CGLib proxy is created for an instance of class Foo, and placed into a global interceptors Map with object instance as key, and the Lambda interceptor as value. Next, when you call something like nop(Foo.class).add(1,2), the add() function gets intercepted, and a java.lang.Method reference is placed in this map as well. Finally, when you call lambda() on an instance that has be proxied by a LambdaInterceptor, it pulls the interceptor out of the map, and casts it into a Lambda.

Other Tricks
If you feel like typing extra generics parameters, you could have call() methods which are type-safe. For example, you could create a Lambda1 and Lambda2 interface which both extend Lambda, but which have non-varargs call() methods taking 1 and 2 parameters each.

Then you could write code like

lambda2(nop(MathFuncs.class).add(1,2)).call("Foo", "Bar")


And your IDE would flag call("Foo", "Bar") as an error! I leave the generic type signatures of Lambda1 and Lambda2, and the definition of the lambda1 and lambda2 helper functions as an exercise for the reader.

As a final note, yet, there is a way to optimize away some of the unneccessary object creation, which will I do in a later release.

Have fun(ctional)!
-Ray



Here's the complete listing for Func.java, which includes the LambdaInterceptor
Func.java

package functional;

import net.sf.cglib.proxy.Enhancer;
import net.sf.cglib.proxy.MethodInterceptor;
import net.sf.cglib.proxy.MethodProxy;

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;

/**
* Created by IntelliJ IDEA.
* User: ray
* Date: Jun 29, 2006
* Time: 12:21:31 PM
*
*/
public class Func {

protected static HashMap<Object, LambdaInterceptor> interceptors = new HashMap<Object, LambdaInterceptor>();

public static <T> Lambda<T> lambda(T pm) {
Lambda<T> l = (Lambda<T>) interceptors.get(pm);
interceptors.remove(pm);
return l;
}

public static <T> T nop(Class<T> clz) {
T x = (T) Enhancer.create(clz, new Class[]{Lambda.class}, new LambdaInterceptor(clz));
return x;
}


public static <F,G> Lambda<F> compose(Lambda<F> f, Lambda<G> g) {
return new CompositionLambda<F>(f, g);
}

protected static class LambdaInterceptor<T,S> implements MethodInterceptor, Lambda<S> {
private T proxy;
private Class<T> clazz;
private Method binding = null;

public LambdaInterceptor(Class<T> target) {
try {
this.proxy = target.newInstance();
} catch (InstantiationException e) {
e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
} catch (IllegalAccessException e) {
e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
}
this.clazz = target;
}

public Object intercept(final Object o, final Method m,
final Object[] args, MethodProxy mp) throws Throwable {


if (m.getName().equals("toString")) {
return "Invoked by Debugger variable display!";
} else if (m.getName().equals("method")) {
return method();
} else if (m.getName().equals("call")) {
return call(args);
} else if (m.getName().equals("curry")) {
return curry(args[0]);
} else {
Method[] meths = proxy.getClass().getMethods();
for(Method mi : meths)
if(mi.equals(m)) { binding=mi; break; }

if(binding==null) throw new NoSuchMethodException(m.toString());
binding.setAccessible(true);
Object ob = null;

ob = binding.invoke(proxy, args);
interceptors.put(ob, this);
return ob;

}
}


private static Class[] classes(Object[] objects) {
Class[] classes = new Class[objects.length];
for (int i = 0; i < objects.length; i++)
classes[i] = objects[i].getClass();
return classes;
}

public S call(Object... args) {
try {
return (S) binding.invoke(proxy, args);
} catch (IllegalAccessException e) {
e.printStackTrace();
throw new RuntimeException(e);
} catch (InvocationTargetException e) {
e.printStackTrace();
throw new RuntimeException(e);
}
}

public Lambda<S> curry(Object arg) {
return new CurriedLambda(this, arg);
}

public Method method() {
return binding;
}
}

}


Util.java

package functional;

import static functional.Func.lambda;
import static functional.Func.nop;

import static java.util.Collections.EMPTY_LIST;
import java.util.*;

/**
* Created by IntelliJ IDEA.
* User: ray
* Date: Jun 30, 2006
* Time: 5:33:25 PM
*/
public class Util {
public static <F, I> F reduce(Lambda<F> f, I init, List<I> list) {
if (list == null || list.isEmpty()) return f.call(init, init);
if (list.size() == 1) return f.call(init, list.get(0));
return f.call(list.get(0), reduce(f, init, list.subList(1, list.size())));
}

public <F,I> F _reduce(Lambda<F> f, I init, List<I> list)
{
return reduce(f, init, list);
}

public static <F, I> Lambda<F> reducel(Lambda<F> f, I init) {
return lambda(nop(Util.class)._reduce(f, init, (List<I>) EMPTY_LIST)).curry(f).curry(init);
}


public <T> List<T> _map(Lambda<T> lambda, List<?> list)
{
return map(lambda, list);
}

public static <T> Lambda<List<T>> mapl(Lambda<T> lambda)
{
return lambda(nop(Util.class)._map(lambda, (List<?>) EMPTY_LIST));
}

public static <T> List<T> map(Lambda<T> lambda, List<?> list)
{

List<T> nl = newList(list);
for(Object o : list)
{
nl.add(lambda.call(o));
}
return nl;
}

private static <T> List<T> newList(List<?> list) {
return list instanceof LinkedList ? new LinkedList<T>() : new ArrayList<T>();
}

public static List<Integer> list(int... ints)
{
ArrayList al=new ArrayList();
for(int x : ints) al.add(x);
return al;
}


public static <T> List<T> cons(T a, List<T> l)
{
List<T> nl = newList(l);
nl.add(a);
nl.addAll(l);
return nl;
}

public <T> List<T> _cons(T a, List<T> l)
{
return cons(a, l);
}

public static <T> Lambda<List<T>> consl(T a)
{
return lambda( nop(Util.class)._cons(a, (List<T>)EMPTY_LIST));
}
}


CurriedLambda.java

package functional;

import java.lang.reflect.Method;

/**
* Created by IntelliJ IDEA.
* User: ray
* Date: Jun 30, 2006
* Time: 5:31:43 PM
*/
public class CurriedLambda<T> implements Lambda<T> {
private final Lambda<T> lambda;
private final Object arg;

public CurriedLambda(Lambda<T> lambda, Object arg) {

this.lambda = lambda;
this.arg = arg;
}

public T call(Object... args) {
return lambda.call((Object[]) args(arg, args));
}

public Lambda<T> curry(Object arg) {
return new CurriedLambda(this, arg);
}

public Method method() {
return lambda.method();
}

private Object args(Object arg, Object[] args) {
Object[] nargs = new Object[args.length + 1];
nargs[0] = arg;
System.arraycopy(args, 0, nargs, 1, args.length);
return nargs;
}
}


CompositionLambda.java

package functional;

import java.lang.reflect.Method;

/**
* Created by IntelliJ IDEA.
* User: ray
* Date: Jun 30, 2006
* Time: 5:31:35 PM
*/
public class CompositionLambda<T> implements Lambda<T> {
private final Lambda<T> f;
private final Lambda g;

public CompositionLambda(Lambda<T> f, Lambda g) {

this.f = f;
this.g = g;
}

public T call(Object... args) {
return f.call(g.call((Object[]) args));
}

public Lambda<T> curry(Object arg) {
return new CurriedLambda<T>(this, arg);
}

public Method method() {
return f.method();
}
}

FuncTest.java test code

package functional;


import static functional.Func.compose;
import static functional.Func.nop;
import static functional.Func.lambda;
import static functional.Util.list;

import java.util.Collection;

/**
* Created by IntelliJ IDEA.
* User: ray
* Date: Jun 30, 2006
* Time: 1:16:23 PM
*/
class FuncTest {
public int add(int x, int y) {
return x + y;
}

public int mul(int x, int y) {
return x * y;
}

public int square(int x) { return x*x; }
public Boolean greater(Comparable x, Comparable y)
{
return x.compareTo(y) > 0;
}

public Boolean or(Boolean a, Boolean b) { return a.booleanValue() || b.booleanValue(); }
public Boolean and(Boolean a, Boolean b) { return a.booleanValue() && b.booleanValue(); }

public Boolean truth(Object o)
{
if(o == null) return Boolean.FALSE;
if(o instanceof Number) return ((Number)o).longValue() != 0;
if(o instanceof String) return ((String)o).length() != 0;
if(o instanceof Collection) return ((Collection)o).size() != 0;

return false;
}

public static void main(String[] args) {
try {
// Func.curry(Test.class, Func.nop(Test.class).add(1,2)).curry(1).call(2);
Lambda<Integer> mul = lambda(nop(FuncTest.class).mul(1, 2));

Lambda<Integer> plus = lambda(nop(FuncTest.class).add(1, 2));
Lambda<Integer> triple = mul.curry(3);
Lambda<Integer> quadruple = mul.curry(4);
Lambda<Integer> timesTwelve = compose(triple, quadruple);
Lambda<Integer> sum = Util.reducel(plus, 0);
Lambda<Integer> square = lambda(nop(FuncTest.class).square(0));

quadruple.call(12);

System.out.println(
timesTwelve.call(
sum.call(list(1,2,3,4,5,6,7,8,9,10))));

System.out.println(
sum.call(Util.map(square, list(1,2,3,4,5,6,7,8,9,10))));


Lambda<Boolean> or = lambda(nop(FuncTest.class).or(true, true));
Lambda<Boolean> and = lambda(nop(FuncTest.class).and(true, true));
Lambda<Boolean> truth = lambda(nop(FuncTest.class).truth(true));

Lambda<Boolean> anyTrue = Util.reducel(or, false);
Lambda<Boolean> allTrue = Util.reducel(and, true);

Lambda<Boolean> greater = lambda(nop(FuncTest.class).greater(1,2));

System.out.println("There is a number greater than less than 0 "+anyTrue.call(Util.map(greater.curry(2), list(1,2,3,4,5,6,7,8,9,10))));

} catch (Exception e) {
e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
}
}
}

Labels: