Purely-Functional Incremental Mutation

April 5th, 2009

In writing functional programs, we’ve learned to put aside many problematic techniques for writing programs, especially iteration and mutation over data structures.  On the other hand, some folks aren’t buying in to this functional nonsense, and they continue to build elaborate (and working, they will remind you) software based fundamentally on algorithms incrementally mutating data structures.

Can’t we all just get along?

(An answer inside, plus the continuation of our exploration of type-classes.)

Yes We Can

We just need to be liberal with how we define “mutation.”  In the practice of most programming languages, “mutation” means destructive updates — so that when we perform an update on a data structure, the previous state of that structure is lost.  However, when most people think of mutation (and when they consider why it’s useful to them) they typically think of an update that takes a (short) constant amount of time — and this becomes most useful in the aggregate of a lot of little mutations applied incrementally.  When you tell them that a functional update preserves previous states by creating an updated copy, a little cash-register in their brain goes nuts calculating the expense of (say) making three consecutive updates.

But we can perform a purely functional update in constant time.  As a trivial example, consider a pair of values.  We can update a pair either by updating its first element or its second; in Haskell:

update_left  (_, y) x = (x, y)
update_right (x, _) y = (x, y)

Although these functions don’t destructively update the structures they work on, they do produce updated structures in a constant amount of time.  So where does the expense of this functional method get out of hand?

Consider a function to update the nth element of a list.  This kind of function is easy enough to write:

update_nth []     _ _ = []
update_nth (_:xs) 0 v = v:xs
update_nth (x:xs) n v = x:(update_nth xs (n – 1) v)

This does work correctly, but it’s a good example of the kind of argument that a mutation-loving programmer could justifiably make against Haskell.  Because, if you need to update just the 70th, 90th, and 120th elements of the list, you wind up stepping through the first 69, 89, and 119 elements at each stage (for a total of 69+89+119=227 visits).  However, in an incremental C/C++/Java program we can just increment an iterator and update it, meaning that for the equivalent program we’d only need to visit 119 elements (of course, the distance between these two expenses gets even greater with longer lists, and with more updates).

Where did we go wrong?

If purely functional updates to tuples are efficient, perhaps we can make our list look like a tuple.  This basic insight led to the introduction of the Zipper, an important kind of data-type.

To see how this solves our problems for lists, we need to consider the concept of a one-hole context.  A one-hole context represents a “point of interest” in a data structure.  So for instance, with a list, a one-hole context should represent the list elements we’ve visited up to our point of interest, and what remains after our point of interest.

For example, to focus on the 3rd element of this list:

[1,2,3,4,5]

Our one-hole context should look like this:

([2,1], [4,5])

Notice that the first list is in reverse order, because it represents the elements in the order that we visited them.

Now with this data type we’ve almost settled on the perfect representation.  All that we need to do is add the value of interest, and we’ll have a complete representation of the list with an element in focus and in just the right form for purely functional updates.  Again, with the 3rd element of the list [1,2,3,4,5] selected:

([2,1], 3, [4,5])

Now we can write a general function to update this iterator value:

update (prefix, _, suffix) x = (prefix, x, suffix)

We can also write functions to increment this iterator (again, a constant-time operation):

forward (_,  _, [])   = error “Iterator at end.”
forward (px, x, s:sx) = (x:px, s, sx)

And also to decrement the iterator:

backward ([], _, _)    = error “Iterator at beginning.”
backward (p:px, x, sx) = (px, p, x:sx)

Finally, we have a method to efficiently and incrementally walk a list, updating arbitrary parts of it in constant time in a purely functional way.  Of course, there is an added expense (hopefully one that we can amortize away) if we need to reconstruct the “real” updated list from our iterator:

close (px, x, sx) = (reverse px) ++ (x:sx)

Beyond Lists

At this point, we may be able to win over some mutation-loving programmers.  However, the point will quickly be made that in the real world we use data structures other than lists.  Binary trees, n-ary trees, directed graphs, and many other kinds of structures crop up in all sorts of problems.

Once again, type-classes are here to save us.  We just need to take a step back, and recognize that to apply this principle of purely-functional-incremental-update-able iterators to arbitrary “container types” we need a couple of suitable type-classes.  We’ll need to begin our script with the following line:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances, UndecidableInstances, TypeSynonymInstances #-}

First, our representation of iterators should encapsulate everything that we’ve learned here.  We should be able to increment and decrement them.  We should be able to check their current value, and update them.  For convenience, we should also be able to determine when we’re at the end (or beginning) of the iterator’s extent:

class Iterator b a | b -> a where
   bos      :: b -> Bool
   eos      :: b -> Bool
   forward  :: b -> b
   backward :: b -> b
   value    :: b -> a
   update   :: b -> a -> b

Here, the type “b” represents an iterator and “a” its “element type” (hence the functional dependency “b -> a” saying that b derives a).

Finally, we’ll need to convert back and forth between a container and its iterator:

class Iterator b a => Iterable f b a | f -> b, b -> f where
   open  :: f a -> b
   close :: b -> f a

This type-class captures the critical observation that every container-type has a unique iterator type (“f -> b”) and that every iterator type derives a unique container type (“b -> f”).

Now, as we’ve already spent the first half of this article addressing it, we can put lists and their iterators into this type-class:

instance Iterable [] ([a], a, [a]) a where
   open (x:xs)       = ([], x, xs)
   close (px, x, sx) = (reverse px) ++ (x:sx)

instance Iterator ([a], a, [a]) a where
   bos ([], _, _)           = True
   bos _                    = False
   eos (_, _, [])           = True
   eos _                    = False
   forward  (px,   x, s:sx) = (x:px, s, sx)
   backward (p:px, x, sx)   = (px, p, x:sx)
   value  (_, x, _)         = x
   update (px, _, sx) x     = (px, x, sx)

With this structure in place, it is now possible to write functions that perform incremental updates to arbitrary data structures.  So far we’ve only shown that this can be done for lists.  However, if we follow the lesson of the Zipper we can also put (say) binary trees into the Iterable type-class as well.

Recall our definition of binary trees:

data Tree a = Leaf a | Branch (Tree a) (Tree a)
   deriving (Show, Eq)

Now, to represent a point in a binary tree (its one-hole context) we need to record the converse branches at each step of the path we followed from the root of the tree (similar to the Deutsch-Schorr-Waite algorithm for constant-space garbage collection).  Along with that one-hole context representation, we’ll store another binary tree (the focused tree) to complete our iterator type.

type TreeIter a = (Tree a, [Either (Tree a) (Tree a)])

To aid us in this representation of a binary tree’s iterator, we will need a few utility functions.  First, to move up one level in the tree:

up (r, (Left  l):p) = (Branch l r, p)
up (l, (Right r):p) = (Branch l r, p)

Also, to descend to the first leaf (either by following all left or right branches):

descendl c@(Leaf _, p)   = c
descendl (Branch l r, p) = descendl (l, (Right r):p)
descendr c@(Leaf _, p)   = c
descendr (Branch l r, p) = descendr (r, (Left  l):p)

Notice here that the “up” and “descend” functions cancel each other out (“up” removes a node from the list of converse branches, and “descend[l|r]” adds one at each step).

Also, to make it slightly easier to check the converse branches in our paths, we can define a couple of simple utilities:

is_right (Left _)  = False
is_right (Right _) = True
is_left            = not . is_right

Now we have enough to easily put Tree, our type of binary trees, into the Iterable type-class:

instance Iterable Tree (TreeIter a) a where
   open t                 = descendl (t, [])
   close (t, [])          = t
   close (r, (Left  l):p) = close (Branch l r, p)
   close (l, (Right r):p) = close (Branch l r, p)

instance Iterator (Tree a, [Either (Tree a) (Tree a)]) a where
   bos (_, p)                   = all is_right p
   eos (_, p)                   = all is_left p
   forward    (t, (Right t’):p) = descendl (t’, (Left t):p)
   forward  c@(t, (Left  t’):p) = forward (up c)
   backward   (t, (Left  t’):p) = descendr (t’, (Right t):p)
   backward c@(t, (Right t’):p) = backward (up c)
   value  (Leaf x, _)           = x
   update (_, p) x              = (Leaf x, p)

Phew!  That’s it; we can now incrementally iterate over lists and trees, updating arbitrary elements in a purely functional way, in a constant amount of time.

Iterable implementations for other container types can be derived in a straightforward way.

12 Responses to “Purely-Functional Incremental Mutation”

  1. damien Says:

    Whoa.

    Seems like incremental update is only accessible to the brainiest 1% of the planet when using functional languages.

    The rest simply use a[i].x += 5 or somesuch, with need for fingers or zippers or a PhD.

  2. Kalani Thielen Says:

    Damien,

    I’ll assume you’re not joking (please ignore the rest of my response if you were joking). :)

    You’re comparing apples to oranges. It’s actually quite easy to *use* these iterator types (in fact, that’s the point of putting them in these convenient type-classes). In the same way, it’s easy to *use* array-indexing/mutation in C. However, a fair comparison is between this category of type-classes and the compiler-machinery in C that makes “a[i].x+=5″ possible.

    When *using* these iterators, you could write something like the following to update the 4th value to “foo”:

    update (iter_n (open A) 3) “foo”

    Even that is a bit long, but it’s pretty much the same as what you’d write in C++ to do the equivalent thing to a linked list:

    *(iter_n(A.begin(), 3)) = “foo”

    Plus, the update in the Haskell version is purely-functional — so you can revert to the previous state if necessary or do something with both states, etc.

    You might make an argument that the Haskell version is difficult, because you have to figure out how to iterate over some new data structure that you come up with. However, there’s a straightforward procedure to find that type (it’s just the derivative — and the compiler itself could do this for you in theory; a future GHC probably will). OTOH, if you provide iterators for your own data structures in Java/C++, you have to write this kind of code anyway.

    Finally, I think I’m a living contradiction to your argument that you have to be in the “brainiest 1% of the planet” or have a PhD to understand this.

  3. links for 2009-04-07 « Blarney Fellow Says:

    [...] Purely-Functional Incremental Mutation » Lab49 Blog (tags: haskell fp data-structure) [...]

  4. crooney Says:

    Thanks for the cogent and easy to follow explanation. My eyes kind of glazed over at the recent thread on haskell-cafe, but this made it seem simple.

  5. Matt Says:

    Cool stuff. There’s perhaps a slight difference of emphasis between your Iterator typeclass and the classic Iterator design pattern from OO. The OO Iterator is focused on walking over the actual elements of a data structure, whereas yours is more a sequence of alterable locations of a data structure. For example, it’s not clear what it’d mean to define your Iterator over a Set collection, for example.

    P.S. Is it possible to shorten the second definition of forward to:

    forward c@(t, (Left t’):p) = forward (up c)

    (likewise backward), or have I missed something?

    P.P.S. It’s be awesome if you could post in Literate Haskell so it’s easier to copy and paste the source to play around with this stuff.

  6. Kalani Thielen Says:

    Matt,

    Oops, you’re right about the redundant descend functions. I’m not sure how those got in there, but I’ve taken them out. :)

    I’m not sure that I completely follow your distinction with the “OO iterator”. Are you saying that the OO iterator doesn’t support updating? I can see how updating an arbitrary element in a set would cause problems (although that could be accounted for in the set iterator definition — similarly to how it’s already accounted for when you insert a redundant element into a set). However (that issue aside), if you represent a set as a list or as a binary tree, these iterators should work.

    You’re right; I should have posted this in Literate Haskell. Thank you for the suggestion.

  7. links for 2009-04-07 | Data Entry Says:

    [...] Purely-Functional Incremental Mutation » Lab49 Blog (tags: haskell fp data-structure) [...]

  8. Dimitre Novatchev Says:

    This is a cool way to perform an of any item in a iterable structure.

    However, your post does not touch at all the topic of *insertion* of a new item into the structure.

    I have found out that the Finger Tree data structure solves all these problems.

  9. Dimitre Novatchev Says:

    This is a cool way to perform an *update* of any item in a iterable structure.

  10. Kalani Thielen Says:

    Dimitre,

    Thank you for the comment. My understanding of finger trees is that they’re sort of a general framework for defining many different types of data structures (e.g.: balanced search trees, covering as special cases most of the structures in the “Purely Functional Data Structures” book). So in that sense, they provide efficient functions for searching a container, and they may provide functions for mapping a function over a container, but I don’t think they’re especially useful for incrementally walking a data structure.

    For folks not familiar with finger trees, I like this page (the link to the original paper on finger trees is there as well):

    http://apfelmus.nfshost.com/monoid-fingertree.html

    Since finger trees are just another algebraic data type, you could take their derivative to put them in Iterable. Although, as you and Matt have suggested, arbitrary updates are a little problematic and might be handled better with either finer-grain control (e.g.: MutableIterable, MutableIterator) or with sufficiently intelligent implementations of Iterable (so that arbitrary updates don’t invalidate set/finger-tree invariants).

  11. The Algebra of Data, and the Calculus of Mutation » Lab49 Blog Says:

    [...] have previously discussed ways to recover mutation-like functions, inspired by the most famous data structure of its kind: [...]

  12. Differentiating Types in Haskell » Lab49 Blog Says:

    [...] type of one-hole contexts.  This “leap of logic” gives us a firm foundation for purely-functional incremental mutation, and (more generally) proof that there are interesting things happening at the level of [...]