Purely functional list concatenation, xs ++ ys
in Haskell syntax, is well known to be linear time
in the length of the first input and constant time in the length of the second, i.e. xs ++ ys
is
O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of
concatenation.
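For instance, a left fold over (++), as in the hypothetical sketch below (the name leftAssoc is mine), re-traverses the entire accumulated prefix at every step, giving quadratic time overall.

leftAssoc :: [[a]] -> [a]
leftAssoc = foldl (++) []
-- leftAssoc [xs1, xs2, xs3] = (([] ++ xs1) ++ xs2) ++ xs3
-- Each (++) walks the whole prefix built so far, so the total work is
-- quadratic in the length of the result.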
The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions à la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. "Concatenation" of "lists" represented this way is just function composition, which is a constant-time operation. We can lift a list xs to this representation via the section (xs ++). This will still lead to O(length xs) amount of work to apply this function, but a composition of such functions applied to a list will always result in a fully right-associated expression even if the function compositions aren't right associated.
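As a small sketch of the technique, using the Prelude's ShowS (the helper name fromString and the example are mine, not standard):

fromString :: String -> ShowS
fromString xs = (xs ++)

-- "Concatenation" is (.), which is constant time. No matter how the
-- compositions below are associated, applying the result to a final tail
-- yields a fully right-associated chain of (++).
example :: String
example = (((fromString "a" . fromString "b") . fromString "c") . fromString "d") ""
-- = "a" ++ ("b" ++ ("c" ++ ("d" ++ "")))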
In the last several years, it has become popular to refer to this technique as “difference lists”. Often no justification is given for this name. When it is given, it is usually a reference to the idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise to efficient concatenation, they have almost no similarities.
To start, I want to do a deeper analysis of the “functional lists” approach, because I think what it is doing is a bit misunderstood and, consequently, oversold1. Let’s see how we would model this approach in an OO language without higher-order functions, such as early Java. I’ll use strings for simplicity, but it would be exactly the same for generic lists.
interface PrependTo {
  String prependTo(String end);
}
class Compose implements PrependTo {
  private PrependTo left;
  private PrependTo right;
  public Compose(PrependTo left, PrependTo right) {
    this.left = left; this.right = right;
  }
  public String prependTo(String end) {
    return this.left.prependTo(this.right.prependTo(end));
  }
}
class Prepend implements PrependTo {
  private String s;
  public Prepend(String s) { this.s = s; }
  public String prependTo(String end) {
    return this.s + end;
  }
}
This is just a straight, manual implementation of closures for (.) and (++) (specialized to strings). Other lambdas not of the above two forms would lead to other implementations of PrependTo. Let's say, however, that these are the only two forms that actually occur, which is mostly true in Haskell practice. Then another view on this OO code (to escape back to FP) is that it is an OOP encoding of the algebraic data type:
data PrependTo = Compose PrependTo PrependTo | Prepend String
prependTo :: PrependTo -> String -> String
prependTo (Compose left right) end = prependTo left (prependTo right end)
prependTo (Prepend s) end = s ++ end
We could have also arrived at this by defunctionalizing a typical example of the technique. Modulo
some very minor details (that could be resolved by using the Church-encoded version of this), this
does accurately reflect what’s going on in the technique. Compose
is clearly constant time. Less
obviously, applying these functional lists requires traversing this tree of closures – made
into an explicit tree here. In fact, this reveals that this representation could require arbitrarily
large amounts of work for a given size of output. This is due to the fact that prepending an empty
string doesn’t increase the output size but still increases the size of the tree. In practice,
it’s a safe assumption that, on average, at least one character will be prepended per leaf of the
tree which makes the overhead proportional to the size of the output.
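To make the pathological case concrete, here is a hypothetical worst case: a large tree, every leaf of which prepends the empty string.

emptyHeavy :: PrependTo
emptyHeavy = foldr Compose (Prepend "") (replicate 1000000 (Prepend ""))
-- prependTo emptyHeavy "done" is just "done", but computing it traverses
-- a tree of about a million nodes: lots of work for zero characters of output.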
This tree representation is arguably better than the “functional list” representation. It’s less
flexible for producers, but that’s arguably a good thing because we didn’t really want arbitrary
String -> String
functions. It’s more flexible for consumers. For example, getting the head of
the list is a relatively efficient operation compared to applying a “functional list” and taking
the head of the result even in an eager language. (Laziness makes both approaches comparably
efficient.) Getting the last element is just the same for the tree version, but, even with laziness,
is much worse for the functional version. More to the point, this concrete representation allows
the concatenation function to avoid adding empty nodes to the tree whereas (.)
can’t pattern
match on whether a function is the identity function or not.
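For instance, here is a sketch of both points using the PrependTo type above (the helper names are mine): a smart concatenation that drops empty nodes, and a head function that doesn't materialize the whole string first.

-- Concatenation that skips empty nodes; (.) can't do this because it
-- can't inspect whether its arguments are the identity function.
compose :: PrependTo -> PrependTo -> PrependTo
compose (Prepend "") right = right
compose left (Prepend "") = left
compose left right = Compose left right

-- The head of the represented list, found by walking the tree rather than
-- by building the full String and taking its head.
headPT :: PrependTo -> Maybe Char
headPT (Prepend (c:_)) = Just c
headPT (Prepend "") = Nothing
headPT (Compose left right) = case headPT left of
  Just c -> Just c
  Nothing -> headPT right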
This view makes it very clear what the functional version is doing.
List append is the archetypal example of a Prolog program due to the novelty of its “invertibility”.
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).
For our purposes, viewing this as a function of the first two arguments, this is exactly the usual functional implementation of list concatenation with exactly the same problems. We could, of course, encode the defunctionalized version of the functional approach into (pure) Prolog. This would produce:
prepend_to(compose(Xs, Ys), End, Zs) :- prepend_to(Ys, End, End2), prepend_to(Xs, End2, Zs).
prepend_to(prepend(Xs), End, Zs) :- append(Xs, End, Zs).
(I’ll be ignoring the issues that arise due to Prolog’s untyped nature.)
However, this being a logic programming language means we have additional tools available to use that functional languages lack. Namely, unification variables. For an imperative (destructive) implementation of list concatenation, the way we’d support efficient append of linked lists is we’d keep pointers to the start and end of the list. To append two lists, we’d simply use the end pointer of the first to update the end of the first list to point at the start of the second. We’d then return a pair consisting of the start pointer of the first and the end pointer of the second.
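A minimal sketch of that imperative idea, written in Haskell with IORefs to stay in this post's language (the names and the exact representation are mine):

import Data.IORef

-- A mutable cons cell; the tail is a mutable pointer.
data Cell a = Nil | Cons a (IORef (Cell a))

-- A list is a pair of pointers: to the first cell, and to the tail field
-- of the last cell, i.e. the hole where more cells can be attached.
data MList a = MList { front :: IORef (Cell a), back :: IORef (Cell a) }

singletonM :: a -> IO (MList a)
singletonM x = do
  end <- newIORef Nil
  start <- newIORef (Cons x end)
  pure (MList start end)

-- Destructive append: make the end of the first list point at the start of
-- the second, and return the first's front paired with the second's back.
appendM :: MList a -> MList a -> IO (MList a)
appendM xs ys = do
  startOfYs <- readIORef (front ys)
  writeIORef (back xs) startOfYs
  pure (MList (front xs) (back ys))

toListM :: MList a -> IO [a]
toListM xs = go =<< readIORef (front xs)
  where
    go Nil = pure []
    go (Cons x rest) = (x :) <$> (go =<< readIORef rest)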
This is exactly how Prolog difference lists work, except instead of pointers, we use unification
variables which are more principled. Concretely, we represent a list as a pair of lists, but the
second list will be represented by an unbound unification variable and the first list contains
that same unification variable as a suffix. This pair is often represented using the infix operator ("functor" in Prolog terminology) -, e.g. Xs - Ys. We could use diff(Xs, Ys) or some other name; - isn't a built-in operator, it's just a binary constructor essentially.
At the level of logic, there are no unification variables. The constraints above mean that Xs - Ys
is a list Xs
which contains Ys
as a suffix.
The name “difference list” is arguably motivated by the definition of concatenation in this representation.
concat(Xs - Ys, Ys - Zs, Xs - Zs).
This looks a lot like |Xs - Ys + Ys - Zs = Xs - Zs|. If the suffix component of the first argument is unbound, like it's supposed to be, then this is a constant-time operation of binding that component to Ys. If it is bound, then we need to unify, which, in the worst case, is O(length Ys) where the length is up to either nil or an unbound variable tail2.
We also have the unit of concat, i.e. the empty list, via3:
empty(Xs - Xs).
See the footnote, but this does in some way identify Xs - Ys with the "difference" of Xs and Ys.
We get back to a “normal” list via:
to_list(Xs - [], Xs).
% or more generally,
prepend_to(Xs - Ys, Ys, Xs).
to_list is a constant-time operation, no matter what. Note that to_list binds the suffix component of the difference list. This means that the first input no longer meets our condition to be a difference list. In other words, to_list (and prepend_to) consumes the difference list. More precisely, it constrains the possible suffixes the list could have.
Indeed, any operation that binds the suffix component of a difference list consumes it. For example,
concat
consumes its first argument.
Of course, it still makes logical sense to work with the difference list when its suffix component
is bound, it’s just that its operational interpretation is different. More to the point, given a
difference list, you cannot prepend it (via prepend_to
or concat
) to two different lists to get
two different results.
Converting from a list does require traversing the list since we need to replace the nil node, i.e.
[]
, with a fresh unbound variable. Luckily, this is exactly what append
does.
from_list(Xs, Ys - Zs) :- append(Xs, Zs, Ys).
from_list also suggests this "difference list" idea. If all of Xs, Ys, and Zs are ground terms, then from_list(Xs, Ys - Zs) holds when append(Xs, Zs, Ys) holds, and exactly when, provided our invariants are maintained, i.e. that Zs is a suffix of Ys. Writing these relations more functionally and writing append as addition, we'd have:
\[\mathtt{from\_list}(Xs) = Ys - Zs \iff Xs + Zs = Ys\]
If we did want to “duplicate” a difference list, we’d essentially need to convert it to a (normal)
list with to_list
, and then we could use from_list
multiple times on that result. This would,
of course, still consume the original difference list. We’d also be paying O(length Xs) for every
duplicate, including to replace the one we just consumed4.
That said, we can prepend a list to a difference list without consuming it. We can perform other actions with the risk of (partially) consuming the list, e.g. indexing into the list. Indexing into the list would force the list to be at least a certain length, but still allow prepending to any list that will result in a final list at least that long.
I’ll start the comparison with a massive discrepancy that we will ignore going forward. Nothing
enforces that a value of type ShowS
actually just appends something to its input. We could use
abstract data type techniques or the defunctionalized version to avoid this. To be fair, difference
lists also need an abstraction barrier to ensure their invariants, though their failure modes are
different. A difference list can’t change what it is based on what it is prepended to.
| Functional Representation | Difference Lists |
| --- | --- |
| constant-time concatenation | constant-time concatenation |
| constant-time conversion from a list (though you pay for it later) | O(n) conversion from a list |
| persistent | non-persistent, requires linear use |
| represented by a tree of closures | represented by a pair of a list and a unification variable |
| O(n) (or worse!) conversion to a list | constant-time conversion to a list |
| defunctionalized version can be implemented in pretty much any language | requires at least single-assignment variables |
| unclear connection to being the difference of two lists (which two lists?) | mathematical, if non-obvious, connection to being the difference of two (given) lists |
As an illustration of the difference between persistent and non-persistent uses, the function:
double f = f . f
is a perfectly sensible function on ShowS
values that behaves exactly as you’d expect. On the
other hand:
double(In, Out) :- concat(In, In, Out).
is nonsense that will fail the occurs check (if it is enabled, otherwise it will create a cyclic
list) except for when In
is the empty difference list.
I hope I’ve illustrated that the functional representation is not just not difference lists, but is, in fact, wildly different from difference lists.
This functional representation is enshrined into Haskell via the ShowS
type and related functions,
but I’d argue the concrete tree representation is actually clearer and better. The functional
representation is more of a cute trick that allows us to reuse existing functions. Really, ShowS
should have been an abstract type.
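A sketch of what that might look like (the module and function names are hypothetical), hiding the representation behind an abstraction barrier so that only prepend-like functions can be constructed:

module DString (DString, fromString, toString, append) where

-- The constructor is not exported, so clients can't wrap arbitrary
-- String -> String functions, only ones assembled from these operations.
newtype DString = DString (String -> String)

fromString :: String -> DString
fromString s = DString (s ++)

append :: DString -> DString -> DString
append (DString f) (DString g) = DString (f . g)

toString :: DString -> String
toString (DString f) = f ""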
Difference lists are an interesting example of how imperative ideas can be incorporated into a declarative language. That said, difference lists come with some of the downsides of an imperative approach, namely the lack of persistence.
As far as I’m aware, there isn’t an unambiguous and widely accepted name for this functional representation. Calling it “functional lists” or something like that is, in my opinion, very ambiguous and potentially misleading. I think the lack of a good name for this is why “difference lists” started becoming popular. As I’ve argued, using “difference list” in this context is even more misleading and confusing.
If people really want a name, one option might be “delta list”. I don’t think this term is used. It keeps the intuitive idea that the functional representation represents some “change” to a list, a collection of deltas that will all be applied at once, but it doesn’t make any false reference to difference lists. I’m not super into this name; I just want something that isn’t “difference list” or otherwise misleading.
To be clear, it's still much, much better than using plain concatenation.↩︎
Such a length relation couldn’t be written in pure Prolog but can in actual Prolog.↩︎
For those algebraically minded, this almost makes concat
and empty
into another
monoid except concat
is partial, but such a partial monoid is just a category! In other words,
we have a category whose objects are lists and whose homsets are, at most, singletons containing
Xs - Ys
for Hom(Xs, Ys). If we maintain our invariant that we have Xs - Ys
only when Ys
is a
suffix of Xs
, this thin category is exactly the category corresponding to the reflexive,
transitive “has suffix” relation. We could generalize this to any monoid via a “factors through”
relation, i.e. |\mathrm{Hom}(m, n)| is inhabited if and only if |\exists p. m = pn| which you can
easily prove is a reflexive, transitive relation given the monoid axioms. However, for a general
monoid, we can have a (potentially) non-thin category by saying |p \in \mathrm{Hom}(m,n)| if and
only if |m = pn|. The category will be thin if and only if the monoid is cancellative. This is
exactly the slice category of the monoid viewed as a one-object category.↩︎
Again, in actual Prolog, we could make a duplicate without consuming the original, though it would still take O(length Xs) time using the notion of length mentioned before.↩︎