Planet Haskell

July 10, 2025

Tweag I/O

Publish all your crates everywhere all at once
Cargo is the native package manager and build system for Rust, allowing you to easily bring in dependencies from the global crates.io registry,¹ or to publish your own crates to crates.io. Tor Hovland and I recently contributed a long-requested feature to Cargo, allowing you to package many interdependent packages in one go. That might not sound like a big deal, but there were a few tricky parts; there’s a reason the original feature request was open for more than 10 years! In this post, I’ll walk you through the feature and — if you’re a Rust developer — tell you how you can try it out.

Workspaces

The Rust unit of packaging — like a gem in Ruby or a module in Go — is called a “crate”, and it’s pretty common for a medium-to-large Rust project to be divided into several of them. This division helps keep code modular and interfaces well-defined, and also allows you to build and test components individually. Cargo supports multi-crate workflows using “workspaces”: a workspace is just a bunch of crates that Cargo handles “together”, sharing a common dependency tree, a common build directory, and so on. A basic workspace might look like this:
.
├── Cargo.toml
├── Cargo.lock
├── taco
│   ├── Cargo.toml
│   └── src
│       ├── lib.rs
│       └── ... more source files
└── tortilla
    ├── Cargo.toml
    └── src
        ├── lib.rs
        └── ... more source files
The top-level Cargo.toml just tells Cargo where the crates in the workspace live.²
# ./Cargo.toml
workspace.members = ["taco", "tortilla"]
The crate-level Cargo.toml files tell us about the crates (surprise!). Here’s taco’s Cargo.toml:
# ./taco/Cargo.toml
[package]
name = "taco"
version = "2.0"
dependencies.tortilla = { path = "../tortilla", version = "1.3" }
The dependency specification is actually pretty interesting. First, it tells us that the tortilla package is located at ../tortilla (relative to taco). When you’re developing locally, Cargo uses this local path to find the tortilla crate. But when you publish the taco crate for public consumption, Cargo strips out the path = "../tortilla" setting because it’s only meaningful within your local workspace. Instead, the published taco crate will depend on version 1.3 of the published tortilla crate. This doubly-specified dependency gives you the benefits of a monorepo (for example, you get to work on tortilla and taco simultaneously and be sure that they stay compatible) without leaking that local setup to downstream users of your crates.

If you’ve been hurt by packaging incompatibilities before, the previous paragraph might have raised some red flags: allowing a dependency to come from one of two places could lead to problems if they get out-of-sync. Like, couldn’t you accidentally make a broken package by locally updating both your crates and then only publishing taco? You won’t see the breakage when building locally, but the published taco will be incompatible with the previously published tortilla. To deal with this issue, Cargo verifies packages before you publish them. When you type cargo publish --package taco, it packages up the taco crate (removing the local ../tortilla dependency) and then unpackages the new package in a temporary location and attempts to build it from scratch. This rebuild-from-scratch sees the taco crate exactly as a downstream user would, and so it will catch any incompatibilities between the existing, published tortilla and the about-to-be-published taco.

Cargo’s crate verification is not completely fool-proof because it only checks that the package compiles.³ In practice, I find that checking compilation is already pretty useful, but I also like to run other static checks.

Publish all my crates

Imagine you’ve been working in your workspace, updating your crates in backwards-incompatible ways. Now you want to bump tortilla to version 2.0 and taco to version 3.0 and publish them both. This isn’t too hard:

Edit tortilla/Cargo.toml to increase the version to 2.0.

Run cargo publish --package tortilla, and wait for it to appear on crates.io.

Edit taco/Cargo.toml to increase its version to 3.0, and change its tortilla dependency. to 2.0.

Run cargo publish --package taco.

The ordering is important here. You can’t publish the new taco before tortilla 2.0 is publicly available: if you try, the verification step will fail.

This multi-crate workflow works, but it has two problems:

It can get tedious. With two crates it’s manageable, but what about when the dependency graph gets complicated? I worked for a client whose CI had custom Python scripts for checking versions, bumping versions, publishing things in the right order, and so on. It worked, but it wasn’t pretty.⁴

It’s non-atomic: if in the process of verifying and packaging dependent crates you discover some problems with the dependencies then you’re out of luck because you’ve already published them. crates.io doesn’t allow deleting packages, so you’ll just have to yank⁵ the broken packages, increase the version number some more, and start publishing again. This one can’t be solved by scripts or third-party tooling: verifying the dependent crate requires the dependencies to be published.

Starting in mid-2024, my colleague Tor Hovland and I began working on native support for this in Cargo. A few months and dozens of code-review comments later, our initial implementation landed in Cargo 1.83.0. By the way, the Cargo team are super supportive of new contributors — I highly recommend going to their office hours if you’re interested.

How it works

In our implementation, we use a sort of registry “overlay” to verify dependent crates before their dependencies are published. This overlay wraps an upstream registry (like crates.io), allowing us to add local crates to the overlay without actually publishing them upstream. This kind of registry overlay is an interesting topic on its own. The “virtualization” of package sources is an often-requested feature that hasn’t yet been implemented in general because it’s tricky to design without exposing users to dependency confusion attacks: the more flexible you are about where dependencies come from, the easier it is for an attacker to sneak their way into your dependency tree. Our registry overlay passed scrutiny because it’s only available to Cargo internally, and only gets used for workspace-local packages during workspace publishing.

The registry overlay was pretty simple to implement, since it’s just a composition of two existing Cargo features: local registries and abstract sources. A local registry in Cargo is just a registry (like crates.io) that lives on your local disk instead of in the cloud. Cargo has long supported them because they’re useful for offline builds and integration testing. When packaging a workspace we create a temporary, initially-empty local registry for storing the new local packages as we produce them.

Our second ingredient is Cargo’s Source trait: since Cargo can pull dependencies from many different kinds of places (crates.io, private registries, git repositories, etc.), they already have a nice abstraction that encapsulates how to query availability, download, and cache packages from different places. So our registry overlay is just a new implementation of the Source trait that wraps two other Sources: the upstream registry (like crates.io) that we want to publish to, and the local registry that we put our local packages in. When someone queries our overlay source for a package, we check in the local registry first, and fall back to the upstream registry.

Now that we have our local registry overlay, the workspace-publishing workflow looks like this:

Gather all the to-be-published crates and figure out any inter-dependencies. Sort them in a “dependency-compatible” order, meaning that every crate will be processed after all its dependencies.

In that dependency-compatible order, package and verify each crate. For each crate:

Package it up, removing any mention of local path dependencies.

Unpackage it in a temporary location and check that it builds. This build step uses the local registry overlay, so that it thinks all the local dependencies that were previously added to the local overlay are really published.

“Publish” the crate in the local registry overlay.

In the dependency-compatible order, actually upload all the crates to crates.io. This is done in parallel as much as possible. For example, if tortilla and carnitas don’t depend on one another but taco depends on them both, then tortilla and carnitas can be uploaded simultaneously.

It’s possible for the final upload to fail (if your network goes down, for example) and for some crates to remain unpublished; in that sense, the new workspace publishing workflow is not truly atomic. But because all of the new crates have already been verified with one another, you can just retry publishing the ones that failed to upload.

How to try it

Cargo, as critical infrastructure for Rust development, is pretty conservative about introducing new features. Multi-package publishing was recently promoted to a stable feature, but it is currently only available in nightly builds. If you’re using a recent nightly build of Cargo 1.90.0 or later, running cargo publish in a workspace will work as described in this blog post. If you don’t want to publish everything in your workspace, the usual package-selection arguments should work as expected: cargo publish --package taco --package tortilla will package just taco and tortilla, while correctly managing any dependencies between them. Or you can exclude packages like cargo publish --exclude onions.

If you’re using a stable Rust toolchain, workspace publishing will be available in Cargo 1.90 in September 2025.

If you use Node.js, Cargo is like the npm command and crates.io is like the NPM registry. If you use Python, Cargo is like pip (or Poetry, or uv) and crates.io is like PyPI.↩

It can also contain lots of other useful workspace-scoped information, like dependencies that are common between crates or global compiler settings.↩

To be even more precise, it only checks that the package compiles against the dependencies that are locked in your Cargo.lock file, which gets included in the package. If you or someone in your dependency tree doesn’t correctly follow semantic versioning, downstream users could still experience compilation problems. In practice, we’ve seen this cause binary packages to break because cargo install ignores the lock file by default.↩

There are also several third-party tools (for example, cargo-release, cargo-smart-release, and release-plz) to help automate multi-crate releases. If one of these meets your needs, it might be better than a custom script.↩

“Yanking” is Cargo’s mechanism for marking packages as broken without actually deleting their contents and breaking everyone’s builds.↩
July 10, 2025 12:00 AM

July 09, 2025

Well-Typed.Com

Developing an application from scratch (Haskell Unfolder #46)

Today, 2025-07-09, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …) we are streaming the 46th episode of the Haskell Unfolder live on YouTube.

Developing an application from scratch (Haskell Unfolder #46)

In this episode targeted at beginners, we show the end-to-end application development process, starting from an empty directory. We’ll consider package configuration, taking advantage of editor integration, how to deal with dependencies, organizing code into modules, and parsing command line arguments. We will use this to write a simple but useful application.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at July 09, 2025 12:00 AM

July 07, 2025

Haskell Interlude

67: Alex McLean

Mike and Andres speak to Alex McLean who created the TidalCycles system for electronic music - implemented in Haskell of course. We talk about how Alex got into Haskell coming from Perl, how types helped him think about the structure of music and patterns, the architecture and evolution of TidalCycles, about art, community and making space for new ideas, and lots of things in between.

by Haskell Podcast at July 07, 2025 02:00 PM

Monday Morning Haskell

Binary Search in Haskell and Rust
This week we’ll be continuing our series of problem solving in Haskell and Rust. But now we’re going to start moving beyond the terrain of “basic” problem solving techniques with strings, lists and arrays, and start moving in the direction of more complicated data structures and algorithms. Today we’ll explore a problem that is still array-based, but uses a tricky algorithm that involves binary search!

You’ll learn more about Data Structures and Algorithms in our Solve.hs course! The last 7 weeks or so of blog articles have focused on the types of problems you’ll see in Module 1 of that course, but now we’re going to start encountering ideas from Modules 2 & 3, which look extensively at essential data structures and algorithms you need to know for problem solving.

The Problem

Today’s problem is median of two sorted arrays. In this problem, we receive two arrays of numbers as input, each of them in sorted order. The arrays are not necessarily of the same size. Our job is to find the median of the cumulative set of numbers.

Now there’s a conceptually easy approach to this. We could simply scan through the two arrays, keeping track of one index for each one. We would increase the index for whichever number is currently smaller, and stop once we have passed by half of the total numbers. This approach is essentially the “merge” part of merge sort, and it would take O(n) time, since we are scanning half of all the numbers.

However, there’s a faster approach! And if you are asked this question in an interview for anything other than a very junior position, your interviewer will expect you to find this faster approach. Because the arrays are sorted, we can leverage binary search to find the median in O(log n) time. The approach isn’t easy to see though! Let’s go over the algorithm before we get into any code.

The Algorithm

This algorithm is a little tricky to follow (this problem is rated as “hard” on LeetCode). So we’re going to treat this a bit like a mathematical proof, and begin by defining useful terms. Then it will be easy to describe the coding concepts behind the algorithm.

Defining our Terms

Our input consists of 2 arrays, arr1 and arr2 with potentially different sizes n and m, respectively. Without loss of generality, let arr1 be the “shorter” array, so that n <= m. We’ll also define t as the total number of elements, n + m.

It is worthwhile to note right off the bat that if t is odd, then a single element from one of the two lists will be the median. If t is even, then we will average two elements together. Even though we won’t actually create the final merged array, we can imagine that it consists of 3 parts:

The “prior” portion - all numbers before the median element(s)

The median element(s), either 1 or 2.

The “latter” portion - all numbers after the median element(s)

The total number of elements in the “prior” portion will end up being (t - 1) / 2, bearing in mind how integer division works. For example, whether t is 15 or 16, we get 7 elements in the “prior” portion. We’ll use p for this number.

Finally, let’s imagine p1, the number of elements from arr1 that will end up in the prior portion. If we know p1, then p2, the number of elements from arr2 in the prior portion is fixed, because p1 + p2 = p. We can then think of p1 as an index into arr1, the index of the first element that is not in the prior portion. The only trick is that this index could be n indicating that all elements of arr1 are in the prior portion.

Getting the Final Answer from our Terms

If we have the “correct” values for p1 and p2, then finding the median is easy. If t is odd, then the lower number between arr1[p1] and arr2[p2] is the median. If t is even, then we average the two smallest numbers among (arr1[p1], arr2[p2], arr1[p1 + 1], arr2[p2 + 1]).

So we’ve reduced this problem to a matter of finding p1, since p2 can be easily derived from it. How do we know we have the “correct” value for p1, and how do we search for it efficiently?

Solving for p1

The answer is that we will conduct a binary search on arr1 in order to find the correct value of p1. For any particular choice of p1, we determine the corresponding value of p2. Then we make two comparisons:

Compare arr1[p1 - 1] to arr2[p2]

Compare arr2[p2 - 1] to arr1[p1]

If both comparisons are less-than-or-equals, then our two p values are correct! The slices arr1[0..p1-1] and arr2[0..p2-1] always constitute a total of p values, and if these values are smaller than arr1[p1] and arr2[p2], then they constitute the entire “prior” set.

If, on the other hand, the first comparison yields “greater than”, then we have too many values for arr1 in our prior set. This means we need to recursively do the binary search on the left side of arr1, since p1 should be smaller.

Then if the second comparison yields “greater than”, we have too few values from arr1 in the “prior” set. We should increase p1 by searching the right half of our array.

This provides a complete algorithm for us to follow!

Rust Implementation

Our algorithm description was quite long, but the advantage of having so many details is that the code starts to write itself! We’ll start with our Rust implementation. Stage 1 is to define all of the terms using our input values. We want to define our sizes and array references generically so that arr1 is the shorter array:
pub fn find_median_sorted_arrays(nums1: Vec<i32>, nums2: Vec<i32>) -> f64 {
    let mut n = nums1.len();
    let mut m = nums2.len();
    let mut arr1: &Vec<i32> = &nums1;
    let mut arr2: &Vec<i32> = &nums2;
    if (m < n) {
        n = nums2.len();
        m = nums1.len();
        arr1 = &nums2;
        arr2 = &nums1;
    }
    let t = n + m;
    let p: usize = (t - 1) / 2;

    ...
}
Anatomy of a Binary Search

The next stage is the binary search, so we can find p1 and p2. Now a binary search is a particular kind of loop pattern. Like many of the loop patterns we worked with in the previous weeks, we can express it recursively, or with a loop construct like for or while. We’ll start with a while loop solution for Rust, and then show the recursive solution with Haskell.

All loops maintain some kind of state. For a binary search, the primary state is the two endpoints representing our “interval of interest”. This starts out as the entire interval, and shrinks by half each time until we’ve narrowed to a single element (or no elements). We’ll represent these with interval end points with low and hi. Our loop concludes once low is as large as hi.
let mut low = 0;
// Use the shorter array size!
let mut hi = n;
while (low < hi) {
    ...
}
In our particular case, we are also trying to determine the values for p1 and p2. Each time we specify an interval, we’ll see if the midpoint of that interval (between low and hi) is the correct value of p1:
...

let mut low = 0;
let mut hi = n;
let mut p1 = 0;
let mut p2 = 0;
while (low < hi) {
    p1 = (low + hi) / 2;
    p2 = p - p1;
    ...
}
Now we evaluate this p1 value using the two conditions we specified in our algorithm. These are self-explanatory, except we do need to cover some edge cases where one of our values is at the edge of the array bounds.

For example, if p1 is 0, the first condition is always “true”. If this condition is negated, this means we want fewer elements from arr1, but this is impossible if p1 is 0.
...

let mut low = 0;
let mut hi = n;
let mut p1 = 0;
let mut p2 = 0;
while (low < hi) {
    p1 = (low + hi) / 2;
    p2 = p - p1;
    let cond1 = p1 == 0 || arr1[p1 - 1] <= arr2[p2];
    let cond2 = p1 == n || p2 == 0 || arr2[p2 - 1] <= arr1[p1];
    if (cond1 && cond2) {
        break;
    } else if (!cond1) {
        p1 -= 1;
        hi = p1;
    } else {
        p1 += 1;
        low = p1;
    }
}
p2 = p - p1;

...
If both conditions are met, you’ll see we break, because we’ve found the right value for p1! Otherwise, we know p1 is invalid. This means we want to exclude the existing p1 value from further consideration by changing either low or hi to remove it from the interval of interest.

So if cond1 is false, hi becomes p1 - 1, and if cond2 is false, it becomes p1 + 1. In both cases, we also modify p1 itself first so that our loop does not conclude with p1 in an invalid location.

Getting the Final Answer

Now that we have p1 and p2, we have to do a couple final tricks to get the final answer. We want to get the first “smaller” value between arr1[p1] and arr2[p2]. But we have to handle the edge case where p1 might be n AND we want to increment the index for the array we take. Note that p2 cannot be out of bounds right now!
let mut median = arr2[p2];
if (p1 < n && arr1[p1] < arr2[p2]) {
    median = arr1[p1];
    p1 += 1;
} else {
    p2 += 1;
}
If the total number of elements is odd, we can simply return this number (converting to a float). However, in the even case we need one more number to take an average. So we’ll compare the values at the indices again, but now accounting that either (but not both) could be out of bounds.
let mut median = arr2[p2];
if (p1 < n && arr1[p1] < arr2[p2]) {
    median = arr1[p1];
    p1 += 1;
} else {
    p2 += 1;
}

if (t % 2 == 0) {
    if (p1 >= n) {
        median += arr2[p2];
    } else if (p2 >= m) {
        median += arr1[p1];
    } else {
        median += cmp::min(arr1[p1], arr2[p2]);
    }
    let medianF: f64 = median.into();
    return medianF / 2.0;
} else {
    return median.into();
}
Here’s the complete implementation:
pub fn find_median_sorted_arrays(nums1: Vec<i32>, nums2: Vec<i32>) -> f64 {
    let mut n = nums1.len();
    let mut m = nums2.len();
    let mut arr1: &Vec<i32> = &nums1;
    let mut arr2: &Vec<i32> = &nums2;
    if (m < n) {
        n = nums2.len();
        m = nums1.len();
        arr1 = &nums2;
        arr2 = &nums1;
    }
    let t = n + m;
    let p: usize = (t - 1) / 2;

    let mut low = 0;
    let mut hi = n;
    let mut p1 = 0;
    let mut p2 = 0;
    while (low < hi) {
        p1 = (low + hi) / 2;
        p2 = p - p1;
        let cond1 = p1 == 0 || arr1[p1 - 1] <= arr2[p2];
        let cond2 = p1 == n || p2 == 0 || arr2[p2 - 1] <= arr1[p1];
        if (cond1 && cond2) {
            break;
        } else if (!cond1) {
            p1 -= 1;
            hi = p1;
        } else {
            p1 += 1;
            low = p1;
        }
    }
    p2 = p - p1;

    let mut median = arr2[p2];
    if (p1 < n && arr1[p1] < arr2[p2]) {
        median = arr1[p1];
        p1 += 1;
    } else {
        p2 += 1;
    }

    if (t % 2 == 0) {
        if (p1 >= n) {
            median += arr2[p2];
        } else if (p2 >= m) {
            median += arr1[p1];
        } else {
            median += cmp::min(arr1[p1], arr2[p2]);
        }
        let medianF: f64 = median.into();
        return medianF / 2.0;
    } else {
        return median.into();
    }
}
Haskell Implementation

Now let’s examine the Haskell implementation. Unlike the LeetCode version, we’ll just assume our inputs are Double already instead of doing a conversion. Once again, we start by defining the terms:
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
  where
    n' = V.length input1
    m' = V.length input2
    t = n' + m'
    p = (t - 1) `quot` 2
    (n, m, arr1, arr2) = if V.length input1 <= V.length input2
      then (n', m', input1, input2) else (m', n', input2, input1)

    ...
Now we’ll implement the binary search, this time doing a recursive function. We’ll do this in two parts, starting with a helper function. This helper function will simply tell us if a particular index is correct for p1. The trick though is that we’ll return an Ordering instead of just a Bool:
-- data Ordering = LT | EQ | GT
f :: Int -> Ordering
This lets us signal 3 possibilities. If we return EQ, this means the index is valid. If we return LT, this will mean we want fewer values from arr1. And then GT means we want more values from arr1.

With this framing it’s easy to see the implementation of this helper now. We determine the appropriate p2, figure out our two conditions, and return the value for each condition:
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
  where
    ...
    f :: Int -> Ordering
    f pi1 =
      let pi2 = p - pi1
          cond1 = pi1 == 0 || arr1 V.! (pi1 - 1) <= arr2 V.! pi2
          cond2 = pi1 == n || pi2 == 0 || (arr2 V.! (pi2 - 1) <= arr1 V.! pi1)
      in  if cond1 && cond2 then EQ else if (not cond1) then LT else GT
Now applying we can use this in a recursive binary search. The binary search tracks two pieces of state for our interval ((Int, Int)), and it will return the correct value for p1. The implementation applies the base case (return low if low >= hi), determines the midpoint, calls our helper, and then recurses appropriately based on the helper result.
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
  where
    ...
    f :: Int -> Ordering
    f pi1 = ...
    
    search :: (Int, Int) -> Int
    search (low, hi) = if low >= hi then low else
      let mid = (low + hi) `quot` 2
      in  case f mid of
            EQ -> mid
            LT -> search (low, mid - 1)
            GT -> search (mid + 1, hi)

    p1 = search (0, n)
    p2 = p - p1

    ...
For the final part of the problem, we’ll define a helper. Given p1 and p2, it will emit the “lower” value between the two indices in the array (accounting for edge cases) as well as the two new indices (since one will increment).

This is a matter of lazily defining the “next” value for each array, the “end” condition of each array, and the “result” if that array’s value is chosen:
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
  where
    ...

    findNext pi1 pi2 =
      let next1 = arr1 V.! pi1
          next2 = arr2 V.! pi2
          end1 = pi1 >= n
          end2 = pi2 >= m
          res1 = (next1, pi1 + 1, pi2)
          res2 = (next2, pi1, pi2 + 1)
      in  if end1 then res2
            else if end2 then res1
            else if next1 <= next2 then res1 else res2
Now we just apply this either once or twice to get our result!
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = result
  where
    ...

    tIsEven = even t
    (median1, nextP1, nextP2) = findNext p1 p2
    (median2, _, _) = findNext nextP1 nextP2
    result = if tIsEven
      then (median1 + median2) / 2.0
      else median1
Here’s the complete implementation:
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = result
  where
    n' = V.length input1
    m' = V.length input2
    t = n' + m'
    p = (t - 1) `quot` 2
    (n, m, arr1, arr2) = if V.length input1 <= V.length input2
      then (n', m', input1, input2) else (m', n', input2, input1)

    -- Evaluate the index in arr1
    -- If this does in indicate the index can be part of a median, return EQ
    -- If it indicates we need to move left in shortArr, return LT
    -- If it indicates we need to move right in shortArr, return GT
    -- Precondition: p1 <= n
    f :: Int -> Ordering
    f pi1 =
      let pi2 = p - pi1
          cond1 = pi1 == 0 || arr1 V.! (pi1 - 1) <= arr2 V.! pi2
          cond2 = pi1 == n || pi2 == 0 || (arr2 V.! (pi2 - 1) <= arr1 V.! pi1)
      in  if cond1 && cond2 then EQ else if (not cond1) then LT else GT
    
    search :: (Int, Int) -> Int
    search (low, hi) = if low >= hi then low else
      let mid = (low + hi) `quot` 2
      in  case f mid of
            EQ -> mid
            LT -> search (low, mid - 1)
            GT -> search (mid + 1, hi)
    
    findNext pi1 pi2 =
      let next1 = arr1 V.! pi1
          next2 = arr2 V.! pi2
          end1 = pi1 >= n
          end2 = pi2 >= m
          res1 = (next1, pi1 + 1, pi2)
          res2 = (next2, pi1, pi2 + 1)
      in  if end1 then res2
            else if end2 then res1
            else if next1 <= next2 then res1 else res2

    p1 = search (0, n)
    p2 = p - p1

    tIsEven = even t
    (median1, nextP1, nextP2) = findNext p1 p2
    (median2, _, _) = findNext nextP1 nextP2
    result = if tIsEven
      then (median1 + median2) / 2.0
      else median1
Conclusion

If you want to learn more about these kinds of problem solving techniques, you should take our course Solve.hs! In the coming weeks, we’ll see more problems related to data structures and algorithms, which are covered extensively in Modules 2 and 3 of that course!
by James Bowen at July 07, 2025 08:30 AM

July 03, 2025

Tweag I/O

Bazel workshop made public

As part of our consulting business we are often invited to solve problems that our clients cannot tackle on their own. It is not uncommon for us to collaborate with a client for extended periods of time; during which, many opportunities for knowledge transfer present themselves, be it in the form of documentation, discussions, or indeed, when the client finds it desirable, in the form of specialized workshops.

In this post weâ€™d like to talk about a workshop that we developed and delivered (so far) five times to different groups of people at the same client. We received positive feedback for it and we believe it was helpful for those who attended it.

The workshop intends to give a principled introduction to the Bazel build system for people who have little or no knowledge of Bazel, but who are software developers and have used a build system before. It is definitely a workshop for a technical audience, and as such it was presented to (among others) dedicated DevOps and DevX teams of the client.

We are happy to announce that the materials of this workshop are now publicly available in the form of:

the git repository of the example project that we use in the exercises https://github.com/tweag/bazel-workshop-2024 and

the accompanying slides.

The original intended duration of the workshop was three days. However, one of these days was dedicated almost entirely to a case study that we cannot share publicly; therefore, the public version is shorter and should amount to approximately two days.

Here are a couple of the introductory slides to give you an impression of the scope, structure, and expected knowledge in this workshop:

It must be pointed out that the workshop was developed in 2024, when the WORKSPACE-based approach to dependency management was still the default choice and so, given that we were time-constrained both at the authoring and presentation stages, we chose not to cover Bzlmod. We are still convinced that familiarity with WORKSPACE and simple repository rules is a prerequisite for understanding Bzlmod. Some newer features like symbolic macros are also not covered. Learning materials for Bazel go out of date quickly, but even so, we believe that the workshop, now public, is still relevant and can be of use for people who are about to embark on their Bazel journey.

July 03, 2025 12:00 AM

June 30, 2025

Monday Morning Haskell

Buffer & Save with a Challenging Example
Welcome back to our series comparing LeetCode problems in Haskell and Rust. Today we’ll learn a new paradigm that I call “Buffer and Save”. This will also be the hardest problem we’ve done so far! The core loop structure isn’t that hard, but there are a couple layers of tricks to massage our data to get the final answer.

This will be the last problem we do that focuses strictly on string and list manipulation. The next set of problems we do will all rely on more advanced data structures or algorithmic ideas.

For more complete practice on problem solving in Haskell, check out Solve.hs, our newest course. This course will teach you everything you need to know about problem solving, data structures, and algorithms in Haskell. You’ll get loads of practice building structures and algorithms from scratch, which is very important for understanding and remembering how they work.

The Problem

Today’s problem is Text Justification. The idea here is that we are taking a list of words and a “maximum width” and printing out the words grouped into equal-width lines that are evenly spaced. Here’s an example input and output:
Example Input (list of 9 strings):
[“Study”, “Haskell”, “with”, “us”, “every”, “Monday”, “Morning”, “for”, “fun”]
Max Width: 16

Output (list of 4 strings):
“Study    Haskell”
“with   us  every”
“Monday   Morning”
“for fun         ”
There are a few notable rules, constraints, and edge cases. Here’s a list to sumarize them:

There is at least one word

No word is larger than the max width

All output strings must have max width as their length (including spaces)

The first word of every line is set to the left

The last line always has 1 space between words, and then enough spaces after the last word to read the max width.

All other lines with multiple words will align the final word all the way to the right

The spaces in non-final lines are distributed as evenly as possible, but extra spaces go between words to the left.

The final point is potentially the trickiest to understand. Consider the second line above, with us every. The max width is 16, and we have 3 words with a total of 11 characters. This leaves us 5 spaces. Having 3 words means 2 blanks, so the “left” blank gets 3 spaces and the “right” blank gets 2 spaces.

If you had a line with 5 words, a max width of 30, and 16 characters, you would place 4 spaces in the left two blanks, and 3 spaces in the right two blanks. The relative length of the words does not matter.
Words in Line: [“A”, “good”, “day”, “to”, “endure”]

Output Line:
“A    good    day   to   endure”
The Algorithm

As mentioned above, our main algorithmic idea could be called “buffer and save”. We’ve been defining all of our loops based on the state we must maintain between iterations of the loop. The buffer and save approach highlights two pieces of state for us:

The strings we’ve accumulated for our answer so far (the “result”)

A buffer of the strings in the “current” line we’re building.

So we’ll loop through the input words one at a time. We’ll consider if the next word can be added to the “current” line. If it would cause our current line to exceed the maximum width, we’ll “save” our current line and write it out to the “result” list, adding the required spaces.

To help our calculations, we’ll also include two other pieces of state in our loop:

The number of characters in our “current” line

The number of words in our “current” line

Finally, there’s the question of how to construct each output line. Combining the math with list-mechanics is a little tricky. But the central idea consists of 4 simple steps:

Find the number of spaces (subtract number of characters from max width)

Divide the number of spaces by the number of “blanks” (number of words - 1)

The quotient is the “base” number of spaces per blank

The remainder is the number of blanks (starting from the left) that get an extra space

The exact implementation of this idea differs between Haskell and Rust. Again this rests a lot on the “reverse” differences between Rust vectors and Haskell lists.

The final line has a slightly different (but easier) process. And we should note that the final line will still be in our buffer when we exit the loop! So we shouldn’t forget to add it to the result.

Haskell Solution

We know enough now to jump into our Haskell solution. Our solution should be organized around a loop. Since we go through the input word-by-word, this should follow a fold pattern. So here’s our outline:
justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = ...
  where
    -- f = ‘final’
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord = ...
Let’s focus in on the choice we have to make in the loop. We need to determine if this new word fits in our current line. So we’ll get its length and add it to the number of characters in the line AND consider the number of words in the line. We count the words too since each word we already have requires at least one space!
-- (maxWidth is still in scope here)
loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ...
        else ...
How do we fill in these choices? If we don’t overflow the line, we just append the new word, bump the count of the words, and add the new word’s length to the character count.
loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ...
        else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)
The overflow case isn’t hard, but it does require us to have a function that can convert our current line into the final string. This function will also take the number of words and characters in this line. Assuming this function exists, we just make this new line, append it to result, and then reset our other stateful values so that they only reflect the “new word” as part of our current line.
loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
      resultLine = makeLine currentLine wordsInLine charsInLine
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ([newWord], 1, newWordLen, resultLine : currResult)
        else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

makeLine :: String -> Int -> Int -> String
makeLine = ...
Before we think about the makeLine implementation though, we just about have enough to fill in the rest of the “top” of our function definition. We’d just need another function for making the “final” line, since this is different from other lines. Then when we get our “final” state values, we’ll plug them into this function to get our final line, append this to the result, and reverse it all.
justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = 
  reverse (makeLineFinal flLine fWordsInLine fCharsInLine : result)
  where
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
      let newWordLen = length newWord
          resultLine = makeLine currentLine wordsInLine charsInLine
      in  if newWordLen + charsInLine + wordsInLine > maxWidth
            then ([newWord], 1, newWordLen, resultLine : currResult)
            else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

    makeLine :: [String] -> Int -> Int -> String
    makeLine = ...

    makeLineFinal :: [String] -> Int -> Int -> String
    makeLineFinal = ...
Now let’s discuss forming these lines, starting with the general case. We can start with a couple edge cases. This should never be called with an empty list. And with a singleton, we just left-align the word and add the right number of spaces:
makeLine :: [String] -> Int -> Int -> String
makeLine [] _ _ = error "Cannot makeLine with empty string!"
makeLine [onlyWord] _ charsInLine =
  let extraSpaces = replicate (maxWidth - charsInLine) ' '
  in  onlyWord <> extraSpaces
makeLine (first : rest) wordsInLine charsInLine = ...
Now we’ll calculate the quotient and remainder to get the spacing sizes, as mentioned in our algorithm section. But how do we combine them? There are multiple ways, but the idea I thought of was to zip the tail of the list with the number of spaces it needs to append. Then we can fold it into a resulting list using a function like this:
-- (String, Int) is the next string and the number of spaces after it
combine :: String -> (String, Int) -> String
combine suffix (nextWord, numSpaces) =
  nextWord <> replicate numSpaces ' ' <> suffix
Remember while doing this that we’ve accumulated the words for each line in reverse order. So we want to append each one in succession, together with the number of spaces that come after it.

To use this function, we can “fold” over the “tail” of our current line, while using the first word in our list as the base of the fold! Don’t forget the quotRem math going on in here!
makeLine :: [String] -> Int -> Int -> String
makeLine [] _ _ = error "Cannot makeLine with empty string!"
makeLine [onlyWord] _ charsInLine =
  let extraSpaces = replicate (maxWidth - charsInLine) ' '
  in  onlyWord <> extraSpaces
makeLine (first : rest) wordsInLine charsInLine = ...
  let (baseNumSpaces, numWithExtraSpace) = quotRem (maxWidth - charsInLine) (wordsInLine - 1)
      baseSpaces = replicate (wordsInLine - 1 - numWithExtraSpace) baseNumSpaces
      extraSpaces = replicate numWithExtraSpace (baseNumSpaces + 1)
      wordsWithSpaces = zip rest (baseSpaces <> extraSpaces)
  in  foldl combine first wordsWithSpaces

combine :: String -> (String, Int) -> String
combine suffix (nextWord, numSpaces) =
  nextWord <> replicate numSpaces ' ' <> suffix
To make the final line, we can also leverage our combine function! It’s just a matter of combining each word in our input with the appropriate number of spaces. In this case, almost every word gets 1 space except for the last one (which comes first in our list). This just gets however many trailing spaces we need!
makeLineFinal :: [String] -> Int -> Int -> String
makeLineFinal [] _ _ = error "Cannot makeLine with empty string!"
makeLineFinal strs wordsInLine charsInLine =
  let trailingSpaces = maxWidth - charsInLine - (wordsInLine - 1)
  in  foldl combine "" (zip strs (trailingSpaces : repeat 1))
Putting all these pieces together, we have our complete solution!
justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = 
  reverse (makeLineFinal flLine fWordsInLine fCharsInLine : result)
  where
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
      let newWordLen = length newWord
          resultLine = makeLine currentLine wordsInLine charsInLine
      in  if newWordLen + charsInLine + wordsInLine > maxWidth
            then ([newWord], 1, newWordLen, resultLine : currResult)
            else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

    makeLine :: [String] -> Int -> Int -> String
    makeLine [] _ _ = error "Cannot makeLine with empty string!"
    makeLine [onlyWord] _ charsInLine =
      let extraSpaces = replicate (maxWidth - charsInLine) ' '
      in  onlyWord <> extraSpaces
    makeLine (first : rest) wordsInLine charsInLine =
      let (baseNumSpaces, numWithExtraSpace) = quotRem (maxWidth - charsInLine) (wordsInLine - 1)
          baseSpaces = replicate (wordsInLine - 1 - numWithExtraSpace) baseNumSpaces
          extraSpaces = replicate numWithExtraSpace (baseNumSpaces + 1)
          wordsWithSpaces = zip rest (baseSpaces <> extraSpaces)
      in  foldl combine first wordsWithSpaces

    makeLineFinal :: [String] -> Int -> Int -> String
    makeLineFinal [] _ _ = error "Cannot makeLine with empty string!"
    makeLineFinal strs wordsInLine charsInLine =
      let trailingSpaces = maxWidth - charsInLine - (wordsInLine - 1)
      in  foldl combine "" (zip strs (trailingSpaces : repeat 1))

    combine :: String -> (String, Int) -> String
    combine suffix (nextWord, numSpaces) = nextWord <> replicate numSpaces ' ' <> suffix
Rust Solution

Now let’s put together our Rust solution. Since we have a reasonable outline from writing this in Haskell, let’s start with the simpler elements, makeLine and makeLineFinal. We’ll use library functions as much as possible for the string manipulation. For example, we can start makeLineFinal by using join on our input vector of strings.
pub fn make_line_final(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = currentLine.join(" ");
    ...
}
Now we just need to calculate the number of trailing spaces, subtracting the number of characters in the joined string. We append this to the end by taking a blank space and using repeat for the correct number of times.
pub fn make_line_final(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = currentLine.join(" ");
    let trailingSpaces = max_width - result.len();
    result.push_str(&" ".repeat(trailingSpaces));
    return result;
}
For those unfamiliar with Rust, the type of our input vector might seem odd. When we have &Vec<&str>, this means a reference to a vector of string slices. String slices are portions of a String that we hold a reference to, but they aren’t copied. However, when we join them, we make a new String result.

Also note that we aren’t passing wordsInLine as a separate parameter. We can get this value using .len() in constant time in Rust. In Haskell, length is O(n) so we don’t want to always do that.

Now for the general make_line function, we have the same type signature, but we start with our base case, where we only have one string in our current line. Again, we use repeat with the number of spaces.
pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    ...
}
Now we do the “math” portion of this. Rust doesn’t have a single quotRem function in its base library, so we calculate these values separately.
pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    let numSpaces = (max_width - charsInLine);
    let baseNumSpaces = numSpaces / (n - 1);
    let numWithExtraSpace = numSpaces % (n - 1);
    let mut i = 0;
    while i < n {
        ...
    }
    return result;
}
The while loop we’ll write here is instructive. We use an index instead of a for each pattern because the index tells us how many spaces to use. If our index is smaller than numWithExtraSpace, we add 1 to the base number of spaces. Otherwise we use the base until the index n - 1. This index has no extra spaces, so we’re done at that point!
pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    let numSpaces = (max_width - charsInLine);
    let baseNumSpaces = numSpaces / (n - 1);
    let numWithExtraSpace = numSpaces % (n - 1);
    let mut i = 0;
    while i < n {
        result.push_str(currentLine[i]);
        if i < numWithExtraSpace {
            result.push_str(&" ".repeat(baseNumSpaces + 1));
        } else if i < n - 1 {
            result.push_str(&" ".repeat(baseNumSpaces));
        }
        i += 1;
    }
    return result;
}
Now we frame our solution. Let’s start by setting up our state variables (again, omitting numWordsInLine). We’ll also redefine max_width as a usize value for ease of comparison later.
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    ...
}
Now we’d like to frame our solution as a “for each” loop. However, this doesn’t work, for Rust-related reasons we’ll describe after the solution! Instead, we’ll use an index loop.
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        ...
    }
}
We’ll get the word by index on each iteration, and use its length to see if we’ll exceed the max width. If not, we can safely push it onto currentLine and increase the character count:
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            ...
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
}
Now when we do exceed the max width, we have to push our current line onto result (calling make_line). We clear the current line, push our new word, and use its length for charsInLine.
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            result.push(make_line(&currentLine, mw, charsInLine));
            currentLine.clear();
            currentLine.push(&words[i]);
            charsInLine = word.len();
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
    ...
}
After our loop, we’ll just call make_line_final on whatever is left in our currentLine! Here’s our complete full_justify function that calls make_line and make_line_final as we wrote above:
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            result.push(make_line(&currentLine, mw, charsInLine));
            currentLine.clear();
            currentLine.push(&words[i]);
            charsInLine = word.len();
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
    result.push(make_line_final(&currentLine, mw, charsInLine));
    return result;
}
Why an Index Loop?

Inside our Rust loop, we have an odd pattern in getting the “word” for this iteration. We first assign word = &words[i], and then later on, when we push that word, we reference words[i] again, using currentLine.push(&words[i]).

Why do this? Why not currentLen.push(word)? And then, why can’t we just do for word in words as our loop?

If we write our loop as for word in words, then we cannot reference the value word after the loop. It is “scoped” to the loop. However, currentLine “outlives” the loop! We have to reference currentLine at the end when we make our final line.

To get around this, we would basically have to copy the word instead of using a string reference &str, but this is unnecessarily expensive.

These are the sorts of odd “lifetime” quirks you have to learn to deal with in Rust. Haskell is easier in that it spares us from thinking about this. But Rust gains a significant performance boost with these sorts of ideas.

Conclusion

This was definitely the most involved problem we’ve dealt with so far. We learned a new paradigm (buffer and save), and got some experience dealing with some of the odd quirks and edge cases of string manipulation, especially in Rust. It was a fairly tricky problem, as far as list manipulation goes. For an easier example of a buffer and save problem, try solving Merge Intervals.

If you want to level up your Haskell problem solving skills, you need to take our course Solve.hs. This course will teach you everything you need to know about problem solving, data structures, and algorithms in Haskell. After this course, you’ll be in great shape to deal with these sorts of LeetCode style problems as they come up in your projects.
by James Bowen at June 30, 2025 08:30 AM

June 28, 2025

Magnus Therning

Reading Redis responses
When I began experimenting with writing a new Redis client package I decided to use lazy bytestrings, because:

aeson seems to prefer it – the main encoding and decoding functions use lazy byte strings, though there are strict variants too.

the Builder type in bytestring produce lazy bytestrings.

At the time I was happy to see that attoparsec seemed to support strict and lazy bytestrings equally well.

To get on with things I also wrote the simplest function I could come up with for sending and receiving data over the network – I used send and recv from Network.Socket.ByteString.Lazy in network. The function was really simple
import Network.Socket.ByteString.Lazy qualified as SB

sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \sock -> do
    _ <- SB.send sock $ toWireCmd cmd
    resp <- SB.recv sock 4096
    case decode resp of
        Left err -> pure $ Left $ RespError "decode" (TL.pack err)
        Right r -> pure $ k <$> fromWireResp cmd r
with decode defined like this
decode :: ByteString -> Either String Resp
decode = parseOnly resp
I knew I'd have to revisit this function, it was naïve to believe that a call to recv would always result in as single complete response. It was however good enough to get going. When I got to improving sendCmd I was a little surprised to find that I'd also have to switch to using strict bytestrings in the parser.

Interlude on the Redis serialisation protocol (RESP3)

The Redis protocol has some defining attributes

It's somewhat of a binary protocol. If you stick to keys and values that fall within the set of ASCII strings, then the protocol is humanly readable and you can rather easily use netcat or telnet as a client. However, you aren't limited to storing only readable strings.

It's somewhat of a request-response protocol. A notable exception is the publish-subscribe subset, but it's rather small and I reckon most Redis users don't use it.

It's somewhat of a type-length-value style protocol. Some of the data types include their length in bytes, e.g. bulk strings and verbatim strings. Other types include the number of elements, e.g. arrays and maps. A large number of them have no length at all, e.g. simple strings, integers, and doubles.

I suspect there are good reasons, I gather a lot of it has to do with speed. It does however cause one issue when writing a client: it's not possible to read a whole response without parsing it.
Rewriting sendCmd
With that extra information about the RESP3 protocol the naïve implementation above falls short in a few ways

The read buffer may contain more than one full message and give the definition of decode above any remaining bytes are simply dropped.¹

The read buffer my contain less than one full message and then decode will return an error.²

Surely this must be solvable, because in my mind running the parser results in one of three things:

Parsing is done and the result is returned, together with any input that wasn't consumed.

The parsing is not done due to lack of input, this is typically encoded as a continuation.

The parsing failed so the error is returned, together with input that wasn't consumed.

So, I started looking in the documentation for the module Data.Attoparsec.ByteString.Lazy in attoparsec. I was a little surprised to find that the Result type lacked a way to feed more input to a parser – it only has two constructors, Done and Fail:
data Result r
    = Fail ByteString [String] String
    | Done ByteString r
I'm guessing the idea is that the function producing the lazy bytestring in the first place should be able to produce more chunks of data on demand. That's likely what the lazy variant of recv does, but at the same time it also requires choosing a maximum length and that doesn't rhyme with RESP3. The lazy recv isn't quite lazy in the way I needed it to be.

When looking at the parser for strict bytestrings I calmed down. This parser follows what I've learned about parsers (it's not defined exactly like this; it's parameterised in its input but for the sake of simplicity I show it with ByteString as input):
data Result r
    = Fail ByteString [String] String
    | Partial (ByteString -> Result r)
    | Done ByteString r
Then to my delight I found that there's already a function for handling exactly my problem
parseWith :: Monad m => (m ByteString) -> Parser a -> ByteString -> m (Result a)
I only needed to rewrite the existing parser to work with strict bytestrings and work out how to write a function using recv (for strict bytestrings) that fulfils the requirements to be used as the first argument to parseWith. The first part wasn't very difficult due to the similarity between attoparsec's APIs for lazy and strict bytestrings. The second only had one complication. It turns out recv is blocking, but of course that doesn't work well with parseWith. I wrapped it in timeout based on the idea that timing out means there's no more data and the parser should be given an empty string so it finishes. I also decided to pass the parser as an argument, so I could use the same function for receiving responses for individual commands as well as for pipelines. The full receiving function is
import Data.ByteString qualified as BS
import Data.Text qualified as T
import Network.Socket.ByteString qualified as SB

recvParse :: S.Socket -> Parser r -> IO (Either Text (BS.ByteString, r))
recvParse sock parser = do
    parseWith receive parser BS.empty >>= \case
        Fail _ [] err -> pure $ Left (T.pack err)
        Fail _ ctxs err -> pure $ Left $ T.intercalate " > " (T.pack <$> ctxs) <> ": " <> T.pack err
        Partial _ -> pure $ Left "impossible error"
        Done rem result -> pure $ Right (rem, result)
  where
    receive =
        timeout 100_000 (SB.recv sock 4096) >>= \case
            Nothing -> pure BS.empty
            Just bs -> pure bs
Then I only needed to rewrite sendCmd and I wanted to do it in such a way that any remaining input data could be use in by the next call to sendCmd.³ I settled for modifying the Conn type to hold an IORef ByteString together with the socket and then the function ended up looking like this
sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \(sock, remRef) -> do
    _ <- SBL.send sock $ toWireCmd cmd
    rem <- readIORef remRef
    recvParse sock rem resp >>= \case
        Left err -> pure $ Left $ RespError "recv/parse" err
        Right (newRem, r) -> do
            writeIORef remRef newRem
            pure $ k <$> fromWireResp cmd r
What's next?

I've started looking into pub/sub, and basically all of the work described in this post is a prerequisite for that. It's not very difficult on the protocol level, but I think it's difficult to come up with a design that allows maximal flexibility. I'm not even sure it's worthwhile the complexity.

Footnotes:

¹
This isn't that much of a problem when sticking to the request-response commands, I think. It most certainly becomes a problem with pub/sub though.

²
I'm sure that whatever size of buffer I choose to use there'll be someone out there who's storing values that are larger. Then there's pipelining that makes it even more of an issue.

³
To be honest I'm not totally convinced there'll ever be any remaining input. Unless a single Conn is used by several threads – which would lead to much pain with the current implementation – or pub/sub is used – which isn't supported yet.

Tags: haskell redis
June 28, 2025 10:41 AM

June 27, 2025

Brent Yorgey

Competitive programming in Haskell: prefix sums
Competitive programming in Haskell: prefix sums

Posted on June 27, 2025
Tagged monoid, range, query, prefix, sum, Haskell, competitive programming
In a previous blog post I categorized a number of different techniques for calculating range queries. Today, I will discuss one of those techniques which is simple but frequently useful.
Precomputing prefix sums

Suppose we have a static sequence of values $a_1, a_2, a_3, \dots, a_n$ drawn from some groupThat is, there is an associative binary operation, and every element has an inverse.

, and want to be able to compute the total value (according to the group operation) of any contiguous subrange. That is, given a range $[i,j]$, we want to compute $a_i \diamond a_{i+1} \diamond \dots \diamond a_j$ (where $\diamond$ is the group operation). For example, we might have a sequence of integers and want to compute the sum, or perhaps the bitwise xor (but not the maximum) of all the values in any particular subrange.

Of course, we could simply compute $a_i \diamond \dots \diamond a_j$ directly, but that takes $O(n)$ time. With some simple preprocessing, it’s possible to compute the value of any range in constant time.

The key idea is to precompute an array $P$ of prefix sums, so $P_i = a_1 \diamond \dots \diamond a_i$. This can be computed in linear time via a scan; for example:
import Data.Array
import Data.List (scanl')

prefix :: Monoid a => [a] -> Array Int a
prefix a = listArray (0, length a) $ scanl' (<>) mempty a
⊕Actually, I would typically use an unboxed array, which is faster but slightly more limited in its uses: import Data.Array.Unboxed, use UArray instead of Array, and add an IArray UArray a constraint.

Note that we set $P_0 = 0$ (or whatever the identity element is for the group); this is why I had the sequence of values indexed starting from $1$, so $P_0$ corresponds to the empty sum, $P_1 = a_1$, $P_2 = a_1 \diamond a_2$, and so on.

Now, for the value of the range $[i,j]$, just compute $P_j \diamond P_{i-1}^{-1}$—that is, we start with a prefix that ends at the right place, then cancel or “subtract” the prefix that ends right before the range we want. For example, to find the sum of the integers $a_5 + \dots + a_{10}$, we can compute $P_{10} - P_4$.
range :: Group a => Array Int a -> Int -> Int -> a
range p i j = p!j <> inv (p!(i-1))
That’s why this only works for groups but not for general monoids: only in a group can we cancel unwanted values. So, for example, this works for finding the sum of any range, but not the maximum.
Practice problems

Want to practice? Here are a few problems that can be solved using techniques discussed in this post:

Determining Nucleotide Assortments

Einvígi

Srednji

Veggja Kalli

It is possible to generalize this scheme to 2D—that is, to compute the value of any subrectangle of a 2D grid of values from some group in only $O(1)$ time. I will leave you the fun of figuring out the details.

Prozor

Rust

If you’re looking for an extra challenge, here are a few harder problems which use techniques from this post as an important component, but require some additional nontrivial ingredients:

Killing Chaos

Ozljeda

Vudu
<noscript>Javascript needs to be activated to view comments.</noscript>
by Brent Yorgey at June 27, 2025 12:00 AM

June 25, 2025

Well-Typed.Com

Haskell records in 2025 (Haskell Unfolder #45)

Today, 2025-06-25, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …) we are streaming the 45th episode of the Haskell Unfolder live on YouTube.

Haskell records in 2025 (Haskell Unfolder #45)

Haskell records as originally designed have had a reputation of being somewhat weird or, at worst, useless. A lot of features and modifications have been proposed over the years to improve the situation. But not all of these got implemented, or widespread adoption. The result is that the situation now is quite different from what it was in the old days, and additional changes are in the works. But the current state can be a bit confusing. Therefore, in this episode, we are going to look at how to make best use of Haskell records right now, discussing extensions such as DuplicateRecordFields, NoFieldSelectors, OverloadedRecordDot and OverloadedRecordUpdate, and we’ll get take a brief look at optics.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at June 25, 2025 12:00 AM

June 24, 2025

Haskell Interlude

66: Daniele Micciancio

Niki and Mike talked to Daniele Micciancio who is a professor at UC San Diego. He's been using Haskell for 20 years, and works in lattice cryptography. We talked to him about how he got into Haskell, using Haskell for teaching theoretical computer science and of course for his research and the role type systems and comonads could play in the design of cryptographic algorithms. Along the way, he gave an accessible introduction to post-quantum cryptography which we really enjoyed. We hope you do, too.

by Haskell Podcast at June 24, 2025 02:00 PM

June 23, 2025

Monday Morning Haskell

The Sliding Window in Haskell & Rust
In last week’s problem, we covered a two-pointer algorithm, and compared Rust and Haskell solutions as we have been for this whole series. Today, we’ll study a related concept, the sliding window problem. Whereas the general two-pointer problem can often be tackled by a single loop, we’ll have to use nested loops in this problem. This problem will also mark our first use of the Set data structure in this series.

If you want a deeper look at problem solving techniques in Haskell, you should enroll in our Solve.hs course! You’ll learn everything you need for general problem solving knowledge in Haskell, including data structures, algorithms, and parsing!

The Problem

Today’s LeetCode problem is Longest Substring without Repeating Characters. It’s a lengthy problem name, but the name basically tells you everything you need to know! We want to find a substring of our input that does not repeat any characters within the substring, and then get the longest such substring.

For example, abaca would give us an answer of 3, since we have the substringbac that consists of 3 unique characters. However, abaaca only gives us 2. There is no run of 3 characters where the three characters are all unique.

The Algorithm

The approach we’ll use, as mentioned above, is called a sliding window algorithm. In some ways, this is similar to the two-pointer approach last week. We’ll have, in a sense, two different pointers within our input. One dictates the “left end” of a window and one dictates the “right end” of a window. Unlike last week’s problem though, both pointers will move in the same direction, rather than converging from opposite directions.

The goal of a sliding window problem is “find a continuous subsequence of an input that matches the criteria”. And for many problems like ours, you want to find the longest such subsequence. The main process for a sliding window problem is this:

Grow the window by increasing the “right end” until (or while) the predicate is satisfied

Once you cannot grow the window any more, shrink the window by increasing the “left end” until we’re in a position to grow the window again.

Continue until one or both pointers go off the end of the input list.

So for our problem today, we want to “grow” our sliding window as long as we can get more unique characters. Once we hit a character we’ve already seen in our current window, we’ll need to shrink the window until that duplicate character is removed from the set.

As we’re doing this, we’ll need to keep track of the largest substring size we’ve seen so far.

Here are the steps we would take with the input abaca. At each step, we process a new input character.
1. Index 0 (‘a’) - window is “a” which is all unique.
2. Index 1 (‘b’) - window is “ab” which is all unique
3. Index 2 (‘a’) - window is “aba”, which is not all unique
3b. Shrink window, removing first ‘a’, so it is now “ba”
4. Index 3 (‘c’) - window is “bac”, which is all unique
5. Index 4 (‘a’) - window is “baca”, which is not unique
5b. Shrink window, remove ‘b’ and ‘a’, leaving “ca”
The largest unique window we saw was bac, so the final answer is 3.

Haskell Solution

For a change of pace, let’s discuss the Haskell approach first. Our algorithm is laid out in such a way that we can process one character at a time. Each character either grows the window, or forces it to shrink to accommodate the character. This means we can use a fold!

Let’s think about what state we need to track within this fold. Naturally, we want to track the current “set” of characters in our window. Each time we see the next character, we have to quickly determine if it’s already in the window. We’ll also want to track the largest set size we’ve seen so far, since by the end of the string our window might no longer reflect the largest subsequence.

With a general sliding window approach, you would also need to track both the start and the end index of your current window. In this problem though, we can get away with just tracking the start index. We can always derive the end index by taking the start index and adding the size of the set. And since we’re iterating through the characters anyway, we don’t need the end index to get the “next” character.

This means our fold-loop function will have this type signature:
-- State: (start index, set of letters, largest seen)
loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
Now, using our idea of “beginning from the end”, we can already write the invocation of this loop:
largestUniqueSubsequence :: String -> Int
largestUniqueSubsequence input = best
  where
    (_, _, best) = foldl loop (0, S.empty, 0) input

    loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
    ...
Using 0 for the start index right away is a little hand-wavy, since we haven’t actually added the first character to our set yet! But if we see a single character, we’ll always add it, and as we’ll see, the “adding” branch of our loop never increases this number.

With that in mind, let’s write this branch of our loop handler! If we have not seen the next character in the string, we keep the same start index (left side of the window isn’t moving), we add the character to our set, and we take the new size of the set as the “best” value if it’s greater than the original. We get the new size by adding 1 to the original set size.
largestUniqueSubsequence :: String -> Int
largestUniqueSubsequence input = best
  where
    (_, _, best) = foldl loop (0, S.empty, 0) input

   loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
   loop (startIndex, charSet, bestSoFar) c = if S.notMember c charSet
    then (startIndex, S.insert c charSet, max bestSoFar (S.size charSet + 1))
      else ...
Now we reach the tricky case! If we’ve already seen the next character, we need to remove characters from our set until we reach the instance of this character in the set. Since we might need to remove multiple characters, “shrinking” is an iterative process with a variable number of steps. This means it would be a while-loop in most languages, which means we need another recursive function!

The goal of this function is to change two of our stateful values (the start index and the character set) until we can once again have a unique character set with the new input character. So each iteration it takes the existing values for these, and will ultimately return updated values. Here’s its type signature:
shrink :: (Int, S.Set Char) -> Char -> (Int, S.Set Char)
Before we implement this, we can invoke it in our primary loop! When we’ve seen the new character in our set, we shrink the input to match this character, and then return these new stateful values along with our previous best (shrinking never increases the size).
largestUniqueSubsequence :: String -> Int
largestUniqueSubsequence input = best
  where
    (_, _, best) = foldl loop (0, S.empty, 0) input

   loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
   loop (startIndex, charSet, bestSoFar) c = if S.notMember c charSet
    then (startIndex, S.insert c charSet, max bestSoFar (S.size charSet + 1))
      else
        let (newStart, newSet) = shrink (startIndex, charSet) c
        in  (newStart, newSet, bestSoFar)

    shrink :: (Int, S.Set Char) -> Char -> (Int, S.Set Char)
    shrink = undefined
Now we implement “shrink” by considering the base case and recursive case. In the base case, the character at this index matches the new character we’ve trying to remove. So we can return the same set of characters, but increase the index.

In the recursive case, we still increase the index, but now we remove the character at the start index from the set without replacement. (Note how we need a vector for efficient indexing here).
largestUniqueSubsequence :: String -> Int
largestUniqueSubsequence input = best
  where
    (_, _, best) = foldl loop (0, S.empty, 0) input

   loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
   loop (startIndex, charSet, bestSoFar) c = if S.notMember c charSet
    then (startIndex, S.insert c charSet, max bestSoFar (S.size charSet + 1))
      else
        let (newStart, newSet) = shrink (startIndex, charSet) c
        in  (newStart, newSet, bestSoFar)

    shrink :: (Int, S.Set Char) -> Char -> (Int, S.Set Char)
    shrink (startIndex, charSet) c =
      let nextC = inputV V.! startIndex
          // Base Case: nextC is equal to newC
      in  if nextC == c then (startIndex + 1, charSet)
            // Recursive Case: Remove startIndex
            else shrink (startIndex + 1, S.delete nextC charSet) c
Now we have a complete Haskell solution!

Rust Solution

Now in our Rust solution, we’ll follow the same pattern we’ve been doing for these problems. We’ll set up our loop variables, write the loop, and handle the different cases in the loop. Because we had the nested recursive “shrink” function in Haskell, this will translate to a “while” loop in Rust, nested within our for-loop.

Here’s how we set up our loop variables:
pub fn length_of_longest_substring(s: String) -> i32 {
    let mut best = 0;
    let mut startIndex = 0;
    let inputV: Vec<char> = s.chars().collect();
    let mut charSet = HashSet::new();
    for c in s.chars() {
        ...
    }
}
Within the loop, we have the “easy” case, where the next character is not already in our set. We just insert it into our set, and we update best if we have a new maximum.
pub fn length_of_longest_substring(s: String) -> i32 {
    let mut best = 0;
    let mut startIndex = 0;
    let inputV: Vec<char> = s.chars().collect();
    let mut charSet = HashSet::new();
    for c in s.chars() {
        if charSet.contains(&c) {
            ...
        } else {
            charSet.insert(c);
            best = std::cmp::max(best, charSet.len());                
        }
    }
    return best as i32;
}
The Rust-specific oddity is that when we call contains on the HashSet, we must use &c, passing a reference to the character. In C++ we could just copy the character, or it could be handled by the function using const&. But Rust handles these things a little differently.

Now we get to the “tricky” case within our loop. How do we “shrink” our set to consume a new character?

In our case, we’ll actually just use the loop functionality of Rust, which works like while (true), requiring a manual break inside the loop. Our idea is that we’ll inspect the character at the “start” index of our window. If this character is the same as the new character, we will advance the start index (indicating we are dropping the old version), but then we’ll break. Otherwise, we’ll still increase the index, but we’ll remove the other character from the set as well.

Here’s what this loop looks like in relative isolation:
if charSet.contains(&c) {
    loop {
        // Look at “first” character of window
        let nextC = inputV[startIndex];
        if (nextC == c) {
            // If it’s the new character, we advance past it and break
            startIndex += 1;
            break;
        } else {
            // Otherwise, advance AND drop it from the set
            startIndex += 1;
            charSet.remove(&nextC);
        }
    }
} else {
    ...
}
The inner condition (nextC == c) feels a little flimsy to use with a while (true) loop. But it’s perfectly sound because of the invariant that if charSet contains c, we’ll necessarily find nextC == c before startIndex gets too large. We could also write it as a normal while loop, but loop is an interesting Rust-specific idea to bring in here.

Here’s our complete Rust solution!
pub fn length_of_longest_substring(s: String) -> i32 {
    let mut best = 0;
    let mut startIndex = 0;
    let inputV: Vec<char> = s.chars().collect();
    let mut charSet = HashSet::new();
    for c in s.chars() {
        if charSet.contains(&c) {
            loop {
                let nextC = inputV[startIndex];
                if (nextC == c) {
                    startIndex += 1;
                    break;
                } else {
                    startIndex += 1;
                    charSet.remove(&nextC);
                }
            }
        } else {
            charSet.insert(c);
            best = std::cmp::max(best, charSet.len());                
        }
    }
    return best as i32;
}
Conclusion

With today’s problem, we’ve covered another important problem-solving concept: the sliding window. We saw how this approach could work even with a fold in Haskell, considering one character at a time. We also saw how nested loops compare across Haskell and Rust.

For more problem solving tips and tricks, take a look at Solve.hs, our complete course on problem solving, data structures, and algorithms in Haskell. You’ll get tons of practice on problems like these so you can significantly level up your skills!
by James Bowen at June 23, 2025 08:30 AM

Brent Yorgey

Competitive programming in Haskell: range queries, classified

Competitive programming in Haskell: range queries, classified

Posted on June 23, 2025
Tagged semigroup, monoid, range, query, Haskell, competitive programming

Static range queries

Suppose we have a sequence of values, which is static in the sense that the values in the sequence will never change, and we want to perform range queries, that is, for various ranges we want to compute the total of all consecutive values in the range, according to some binary combining operation. For example, we might want to compute the maximum, sum, or product of all the consecutive values in a certain subrange. We have various options depending on the kind of ranges we want and the algebraic properties of the operation.

If we want ranges corresponding to a sliding window, we can use an amortized queue structure to find the total of each range in $O(1)$, for an arbitrary monoid.

If we want arbitrary ranges but the operation is a group, the solution is relatively straightforward: we can precompute all prefix sums, and subtract to find the result for an arbitrary range in $O(1)$.

If the operation is an idempotent semigroup (that is, it has the property that $x \diamond x = x$ for all $x$), we can use a sparse table, which takes $O(n \lg n)$ time and space for precomputation, and then allows us to answer arbitrary range queries in $O(1)$. I also plan to write about this in an upcoming blog post.

If the operation is an arbitrary monoid, we can use a sqrt tree, which uses $O(n \lg \lg n)$ precomputed time and space, and allows answering arbitrary range queries in $O(\lg \lg n)$. I will write about this in a future post.

Dynamic range queries

What if we want dynamic range queries, that is, we want to be able to interleave range queries with arbitrary updates to the values of the sequence?

If the operation is an arbitrary monoid, we can use a segment tree.

If the operation is a group, we can use a Fenwick tree.

I published a paper about Fenwick trees, which also discusses segment trees, but I should write more about them here!

Table

Hereâ€™s a table summarizing the above classification scheme. I plan to fill in links as I write blog posts about each row.

Sequence Ranges Operation Solution Precomputation Queries

Static Sliding window Monoid Amortized queue $O(1)$ $O(1)$

Static Arbitrary Group Prefix sum table $O(n)$ $O(1)$

Static Arbitrary Idempotent semigroup Sparse table $O(n \lg n)$ $O(1)$

Static Arbitrary Monoid Sqrt tree $O(n \lg \lg n)$ $O(\lg \lg n)$

Dynamic Arbitrary Group Fenwick tree $O(n)$ $O(\lg n)$

Dynamic Arbitrary Monoid Segment tree $O(n)$ $O(\lg n)$

<noscript>Javascript needs to be activated to view comments.</noscript>

by Brent Yorgey at June 23, 2025 12:00 AM

Sequence	Ranges	Operation	Solution	Precomputation	Queries
Static	Sliding window	Monoid	Amortized queue	\(O(1)\)	\(O(1)\)
Static	Arbitrary	Group	Prefix sum table	\(O(n)\)	\(O(1)\)
Static	Arbitrary	Idempotent semigroup	Sparse table	\(O(n \lg n)\)	\(O(1)\)
Static	Arbitrary	Monoid	Sqrt tree	\(O(n \lg \lg n)\)	\(O(\lg \lg n)\)
Dynamic	Arbitrary	Group	Fenwick tree	\(O(n)\)	\(O(\lg n)\)
Dynamic	Arbitrary	Monoid	Segment tree	\(O(n)\)	\(O(\lg n)\)

June 22, 2025

Philip Wadler

How to market Haskell to a mainstream programmer

An intriguing talk by Gabriella Gonzalez, delivered at Haskell Love 2020. Based largely on the famous marketing book, Crossing the Chasm. Gonzalez argues that marketing is not about hype, it is about setting priorities: what features and markets are you going to ignore? The key to adoption is to be able to solve a problem that people need solved today and where existing mainstream tools are inadequate. Joe Armstrong will tell you that the key to getting Erlang used was to approach failing projects and ask "Would you like us to build you a prototype?" Gonzalez makes a strong case that Haskell should first aim to capture the interpreters market. He points out that the finance/blockchain market may be another possibility. Recommended to me at Lambda Days by Pedro Abreu, host of the Type Theory Forall podcast.

by Philip Wadler ([email protected]) at June 22, 2025 07:07 PM

What is happening in Gaza is an injury to our collective conscience. We must be allowed to speak out

A powerful op-ed by Gabor Maté in the Toronto Star.
Just as nothing justifies the atrocities of October 7, nothing about October 7 justifies Israeli atrocities against the Palestinians, either before or since October 7. Recently, I listened to orthopedic surgeon Dr. Deirdre Nunan, like me a graduate of UBC’s Faculty of Medicine, recount her harrowing experiences serving in a Gaza hospital under the siege that followed Israel’s breaking of the ceasefire in March. Her depictions of unspeakable horror, enacted as policy by one of the world’s most sophisticated militaries, were soul shattering. Many other physicians — Canadian, American, Jewish, Muslim, Christian — who have worked in Gaza speak in similar terms. British doctors describe witnessing “a slaughterhouse.” All their testimonies are widely accessible. The leading medical journal Lancet editorialized that in its assault on health care facilities and personnel in Gaza, “the Israeli Government has acted with impunity … Many medical academies and health professional organizations that claim a commitment to social justice have failed to speak out.” ...
It may be true that antisemitic animus can lurk behind critiques of Zionism. But in my decades of advocacy for Palestinian rights including medical visits to Gaza and the West Bank, I have rarely witnessed it. When present, it has a certain tone that one can feel is directed at Jewishness itself, rather than at the theory and practice of Zionism or at Israel’s actions. What is far more common and genuinely confusing for many is that Israel and its supporters, Jews and non-Jews, habitually confound opposition to Israeli policy with antisemitism. This is akin to Vietnam War protesters being accused of anti-Americanism. How is opposing the napalming of human beings anti-American or, say, deploring Israel’s use of mass starvation as a weapon of war in any sense anti-Jewish? ...
People deserve the right to experience as much liberty to publicly mourn, question, oppose, deplore, denounce what they perceive as the perpetration of injustice and inhumanity as they are, in this country, to advocate for the aims and actions of the Israeli government and its Canadian abettors amongst our political leadership, academia, and media.
Even if we feel powerless to stop the first genocide we have ever watched on our screens in real time, allow at least our hearts to be broken openly, as mine is. And more, let us be free to take democratic, non-hateful action without fear of incurring the calumny of racism.
Thanks to a colleague in the Scottish Universities Jewish Staff Network for bringing it to my attention.

by Philip Wadler ([email protected]) at June 22, 2025 05:03 PM

The Provocateurs: Brave New Bullshit

[Reposting with update.]

Following two sell-out shows at the Fringe last year, I'm on at the Fringe again:

11.25 Monday 4 August, Stand 2 w/Lucy Remnant and Susan Morrison
17.40 Sunday 17 August, Stand 4 w/Smita Kheria and Sarah-Jane Judge
17.40 Tuesday 19 August, Stand 4 w/Cameron Wyatt and Susan Morrison

Shows are under the banner of The Provocateurs (formerly Cabaret of Dangerous Ideas). Tickets go on sale Wednesday 7 May, around noon. The official blurb is brief:

Professor Philip Wadler (The University of Edinburgh) separates the hopes and threats of AI from the chatbot bullshit.

Here is a longer blurb, from my upcoming appearance at Curious, run by the RSE, in September.
Brave New Bullshit
In an AI era, who wins and who loses?

Your future workday might look like this:
You write bullet points.
You ask a chatbot to expand them into a report.
You send it to your boss ...
Who asks a chatbot to summarise it to bullet points.
Will AI help you to do your job or take it from you? Is it fair for AI to be trained on copyrighted material? Will any productivity gains benefit everyone or only a select few?

Join Professor Philip Wadlerâ€™s talk as he looks at the hopes and threats of AI, exploring who wins and who loses.

by Philip Wadler ([email protected]) at June 22, 2025 04:40 PM

June 20, 2025

Magnus Therning

Finding a type for Redis commands
Arriving at a type for Redis commands required a bit of exploration. I had some ideas early on that I for various reasons ended up dropping on the way. This is a post about my travels, hopefully someone finds it worthwhile reading.
The protocol
The Redis Serialization Protocol (RESP) initially reminded me of JSON and I thought that following the pattern of aeson might be a good idea. I decided up-front that I'd only support the latest version of RESP, i.e. version 3. So, I thought of a data type, Resp with a constructor for each RESP3 data type, and a pair of type classes, FromResp and ToResp for converting between Haskell types and RESP3. Then after some more reflection I realised that converting to RESP is largely pointless. The main reason to convert anything to RESP3 is to assemble a command, with its arguments, to send to Redis, but all commands are arrays of bulk strings so it's unlikely that anyone will actually use ToResp.¹ So I scrapped the idea of ToResp. FromResp looked like this
class FromResp a where
    fromResp :: Value -> Either FromRespError a
When I started defining commands I didn't like the number of ByteString arguments that resulted in, so I defined a data type, Arg, and an accompanying type class for arguments, ToArg:
newtype Arg = Arg {unArg :: [ByteString]}
    deriving (Show, Semigroup, Monoid)

class ToArg a where
    toArg :: a -> Arg
Later on I saw that it might also be nice to have a type class specifically for keys, ToKey, though that's a wrapper for a single ByteString.

Implementing the functions to encode/decode the protocol were straight-forward applications of attoparsec and bytestring (using its Builder).
A command is a function in need of a sender
Even though supporting pipelining was one of the goals I felt a need to make sure I'd understood the protocol so I started off with single commands. The protocol is a simple request/response protocol at the core so I settled on this type for commands
type Cmd a = forall m. (Monad m) => (ByteString -> m ByteString) -> m (Either FromRespError a)
that is, a command is a function accepting a sender and returning an a.

I wrote a helper function for defining commands, sendCmd
sendCmd :: (Monad m, FromResp a) => [ByteString] -> (ByteString -> m ByteString) -> m (Either FromRespError a)
sendCmd cmdArgs send = do
    let cmd = encode $ Array $ map BulkString cmdArgs
    send cmd <&> decode >>= \case
        Left desc -> pure $ Left $ FromRespError "Decode" (Text.pack desc)
        Right v -> pure $ fromValue v
which made it easy to define commands. Here are two examples, append and mget:
append :: (ToArg a, ToArg b) => a -> b -> Cmd Int
append key val = sendCmd $ ["APPEND"] <> unArg (toArg key <> toArg val)

-- | https://redis.io/docs/latest/commands/mget/
mget :: (ToArg a, FromResp b) => NE.NonEmpty a -> Cmd (NE.NonEmpty b)
mget ks = sendCmd $ ["MGET"] <> unArg (foldMap1 toArg ks)
The function to send off a command and receive its response, sendAndRecieve, was just a call to send followed by a call to recv in network (the variants for lazy bytestrings).

I sort of liked this representation – there's always something pleasant with finding a way to represent something as a function. There's a very big problem with it though: it's difficult to implement pipelining!

Yes, Cmd is a functor since (->) r is a functor, and thus it's possible to make it an Applicative, e.g. using free. However, to implement pipelining it's necessary to

encode all commands, then

concatenate them all into a single bytestring and send it

read the response, which is a concatenation of the individual commands' responses, and

convert each separate response from RESP3.

That isn't easy when each command contains its own encoding and decoding. The sender function would have to relinquish control after encoding the command, and resume with the resume again later to decode it. I suspect it's doable using continuations, or monad-coroutine, but it felt complicated and rather than travelling down that road I asked for ideas on the Haskell Discourse. The replies lead me to a paper, Free delivery, and a bit later a package, monad-batcher. When I got the pointer to the package I'd already read the paper and started implementing the ideas in it, so I decided to save exploring monad-batcher for later.
A command for free delivery
The paper Free delivery is a perfect match for pipelining in Redis, and my understanding is that it proposes a solution where

Commands are defined as a GADT, Command a.

Two functions are defined to serialise and deserialise a Command a. In the paper they use String as the serialisation, so show and read is used.

A type, ActionA a, is defined that combines a command with a modification of its a result. It implements Functor.

A free type, FreeA f a is defined, and made into an Applicative with the constraint that f is a Functor.

A function, serializeA, is defined that traverses a FreeA ActionA a serialising each command.

A function, deserializeA, is defined that traverses a FreeA ActionA a deserialising the response for each command.

I defined a command type, Command a, with only three commands in it, echo, hello, and ping. I then followed the recipe above to verify that I could get it working at all. The Haskell used in the paper is showing its age, and there seems to be a Functor instance missing, but it was still straight forward and I could verify that it worked against a locally running Redis.

Then I made a few changes…

I renamed the command type to Cmd so I could use Command for what the paper calls ActionA.
data Cmd r where
    Echo :: Text -> Cmd Text
    Hello :: Maybe Int -> Cmd ()
    Ping :: Maybe Text -> Cmd Text

data Command a = forall r. Command !(r -> a) !(Cmd r)

instance Functor Command where
    fmap f (Command k c) = Command (f . k) c

toWireCmd :: Cmd r -> ByteString
toWireCmd (Echo msg) = _
toWireCmd (Hello ver) = _
toWireCmd (Ping msg) = _

fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp (Echo _) = fromResp
fromWireResp (Hello _) = fromResp
fromWireResp (Ping _) = fromResp
(At this point I was still using FromResp.)

I also replaced the free applicative defined in the paper and started using free. A couple of type aliases make it a little easier to write nice signatures
type Pipeline a = Ap Command a

type PipelineResult a = Validation [RespError] a
and defining individual pipeline commands turned into something rather mechanical. (I also swapped the order of the arguments to build a Command so I can use point-free style here.)
liftPipe :: (FromResp r) => Cmd r -> Pipeline r
liftPipe = liftAp . Command id

echo :: Text -> Pipeline Text
echo = liftPipe . Echo

hello :: Maybe Int -> Pipeline ()
hello = liftPipe . Hello

ping :: Maybe Text -> Pipeline Text
ping = liftPipe . Ping
One nice thing with switching to free was that serialisation became very simple
toWirePipeline :: Pipeline a -> ByteString
toWirePipeline = runAp_ $ \(Command _ c) -> toWireCmd c
On the other hand deserialisation became a little more involved, but it's not too bad
fromWirePipelineResp :: Pipeline a -> [Resp] -> PipelineResult a
fromWirePipelineResp (Pure a) _ = pure a
fromWirePipelineResp (Ap (Command k c) p) (r : rs) = fromWirePipelineResp p rs <*> (k <$> liftError singleton (fromWireResp c r))
fromWirePipelineResp _ _ = Failure [RespError "fromWirePipelineResp" "Unexpected wire result"]
Everything was working nicely and I started adding support for more commands. I used the small service from work to guide my choice of what commands to add. First out was del, then get and set. After adding lpush I was pretty much ready to try to replace hedis in the service from work.
data Cmd r where
    -- echo, hello, ping
    Del :: (ToKey k) => NonEmpty k -> Cmd Int
    Get :: (ToKey k, FromResp r) => k -> Cmd r
    Set :: (ToKey k, ToArg v) => k -> v -> Cmd Bool
    Lpush :: (ToKey k, ToArg v) => k -> NonEmpty v -> Cmd Int
However, when looking at the above definition started I thinking.

Was it really a good idea to litter Cmd with constraints like that?

Would it make sense to keep the Cmd type a bit closer to the actual Redis commands?

Also, maybe FromResp wasn't such a good idea after all, what if I remove it?

That brought me to the third version of the type for Redis commands.
Converging and simplifying
While adding new commands and writing instances of FromResp I slowly realised that my initial thinking of RESP3 as somewhat similar to JSON didn't really pan out. I had quickly dropped ToResp and now the instances of FromResp didn't sit right with me. They obviously had to "follow the commands", so to speak, but at the same time allow users to bring their own types. For instance, LSPUSH returns the number of pushed messages, but at the same time GET should be able to return an Int too. This led to Int's FromResp looking like this
instance FromResp Int where
    fromResp (BulkString bs) =
        case parseOnly (AC8.signed AC8.decimal) bs of
            Left s -> Left $ RespError "FromResp" (TL.pack s)
            Right n -> Right n
    fromResp (Number n) = Right $ fromEnum n
    fromResp _ = Left $ RespError "FromResp" "Unexpected value"
I could see this becoming worse, take the instance for Bool, I'd have to consider that

for MOVE Integer 1 means True and Integer 0 means False

for SET SimpleString "OK" means True

users would justifiably expect a bunch of bytestrings to be True, e.g. BulkString "true", BulkString "TRUE", BulkString "1", etc

However, it's impossible to cover all ways users can encode a Bool in a ByteString so no matter what I do users will end up having to wrap their Bool with newtype and implement a fitting FromResp. On top of that, even thought I haven't found any example of it yet, I fully expect there to be, somewhere in the large set of Redis commands, at least two commands each wanting an instance of a basic type that simply can't be combined into a single instance, meaning that the client library would need to do some newtype wrapping too.

No, I really didn't like it! So, could I get rid of FromResp and still offer users an API where they can user their own types as the result of commands?

To be concrete I wanted this
data Cmd r where
    -- other commands
    Get :: (ToKey k) => k -> Cmd (Maybe ByteString)
and I wanted the user to be able to conveniently turn a Cmd r into a Cmd s. In other words, I wanted a Functor instance. Making Cmd itself a functor isn't necessary and I just happened to already have a functor type that wraps Cmd, the Command type I used for pipelining. If I were to use that I'd need to write wrapper functions for each command though, but if I did that then I could also remove the ToKey~/~ToArg constraints from the constructors of Cmd r and put them on the wrapper instead. I'd get
data Cmd r where
    -- other commands
    Get :: Key -> Cmd (Maybe ByteString)

get :: (ToKey k) => k -> Command (Maybe ByteString)
get = Command id . Get . toKey
I'd also have to rewrite fromWireResp so it's more specific for each command. Instead of
fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp (Get _) = fromResp
...
I had to match up exactly on the possible replies to GET
fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp _ (SimpleError err desc) = Left $ RespError (T.decodeUtf8 err) (T.decodeUtf8 desc)
fromWireResp (Get _) (BulkString bs) = Right $ Just bs
fromWireResp (Get _) Null = Right Nothing
...
fromWireResp _ _ = Left $ RespError "fromWireResp" "Unexpected value"
Even though it was more code I liked it better than before, and I think it's slightly simpler code. I also hope it makes the use of the API is a bit simpler and clear.

Here's an example from the code for the service I wrote for work. It reads a UTC timestamp stored in timeKey, the timestamp is a JSON string so it needs to be decoded.
readUTCTime :: Connection -> IO (Maybe UTCTime)
readUTCTime conn =
    sendCmd conn (maybe Nothing decode <$> get timeKey) >>= \case
        Left _ -> pure Nothing
        Right datum -> pure datum
What's next?

I'm pretty happy with the command type for now, though I have a feeling I'll have to revisit Arg and ToArg at some point.

I've just turned the Connection type into a pool using resource-pool, and I started looking at pub/sub. The latter thing, pub/sub, will require some thought and experimentation I think. Quite possibly it'll end up in a post here too.

I also have a lot of commands to add.

Footnotes:

¹
Of course one could use RESP3 as the serialisation format for storing values in Redis. Personally I think I'd prefer using something more widely used, and easier to read, such as JSON or BSON.

Tags: haskell redis
June 20, 2025 09:40 PM

Well-Typed.Com

GHC activities report: Marchâ€“May 2025
This is the twenty-seventh edition of our GHC activities report, which describes the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain. The current edition covers roughly the months of March 2025 to May 2025. You can find the previous editions collected under the ghc-activities-report tag.

Sponsorship

We offer Haskell Ecosystem Support Packages to provide commercial users with support from Well-Typed’s experts, while investing in the Haskell community and its technical ecosystem including through the work described in this report. To find out more, read our recent announcement of these packages in partnership with the Haskell Foundation. We need funding to continue this essential maintenance work!

Many thanks to our Haskell Ecosystem Supporters: Channable and QBayLogic; to our existing clients who also contribute to making this work possible: Anduril, Juspay and Mercury; and to the HLS Open Collective for supporting HLS release management.

Team

The Haskell toolchain team at Well-Typed currently includes:

Andreas Klebinger

Ben Gamari

Matthew Pickering

Rodrigo Mesquita

Sam Derbyshire

Hannes Siebenhandl

Zubin Duggal

Mikolaj Konarski

In addition, many others within Well-Typed contribute to GHC, Cabal and HLS occasionally, or contribute to other open source Haskell libraries and tools.

GHC

Highlights

Explicit level imports

Following on from our best paper prize at TFP 2025, Matthew implemented Explicit Level Imports (GHC proposal #682, !14241).

This feature allows one to specify whether imports are needed for running Template Haskell splices, or for generating Template Haskell quotes. This cleanly separates which modules are required at compile-time vs those that are required at runtime. For example, the pandoc package uses the Template Haskell deriveJSON function from the aeson package. This function can be imported using a splice import:
{-# LANGUAGE ExplicitLevelImports #-}
{-# LANGUAGE TemplateHaskell #-}
module Text.Pandoc.App.Opt where
import splice Data.Aeson.TH (deriveJSON, defaultOptions)
-- + many other non-splice imports

data XYZ = ...
$(deriveJSON defaultOptions ''XYZ)
Declaring the Data.Aeson.TH import as a splice import informs GHC that this module is required only at compile-time, and (crucially) that other, non-splice, imports, are not needed at compile time. This hugely improves the performance of tools that use -fno-code (such as HLS), as GHC is no longer required to pessimistically assume that all modules imported in a module enabling TemplateHaskell are required at compile-time.

GHCi support for primops

Andreas significantly improved GHCi performance by implementing certain GHC primops (such as integer arithmetic operations) directly in the bytecode interpreter (!13978).

Reductions in runtime of up to 50% have been observed, with GHC-in-GHCi speeding up by about 15%.

Improvements to the debugger

Rodrigo has made numerous improvements to the GHCi debugger, which had accumulated many bugs over the years due to lack of maintenance (!14246, !14195, !14160, !14106, !14196, !14195, !13997). Usability is improved across the board, with quality-of-life fixed such as adding breakpoints to all statements in a do block to make debugging more predictable (#25932) to significant performance improvements to :steplocal (#25779).

Rodrigo also published the ghc-debugger package including an executable ghc-debug-adapter. This implements the Debug Adapter Protocol, enabling Haskell programs to be stepped-through and debugged from editors such as Visual Studio Code. ghc-debug-adapter depends on many recent changes to GHC, so it is compatible only with the upcoming GHC 9.14.

Expressions in SPECIALISE pragmas

Sam worked with Simon Peyton Jones to finalise MR !12319 “Expressions in SPECIALISE pragmas”. This change means that a SPECIALISE pragma is no longer required to simply be a type signature, it can be an arbitrary expression. For full details, see GHC proposal #493, but two particular idioms are worth noting. Firstly, the type at which to specialise can now be specified by a type application, e.g.
myFunction :: forall a. Num a => a -> Maybe a -> (a, a)
myFunction = ...
{-# SPECIALISE myFunction @Int #-}
This specialise pragma is much more concise than:
{-# SPECIALISE :: Int -> Maybe Int -> (Int, Int) #-}
and less prone to breakage when the type of myFunction changes.

Secondly, the syntax enables value specialisation, for example:
mainFunction :: Bool -> ...
mainFunction debug = if debug then ... else ...
{-# SPECIALISE mainFunction False #-}
This tells GHC to optimise the non-debug code path, without the debug logic potentially getting in the way.

Multiple Home Units support in GHCi

GHC 9.14 is fully compatible with multiple home units, including all GHCi commands and the GHCi debugger, thanks to work by Hannes about which we recently published a blog post (!14231). Our new design generalises the architecture of GHCi so that multi-unit and single-unit sessions are handled in the same way. The uniform handling will make sure that multi-unit sessions work correctly as GHCi evolves.

GHC Releases

Ben released GHC 9.12.2 in March.

Thanks to Luite Stegeman from IOG who released GHC 9.6.7 in March.

Zubin released GHC 9.10.2 in May.

The team are now working towards the release of GHC 9.14.1 later this year.

After various community discussions, GHC HQ are planning to start designating some major release series as Long Term Support. This means increasing the length of the support window for LTS releases and reducing it for non-LTS releases.

Frontend

Sam fixed a regression in the implementation of QuickLook in GHC 9.12 that would cause valid programs to be rejected (#26030, #25950, !14235).

Sam fixed a problem in which HasCallStack evidence was incorrectly cached in GHC, causing GHC to bogusly report identical call stacks (#25529, !14084).

Sam rectified several oversights in the initial implementation of the NamedDefaults language extension laid out in GHC proposal #409:

an issue with exporting named defaults (#25857, !14142),

lack of support for named default declarations for poly-kinded typeclasses such as Typeable (#25882, !14143),

an oversight in which NamedDefaults changed the behaviour of existing programs (#25775, !14075, ghc-proposals#694).

Sam fixed duplicate record fields sometimes being reported as unused when they are actually used (#24035, !14066).

Sam improved the error message emitted by GHC when one attempts to write a non-class at the head of a typeclass instance (#22688, !14105).

Sam fixed several issues with the renaming of export lists:

one issue involved the TypeData extension (#24027, !14119),

another was to do with bundled pattern synonyms (#25892, !14154).

Sam made “illegal term-level use” error messages more user friendly (#23982, !14122). That MR also improved the way GHC reports name qualification to the user, preferring to display the user-written qualification in error messages.

Sam fixed GHC creating unnecessary cycle-breaker variables, which could cause problems for type-checking plugins that weren’t expecting them (#25933, !14206).

Sam implemented the deprecation described in GHC proposal #448: the combination of ScopedTypeVariables and TypeApplications no longer enables the use of type applications in constructor patterns, requiring instead the TypeAbstractions extension (!13551).

Sam fixed an issue in which equal types compared non-equal under TypeRep-equality by implementing a suggestion by Krzysztof Gogolewski (#25998, !14281).

Sam improved the documentation surrounding defaulting in the user’s guide, providing a high-level overview of the different mechanisms in GHC for defaulting ambiguous type variables (#25807, !14057).

Backend

Ben and Sam investigated testsuite failures in the LLVM backend (#25769). They identified many different issues:

#25730 concerned incorrect type annotations in the generated LLVM, fixed in !13936.

#25770, #25773 were symptoms of a serious bug in the implementation of floating-point register padding (fixed in !14134),

!14129 fixed incorrect type annotations in the LLVM for atomic operations, adding new tests to Cmm Lint to avoid similar bugs in the future.

Most of the other bugs involved initializers/finalizers, which were due to incorrect linkage annotation for builtin arrays (fixed in !14157).

Rodrigo worked with Simon Peyton Jones to fix an issue in which the presence or absence of unrelated RULES could affect compilation, leading to non-deterministic compilation (#25170, !13884).

Andreas fixed a bug in which GHC would construct over-saturated constructor applications, which caused a panic when building the xmonad-contrib package (#23865, !14036).

Andreas made GHC constant-fold away invalid tagToEnum# calls to a particular error expression, which unlocks dead-code elimination opportunities and makes it easier to debug issues that arise from invalid use of tagToEnum# (#25976, !14254)

Andreas added -fhuge-code-sections, an off-by-default flag that provides a workaround for AArch64 users running into bug #24648.

Matthew overhauled the driver to bring one-shot compilation and make mode in line with each other, by consistently using the module graph to answer queries related to the module import structure (!14198, !14209). This was partly motivated by implementation requirements of the “Explicit Splice Imports” proposal, for which module graph queries are a central component.

Matthew added support for “fixed” nodes in the module graph, which can be used for modules without corresponding source-files that are e.g. generated via the GHC API (#25920, !14187).

Rodrigo moved some DynFlags consistency checks in order to consolidate the logic into the core makeDynFlagsConsistent function.

Ben changed how GHC prints Uniques to the user to avoid NULL characters (#25989, !14265).

Compiler performance

Matthew improved the performance of the bytecode assembler by ensuring the code is properly specialised (!13983).

Matthew made sure that forceModIface properly forced all fields of ModIface in order to avoid space leaks (!14078).

Matthew removed unused mi_used_th and mi_hpc fields from interfaces, which were needlessly bloating interface files (!14073).

Matthew avoided allocation of intermediate ByteStrings when serialising FastStrings (#25861, !14107).

Recompilation checking

Matthew overhauled the ModIface datatype, splitting it up in a more logical way which makes it easier to identify which parts contribute to recompilation checking (!14102). This allowed fixing several issues with recompilation checking in !14118, such as:

it ignored changes in exported named default declarations (#25855),

it did not take into account changes to COMPLETE pragmas (#25854).

Matthew added the -fwrite-if-self-recomp flag which controls whether to include self-recompilation information, which avoids writing recompilation information in cases such as producing binary distributions for which recompilation is not a concern (#10424, #22188, !8604).

Matthew refactored the implementation of recompilation-checking to ensure that all flags that influence recompilations are correctly taken into account (#25837, !14085).

Sam improved recompilation checking for export lists in !14178 (#25881). In practice, this means that modules with explicit import lists will no longer always trigger the recompilation of a module they depend on when that module’s export list changes, as long as the explicitly imported items are preserved.

Matthew improved the output of -dump-hi-diff to properly display the precise change in flags which caused recompilation (#25571, !13792).

Runtime system

Ben fixed a bug in which the WinIO I/O manager was being inconsistently selected (#25838, !14088).

Ben diagnosed and fixed a linking issue affecting global offset table usage on macOS that manifested in incorrect runtime results when using the GHC API (#25577, !13991).

Ben fixed an issue in which GHC’s RTS linker was too eager to load shared objects which refer to undefined symbols (#25943, !14290).

Ben significantly improved the performance of the RTS linker, culminating in a reduction in GHCi startup time from 2.5s to 250ms on Windows (#26052, #26009, !14339).

GHCi & bytecode interpreter

Andreas fixed several endianness issues in the interpreter (#25791, !14172).

Matthew implemented a fix for the mishandling of stack underflow frames (#25750, !13957). A remaining issue was subsequently identified (#25865) and fixed by Andreas’ work on the interpreter (!13978).

Matthew ensured that all top-level functions are visible when loading a module in the interpreter, not only exported functions (!14032).

Matthew fixed a bug in the simplifier that caused Core Lint failures when compiling certain programs (#25790, !14019).

Matthew fixed a regression in the way that GHCi would import modules that involved Cabal mixins stanzas (#25951, !14222).

Libraries

Ben exposed the constructors and fields of the Backtrace datatype in base (#26049, !14351).

Ben brought base changelog entries up to date in !14320.

Build system & packaging

Sam fixed GHC not working properly if the installation path contains spaces on Windows (#25204, !14137).

Ben fixed a couple of issues relating to the llvm-as flag:

the value of the field was incorrectly set (#25856, !14104),

the information in the field was passed incorrectly to clang (#25793, !14025).

Testsuite

Andreas fixed a bug in which tests requiring the interpreter would be run even if the compiler didn’t support it (#25533, !14201).

Matthew fixed an issue with tests that used Template Haskell in the profiled dynamic way (#25947, !14215).

Cabal

Mikolaj prepared the 3.14.2.0 bugfix release to the Cabal package suite (including the Cabal library and cabal-install).

Matthew fixed all known regressions in the 3.14.1.0 release of cabal-install:

Issue #10759 to do with picking up unwanted environment files #10828.

Duplication of environment variables (#10718, #10827).

Interaction of multi-repl with internal dependencies (#10775, #10841).

A working directory oversight (#10772, #10800).

The pkgname_datadir environment variable incorrectly using a relative path (#10717, #10830).

Matthew updated the outdated and gen-bounds commands to work with the v2- project infrastructure (#10878, #10840).

Matthew ensured that C++ environment variables are passed to configure scripts (#10797, #10844).

Matthew added a module name validity check to the cabal check command (#10295, #10816).

Matthew updated the Cabal CI to use GHC 9.12.2 and GHC 9.6.7 (#10893).

Matthew improved the testsuite output to make it more readable (#8419, #10837).

Matthew fixed an issue in which changes to the PATH environment variable would incorrectly not trigger recompilation (#2015, #10817).

HLS

Hannes prepared the HLS release 2.10.0.0 (#4448)

Zubin prepared the HLS release 2.11.0.0 (#4585)

Zubin added support for GHC 9.12.2 in HLS (#4527)

Zubin reworked the HLS release CI infrastructure (#4481)

Haskell.org infrastructure

Ben worked to refactor and migrate a variety of core haskell.org services from Equinix Metal to new infrastructure at OpenCape:

hoogle.haskell.org has been Nixified and now periodically reindexes automatically.

Haskell.org’s primary mail server, mail.haskell.org, has been Nixified and updated.

Haskell.org’s many mailing lists have been migrated to Mailman 3

gitlab.haskell.org has been migrated to OpenCape and updated

The Hackage documentation builder has been completely revamped with a more maintainable deployment strategy and a broader set of native packages available, enabling more Hackage packages to benefit from automatically-built documentation.

With these maintainability improvements we hope that haskell.org’s core infrastructure team can be more easily grown in the future.
by adam, andreask, ben, hannes, matthew, mikolaj, rodrigo, sam, zubin at June 20, 2025 12:00 AM

June 17, 2025

Magnus Therning

Why I'm writing a Redis client package

A couple of weeks ago I needed a small, hopefully temporary, service at work. It bridges a gap in functionality provided by a legacy system and the functionality desired by a new system. The legacy system is cumbersome to work with, so we tend to prefer building anti-corruption layers rather than changing it directly, and sometimes we implement it as separate services.

This time it was good enough to run the service as a cronjob, but it did need to keep track of when it ran the last time. It felt silly to spin up a separate DB just to keep a timestamp, and using another service's DB is something I really dislike and avoid.¹ So, I ended up using the Redis instance that's used as a cache by a OSS service we host.

The last time I had a look at the options for writing a Redis client in Haskell I found two candidates, hedis and redis-io. At the time I wrote a short note about them. This time around I found nothing much has changed, they are still the only two contenders and they still suffer from the same issues

hedis has still has the same API and I still find it as awkward.

redis-io still requires a logger.

I once again decided to use hedis and wrote the service for work in a couple of days, but this time I thought I'd see what it would take to remove the requirement on tinylog from redis-io. I spent a few evenings on it, though I spent most time on "modernising" the dev setup, using Nix to build, re-format using fourmolu, etc. I did the same for redis-resp, the main dependency of redis-io. The result of that can be found on my gitlab account:

https://gitlab.com/magus/redis-resp

https://gitlab.com/magus/redis-io

At the moment I won't take that particular experiment any further and given that the most recent change to redis-io was in 2020 (according to its git repo) I don't think there's much interest upstream either.

Making the changes to redis-io and redis-resp made me a little curious about the Redis protocol so I started reading about it. It made me start thinking about implementing a client lib myself. How hard could it be?

I'd also asked a question about Redis client libs on r/haskell and a response led me to redis-schema. It has a very good README, and its section on transactions with its observation that Redis transactions are a perfect match for Applicative. This pushed me even closer to start writing a client lib. What pushed me over the edge was the realisation that pipelining also is a perfect match for Applicative.

For the last few weeks I've spent some of my free time reading and experimenting and I'm enjoying it very much. We'll see where it leads, but hopefully I'll at least have bit more to write about it.

Footnotes:

¹
One definition of a microservice I find very useful is "a service that owns its own DB schema."

Tags: haskell redis

June 17, 2025 08:43 PM

June 16, 2025

Monday Morning Haskell

Two Pointer Algorithms
We’re now on to part 5 of our series comparing Haskell and Rust solutions for LeetCodeproblems. You can also look at the previous parts (Part 1, Part 2, Part 3, Part 4) to get some more context on what we’ve learned so far comparing these two languages.

For a full look at problem solving in Haskell, check out Solve.hs, our latest course! You’ll get full breakdowns on the processes for solving problems in Haskell, from basic list and loop problems to advanced algorithms!

The Problem

Today we’ll be looking at a problem called Trapping Rain Water. In this problem, we’re given a vector of heights, which form a sort of 1-dimensional topology. Our job is to figure out how many units of water could be collected within the topology.

As a very simple example, the input [1,0,2] could collect 1 unit of water. Here’s a visualization of that system, where x shows the topology and o shows water we collect:
x
xox
We can never collect any water over the left or right “edges” of the array, since it would flow off. The middle index of our array though is lower than its neighbors. So we take the lower of these neighboring values, and we see that we can collect 1 unit of water in this system.

For a bigger example that collects water, we might have the input [4, 2, 1, 1, 3, 5]. Here’s what that looks like:
x
x o o o o x
x o o o x x 
x x o o x x
x x x x x x
The total water here is 9.

A flat system like [2,2,2], or a system that looks like a peak [1,2,3,2,1] cannot collect any water, so we should return 0 in these cases.

The Algorithm

There are a couple ways to solve this. One approach would be a two-pass solution, similar to what we used in Product of Array Except Self. We loop from the left side, tracking the maximum water we can store in each unit based on its left neighbors. Then we loop again from the right side and compare the maximum we can store based on the right neighbors to the prior value from the left. This solution is O(n) time, but O(n) space as well.

A more optimal solution for this problem is a two-pointer approach that can use O(1) additional space. In this kind of solution, we look at the left and right of the input simultaneously. Each step of the way, we make a decision to either increase the “left pointer” or decrease the “right pointer” until they meet in the middle. Each time we move, we get more information about our solution.

In this particular problem, we’ll track the maximum value we’ve seen from the left side and the maximum value we’ve seen from the right side. As we traverse each index, we update both sides for the current left and right indices if we have a new maximum.

The crucial step is to see that if the current “left max” is smaller than the current “right max”, we know how much water can be stored at the left index. This is just the left max minus the left index. Then we can increment the left index.

If the opposite is true, we calculate how much water can be stored at the right index, and decrease the right index.

So we keep a running tally of these sums, and we end our loop when they meet in the middle.

Rust Solution

We can describe our algorithm as a simple while loop. This loop goes until the left index exceeds the right index. The loop needs to track 5 values:

Left Index

Right Index

Left Max

Right Max

Total sum so far

So let’s write the setup portion of the loop:
pub fn trap(height: Vec<i32>) -> i32 {
    let mut leftMax = -1;
    let mut rightMax = -1;
    let mut leftI = 0;
    let mut rightI = height.len() - 1;
    let mut total = 0;
    while leftI <= rightI {
        ...
    }
}
A subtle thing…the constraints on the LeetCode problem are that the length is at least 1. But to handle length 0 cases, we would need a special case. Rust uses unsigned integers for vector length, so taking height.len() - 1 on a length-0 vector would give the maximum integer, and this would mess up our loop and indexing.

Within the while loop, we run the algorithm.

Adjust leftMax and rightMax if necessary.

If leftMax is not larger, recurse, incrementing leftI and adding to total from the left

If rightMax is smaller, decrement rightI and add total from the right

And at the end, we return our total!
pub fn trap(height: Vec<i32>) -> i32 {
    let n = height.len();
    if n <= 1 {
        return 0;
    }
    let mut leftMax = -1;
    let mut rightMax = -1;
    let mut leftI = 0;
    let mut rightI = n - 1;
    let mut total = 0;
    while leftI <= rightI {
        // Step 1
        leftMax = std::cmp::max(leftMax, height[leftI]);
        rightMax = std::cmp::max(rightMax, height[rightI]);
        if leftMax <= rightMax {
            // Step 2
            total += leftMax - height[leftI];
            leftI += 1;
        } else {
            // Step 3
            total += rightMax - height[rightI];
            rightI -= 1;
        }
    }
    return total;
}
Haskell Solution

Now that we’ve seen our Rust solution with a single loop, let’s remember our process for translating this idea to Haskell. With a two-pointer loop, the way in which we traverse the elements of the input is unpredictable, thus we need a raw recursive function, rather than a fold or a map.

Since we’re tracking 5 integer values, we’ll want to write a loop function that looks like this:
-- (leftIndex, rightIndex, leftMax, rightMax, sum)
loop :: (Int, Int, Int, Int, Int) -> Int
Knowing this, we can already “start from the end” and figure out how to invoke our loop from the start of our function:
trapWater :: V.Vector Int -> Int
trapWater input = loop (0, n - 1, -1, -1, 0)
  where
    n = V.length input

    loop :: (Int, Int, Int, Int, Int) -> Int
    loop = undefined
In writing our recursive loop, we’ll start with the base case. Once leftI is the bigger index, we return the total.
trapWater :: V.Vector Int -> Int
trapWater input = loop (0, n - 1, -1, -1, 0)
  where
    n = V.length input

    loop :: (Int, Int, Int, Int, Int) -> Int
    loop (leftI, rightI, leftMax, rightMax, total) = if leftI > rightI then total
      else …
Within the else case, we just follow our algorithm, with the same 3 steps we saw with Rust.
trapWater :: V.Vector Int -> Int
trapWater input = loop (0, n - 1, -1, -1, 0)
  where
    n = V.length input

    -- (leftIndex, rightIndex, leftMax, rightMax, sum)
    loop :: (Int, Int, Int, Int, Int) -> Int
    loop (leftI, rightI, leftMax, rightMax, total) = if leftI > rightI then total
      else
        -- Step 1
        let leftMax' = max leftMax (input V.! leftI)
            rightMax' = max rightMax (input V.! rightI)
        in  if leftMax' <= rightMax'
              -- Step 2
              then loop (leftI + 1, rightI, leftMax', rightMax', total + leftMax' - input V.! leftI)
              -- Step 3
              else loop (leftI, rightI - 1, leftMax', rightMax', total + rightMax' - input V.! rightI)
And we have our Haskell solution!

Conclusion

If you’ve been following this whole series so far, hopefully you’re starting to get a feel for comparing basic algorithms in Haskell and Rust (standing as a proxy for most loop-based languages). In general, we can write loops as recursive functions in Haskell, capturing the “state” of the list as the input parameter for that function.

In particular cases where each iteration deals with exactly one element of an input list, we can employ folds as a tool to simplify our functions. But the two-pointer algorithm we explored today falls into the general recursive category.

To learn the details of understanding these problem solving techniques, take a look at our course, Solve.hs! You’ll learn everything from basic loop and list techniques, to advanced data structures and algorithms!
by James Bowen at June 16, 2025 08:30 AM

Well-Typed.Com

Making GHCi compatible with multiple home units
GHC’s support for compiling multiple units in a single invocation is essential for tooling to work well with real-world Haskell projects. Loading your whole project into a single GHCi session allows you to get feedback quickly on changes to any part of your project, without having to restart the REPL. Until now, not all of GHCi worked with multiple home units, and this was a source of confusion for many users.

We’re now happy to announce that in 9.14.1, GHCi will fully support multiple home units. This post contains a brief overview of the changes.

Multiple Home Units

Work on multiple home units has been ongoing for a while. This is the latest chapter in our efforts to update the ecosystem to support this feature.

GSOC 2020: Multiple Home Units (Hannes)

The Interface for Multiple Home Units (Matthew)

Multi Component support for cabal repl (Matthew)

Multi Component support in HLS with cabal-install 3.12 (Zubin)

The main way to start a multi-unit GHCi session is by using cabal repl --enable-multi-repl with a selector that selects multiple components in the project, such as all:
> cabal repl --enable-multi-repl all
This will start a GHCi session with a home unit for each selected component. Until now, support in the REPL was essentially limited to reloading modules to get feedback about changes. Almost all other commands were unsupported when using multiple home units.

GHCi Supports Multiple Home Units

Following our changes, GHCi now fully supports multiple home units in its REPL. The experience of a user is now the same whether they are using a single home unit or multiple home units. In particular, the following features have been fixed or enabled:

Usual REPL usage such as evaluating expressions

All GHCi commands

:seti/:set

:browse

:module [+/-] [*]Mod1 ...

… and many more!

The GHCi debugger

:break, :steplocal, :continue, etc…

Implementing Multi Unit Support in GHCi

To fully support multiple home units, GHCi needed a new internal model of how different contexts interact during a session. There are three key contexts:

the prompt (the context in which expressions are evaluated),

the script context (in which scripts loaded by :load are executed), and

the unit context (the home units specified on the command line, e.g. the components of the Cabal packages being loaded).

Distinguishing these three different contexts is the key to our design. Before, each GHCi session only had a single home unit, and so commands would always be interpreted relative to that unit. In a multi-unit session, one of the units was chosen as the “active” unit, and commands would be interpreted relative to that unit. Now since it is possible to talk precisely about the different contexts, the dependencies between them and where commands should be interpreted, we can properly implement all GHCi commands.

Virtual home units

Our design adds virtual home units for the prompt and script contexts. Therefore, every GHCi session is a multi-unit session, and all commands are modified to support this.

This virtual home unit for the prompt is called interactive-ghci. All user input is interpreted in the context of interactive-ghci (it is the “active” unit). Since it always depends on all user-given home units (i.e. those given on the command line), we can import modules, run code, and execute GHCi commands as usual.

The virtual home unit for scripts is called interactive-session. It is similar in structure to interactive-ghci, namely that it depends on all user-given home units. This allows scripts to use packages from the current GHCi REPL session. Additionally, interactive-ghci depends on interactive-session, allowing the user to load and execute the script modules from the prompt.

Why do we need two virtual home units? When a script is loaded via :load Mod.hs, this Mod.hs needs to be interpreted relative to some home unit. We do not want to guess which home unit Mod.hs should be added to, since the behaviour is hard to predict in a multiple home unit session. However, we also can’t add Mod.hs to the interactive-ghci home unit, as we want to be able to maintain a different set of GHC options for the prompt (i.e. interactive-ghci) and scripts.

Adding these two virtual home units to the GHCi REPL session yields the following Home Unit Graph. We mark interactive-ghci to indicate that it is the “active” context of the GHCi prompt.

GHCi’s Home Unit Graph, showing two virtual units interactive-ghci and interactive-session, where the former depends on the latter. Both of these depend on any number of user-given home units, indicated by the names pkg1 … pkgN.

Examples

Now that we know how the GHCi session will work, let’s show a couple of concrete examples.

We assume a regular cabal project, initialised via the command:
> mkdir mhu-example && cd mhu-example
> cabal init -n --tests --libandexe
This creates a cabal project with three components:

lib:mhu-example: The main library.

exe:mhu-example: An executable.

test:mhu-example-test: A test-suite.

From the perspective of GHC, a unit is essentially identical to a single component (with some hand-waving).

Example of a cabal project with multiple components. GHC treats each component as a separate unit.

When we load only the library into a GHCi session, then the library is the single user-specified home unit in the GHCi session. For example, the cabal invocation
cabal repl lib:mhu-example
invokes the following GHC command:
ghc --interactive -this-unit-id lib-mhu-example -package base -package containers ...
This creates a home unit graph with three home units: interactive-ghci, interactive-session and mhu-example-library.

Home Unit Graph with a single user-specified Home Unit. There are three units, interactive-ghci, interactive-session and lib:mhu-example. interactive-ghci depends on interactive-session and lib:mhu-example, while interactive-session depends on lib:mhu-example.

In the case of more than one user-specified home unit, the graph is extended in an intuitive way. For example, the cabal invocation
cabal repl --enable-multi-repl lib:mhu-example exe:mhu-example test:mhu-example-test
will result in the following GHC invocation:¹
ghc --interactive -unit @lib-mhu-example -unit @exe-mhu-example -unit @test-mhu-example-test
GHCi internally structures this as the following:

Home Unit Graph with a multiple user-specified home units. There are five units, called interactive-ghci, interactive-session, lib:mhu-example, exe:mhu-example and test:mhu-example-test.

Naturally, home units can have dependencies on other home units, e.g. test:mhu-example-test and exe:mhu-example both depend on lib:mhu-example.

Setting REPL Options

The GHCi commands :set and :seti are used to change the GHC options of the home units and the ghc options for the prompt respectively. In the new architecture, the :set command applies the new options to all home units except interactive-ghci. :seti, on the other hand, applies changes only to the interactive-ghci home unit.

In the future, we may want to extend the capabilities of the :set command to change the GHC options only for certain home units.

Summary

GHCi is now fully compatible with multiple home units, including all GHCi commands and the GHCi debugger. Our new design generalises the architecture of GHCi so that multi-unit and single-unit sessions are handled in the same way. The uniform handling will make sure that multi-unit sessions work correctly as GHCi evolves.

This work has been performed in collaboration with Mercury, who have a long-term commitment to the scalability and robustness of the Haskell ecosystem. Well-Typed are always interested in projects and looking for funding to improve GHC and other Haskell tools. Please contact [email protected] if we might be able to work with you!

The unit arguments are passed using response files. The file exe-mhu-example contains the arguments for the exe:mhu-example home unit, and similarly for the other files.↩︎
by hannes, matthew at June 16, 2025 12:00 AM

Brent Yorgey

Monads are not like burritos

Monads are not like burritos

Posted on June 16, 2025
Tagged monad, pedagogy, meme, burrito, analogy, Haskell

In January 2009, while just a baby first-year PhD student, I wrote a blog post titled Abstraction, intuition, and the “monad tutorial fallacy”. In it, I made the argument that humans tend to learn best by first grappling with concrete examples, and only later proceeding to higher-level intuition and analogies; hence, it’s a mistake to think that clearly presenting your intuition for a topic will help other people understand it. Analogies and intuition can help, but only when accompanied by concrete examples and active engagement. To illustrate the point, I made up a fictitious programmer with a fictitious analogy.

But now Joe goes and writes a monad tutorial called “Monads are Burritos,” under the well-intentioned but mistaken assumption that if other people read his magical insight, learning about monads will be a snap for them. “Monads are easy,” Joe writes. “Think of them as burritos.” Joe hides all the actual details about types and such because those are scary, and people will learn better if they can avoid all that difficult and confusing stuff. Of course, exactly the opposite is true, and all Joe has done is make it harder for people to learn about monads…

My intention was to choose a fictitious analogy which was obviously ridiculous and silly, as a parody of many of the monad tutorials which existed at the time (and still do). Mark Jason Dominus then wrote a blog post, Monads are like burritos, pointing out that actually, monads are kinda like burritos. It’s really funny, though I don’t think it’s actually a very good analogy, and my guess is that Mark would agree: it was clearly written as a silly joke and not as a real way to explain monads.

In any case, from that point the “monads are burritos” meme took on a life of its own. For example:

Chris Done made a webcomic about it

Ed Morehouse wrote a ridiculous paper exploring the categorical foundations of burritos

Someone made a burrito library in Rust

Dr Eugenia Cheng tweeted about it

I even joined in the fun and made this meme image about bad monad tutorials:

Of course there are lots of people who still understand that it was all just a silly joke. Recently, however, I’ve seen several instances where people apparently believe “monads are burritos” is a real, helpful thing and not just a joke meme. For example, see this thread on lobste.rs, or this Mastodon post.

So, to set the record straight: “monads are burritos” is not a helpful analogy!Yes, I am writing a blog post because People Are Wrong On The Internet, and I know it probably won’t make any difference, but here we are.

Why not, you ask? To expand on my reasons from a 10-year-old Reddit comment:

The burrito analogy strongly implies that a value of type m a somehow “contains” a value (or values) of type a. But that is not true for all monads (e.g. there is no sense in which a value of type IO String contains a String).

Relatedly, the analogy also implies that a value of type m a can be “unwrapped” to get an a, but this is impossible for many monads.

It is not actually very easy to take a burrito containing a burrito and merge it into a single-level burrito. At least this is not in any sense a natural operation on burritos. Perhaps you could argue that it is always easy to remove outer tortilla layers (but not the innermost one since the food will all fall out), but this is a bad analogy, since in general join does not just “remove” an outer layer, but somehow merges the effects of two layers into one.

Actually, burritos are a great analogy for the Identity monad! …but not much beyond that.

On a more positive note, my sense is that the average pedagogical quality of Haskell materials, and monad tutorials in particular, has indeed gone up significantly since 2009. I’d love to think this can be at least partially attributed to my original blog post, though of course it’s impossible to know that for sure.

<noscript>Javascript needs to be activated to view comments.</noscript>

by Brent Yorgey at June 16, 2025 12:00 AM

June 15, 2025

Chris Reade

PenroseKiteDart User Guide
Introduction

(Updated June 2025 for PenroseKiteDart version 1.4)

PenroseKiteDart is a Haskell package with tools to experiment with finite tilings of Penrose’s Kites and Darts. It uses the Haskell Diagrams package for drawing tilings. As well as providing drawing tools, this package introduces tile graphs (Tgraphs) for describing finite tilings. (I would like to thank Stephen Huggett for suggesting planar graphs as a way to reperesent the tilings).

This document summarises the design and use of the PenroseKiteDart package.

PenroseKiteDart package is now available on Hackage.

The source files are available on GitHub at https://github.com/chrisreade/PenroseKiteDart.

There is a small art gallery of examples created with PenroseKiteDart here.

Index

About Penrose’s Kites and Darts

Using the PenroseKiteDart Package (initial set up).

Overview of Types and Operations

Drawing in more detail

Forcing in more detail

Advanced Operations

Other Reading

1. About Penroseâ€™s Kites and Darts

The Tiles

In figure 1 we show a dart and a kite. All angles are multiples of $36^{\circ}$ (a tenth of a full turn). If the shorter edges are of length 1, then the longer edges are of length $\phi$ , where $\phi = (1+ \sqrt{5})/ 2$ is the golden ratio.

Figure 1: The Dart and Kite Tiles

Aperiodic Infinite Tilings

What is interesting about these tiles is:

It is possible to tile the entire plane with kites and darts in an aperiodic way.

Such a tiling is non-periodic and does not contain arbitrarily large periodic regions or patches.

The possibility of aperiodic tilings with kites and darts was discovered by Sir Roger Penrose in 1974. There are other shapes with this property, including a chiral aperiodic monotile discovered in 2023 by Smith, Myers, Kaplan, Goodman-Strauss. (See the Penrose Tiling Wikipedia page for the history of aperiodic tilings)

This package is entirely concerned with Penrose’s kite and dart tilings also known as P2 tilings.

Legal Tilings

In figure 2 we add a temporary green line marking purely to illustrate a rule for making legal tilings. The purpose of the rule is to exclude the possibility of periodic tilings.

If all tiles are marked as shown, then whenever tiles come together at a point, they must all be marked or must all be unmarked at that meeting point. So, for example, each long edge of a kite can be placed legally on only one of the two long edges of a dart. The kite wing vertex (which is marked) has to go next to the dart tip vertex (which is marked) and cannot go next to the dart wing vertex (which is unmarked) for a legal tiling.

Figure 2: Marked Dart and Kite

Correct Tilings

Unfortunately, having a finite legal tiling is not enough to guarantee you can continue the tiling without getting stuck. Finite legal tilings which can be continued to cover the entire plane are called correct and the others (which are doomed to get stuck) are called incorrect. This means that decomposition and forcing (described later) become important tools for constructing correct finite tilings.

2. Using the PenroseKiteDart Package

You will need the Haskell Diagrams package (See Haskell Diagrams) as well as this package (PenroseKiteDart). When these are installed, you can produce diagrams with a Main.hs module. This should import a chosen backend for diagrams such as the default (SVG) along with Diagrams.Prelude.
    module Main (main) where
    
    import Diagrams.Backend.SVG.CmdLine
    import Diagrams.Prelude
For Penrose’s Kite and Dart tilings, you also need to import the PKD module and (optionally) the TgraphExamples module.
    import PKD
    import TgraphExamples
Then to ouput someExample figure
    fig::Diagram B
    fig = someExample

    main :: IO ()
    main = mainWith fig
Note that the token B is used in the diagrams package to represent the chosen backend for output. So a diagram has type Diagram B. In this case B is bound to SVG by the import of the SVG backend. When the compiled module is executed it will generate an SVG file. (See Haskell Diagrams for more details on producing diagrams and using alternative backends).

3. Overview of Types and Operations

Half-Tiles

In order to implement operations on tilings (decompose in particular), we work with half-tiles. These are illustrated in figure 3 and labelled RD (right dart), LD (left dart), LK (left kite), RK (right kite). The join edges where left and right halves come together are shown with dotted lines, leaving one short edge and one long edge on each half-tile (excluding the join edge). We have shown a red dot at the vertex we regard as the origin of each half-tile (the tip of a half-dart and the base of a half-kite).

Figure 3: Half-Tile pieces showing join edges (dashed) and origin vertices (red dots)

The labels are actually data constructors introduced with type operator HalfTile which has an argument type (rep) to allow for more than one representation of the half-tiles.
    data HalfTile rep 
      = LD rep -- Left Dart
      | RD rep -- Right Dart
      | LK rep -- Left Kite
      | RK rep -- Right Kite
      deriving (Show,Eq)
Tgraphs

We introduce tile graphs (Tgraphs) which provide a simple planar graph representation for finite patches of tiles. For Tgraphs we first specialise HalfTile with a triple of vertices (positive integers) to make a TileFace such as RD(1,2,3), where the vertices go clockwise round the half-tile triangle starting with the origin.
    type TileFace  = HalfTile (Vertex,Vertex,Vertex)
    type Vertex    = Int  -- must be positive
The function
    makeTgraph :: [TileFace] -> Tgraph
then constructs a Tgraph from a TileFace list after checking the TileFaces satisfy certain properties (described below). We also have
    faces :: Tgraph -> [TileFace]
to retrieve the TileFace list from a Tgraph.

As an example, the fool (short for fool’s kite and also called an ace in the literature) consists of two kites and a dart (= 4 half-kites and 2 half-darts):
    fool :: Tgraph
    fool = makeTgraph [RD (1,2,3), LD (1,3,4)   -- right and left dart
                      ,LK (5,3,2), RK (5,2,7)   -- left and right kite
                      ,RK (5,4,3), LK (5,6,4)   -- right and left kite
                      ]
To produce a diagram, we simply draw the Tgraph
    foolFigure :: Diagram B
    foolFigure = draw fool
which will produce the diagram on the left in figure 4.

Alternatively,
    foolFigure :: Diagram B
    foolFigure = labelled drawj fool
will produce the diagram on the right in figure 4 (showing vertex labels and dashed join edges).

Figure 4: Diagram of fool without labels and join edges (left), and with (right)

When any (non-empty) Tgraph is drawn, a default orientation and scale are chosen based on the lowest numbered join edge. This is aligned on the positive x-axis with length 1 (for darts) or length $\phi$ (for kites).

Tgraph Properties

Tgraphs are actually implemented as
    newtype Tgraph = Tgraph [TileFace]
                     deriving (Show)
but the data constructor Tgraph is not exported to avoid accidentally by-passing checks for the required properties. The properties checked by makeTgraph ensure the Tgraph represents a legal tiling as a planar graph with positive vertex numbers, and that the collection of half-tile faces are both connected and have no crossing boundaries (see note below). Finally, there is a check to ensure two or more distinct vertex numbers are not used to represent the same vertex of the graph (a touching vertex check). An error is raised if there is a problem.

Note: If the TilFaces are faces of a planar graph there will also be exterior (untiled) regions, and in graph theory these would also be called faces of the graph. To avoid confusion, we will refer to these only as exterior regions, and unless otherwise stated, face will mean a TileFace. We can then define the boundary of a list of TileFaces as the edges of the exterior regions. There is a crossing boundary if the boundary crosses itself at a vertex. We exclude crossing boundaries from Tgraphs because they prevent us from calculating relative positions of tiles locally and create touching vertex problems.

For convenience, in addition to makeTgraph, we also have
    makeUncheckedTgraph :: [TileFace] -> Tgraph
    checkedTgraph   :: [TileFace] -> Tgraph
The first of these (performing no checks) is useful when you know the required properties hold. The second performs the same checks as makeTgraph except that it omits the touching vertex check. This could be used, for example, when making a Tgraph from a sub-collection of TileFaces of another Tgraph.

Main Tiling Operations

There are three key operations on finite tilings, namely
    decompose :: Tgraph -> Tgraph
    force     :: Tgraph -> Tgraph
    compose   :: Tgraph -> Tgraph
Decompose

Decomposition (also called deflation) works by splitting each half-tile into either 2 or 3 new (smaller scale) half-tiles, to produce a new tiling. The fact that this is possible, is used to establish the existence of infinite aperiodic tilings with kites and darts. Since our Tgraphs have abstracted away from scale, the result of decomposing a Tgraph is just another Tgraph. However if we wish to compare before and after with a drawing, the latter should be scaled by a factor $1/{\phi} = \phi - 1$ times the scale of the former, to reflect the change in scale.

Figure 5: fool (left) and decompose fool (right)

We can, of course, iterate decompose to produce an infinite list of finer and finer decompositions of a Tgraph
    decompositions :: Tgraph -> [Tgraph]
    decompositions = iterate decompose
Force

Force works by adding any TileFaces on the boundary edges of a Tgraph which are forced. That is, where there is only one legal choice of TileFace addition consistent with the seven possible vertex types. Such additions are continued until either (i) there are no more forced cases, in which case a final (forced) Tgraph is returned, or (ii) the process finds the tiling is stuck, in which case an error is raised indicating an incorrect tiling. [In the latter case, the argument to force must have been an incorrect tiling, because the forced additions cannot produce an incorrect tiling starting from a correct tiling.]

An example is shown in figure 6. When forced, the Tgraph on the left produces the result on the right. The original is highlighted in red in the result to show what has been added.

Figure 6: A Tgraph (left) and its forced result (right) with the original shown red

Compose

Composition (also called inflation) is an opposite to decompose but this has complications for finite tilings, so it is not simply an inverse. (See Graphs,Kites and Darts and Theorems for more discussion of the problems). Figure 7 shows a Tgraph (left) with the result of composing (right) where we have also shown (in pale green) the faces of the original that are not included in the composition – the remainder faces.

Figure 7: A Tgraph (left) and its (part) composed result (right) with the remainder faces shown pale green

Under some circumstances composing can fail to produce a Tgraph because there are crossing boundaries in the resulting TileFaces. However, we have established that

If g is a forced Tgraph, then compose g is defined and it is also a forced Tgraph.

Try Results

It is convenient to use types of the form Try a for results where we know there can be a failure. For example, compose can fail if the result does not pass the connected and no crossing boundary check, and force can fail if its argument is an incorrect Tgraph. In situations when you would like to continue some computation rather than raise an error when there is a failure, use a try version of a function.
    tryCompose :: Tgraph -> Try Tgraph
    tryForce   :: Tgraph -> Try Tgraph
We define Try as a synonym for Either ShowS (which is a monad) in module Tgraph.Try.
type Try a = Either ShowS a
(Note ShowS is String -> String). Successful results have the form Right r (for some correct result r) and failure results have the form Left (s<>) (where s is a String describing the problem as a failure report).

The function
    runTry:: Try a -> a
    runTry = either error id
will retrieve a correct result but raise an error for failure cases. This means we can always derive an error raising version from a try version of a function by composing with runTry.
    force = runTry . tryForce
    compose = runTry . tryCompose
Elementary Tgraph and TileFace Operations

The module Tgraph.Prelude defines elementary operations on Tgraphs relating vertices, directed edges, and faces. We describe a few of them here.

When we need to refer to particular vertices of a TileFace we use
    originV :: TileFace -> Vertex -- the first vertex - red dot in figure 2
    oppV    :: TileFace -> Vertex -- the vertex at the opposite end of the join edge from the origin
    wingV   :: TileFace -> Vertex -- the vertex not on the join edge
A directed edge is represented as a pair of vertices.
    type Dedge = (Vertex,Vertex)
So (a,b) is regarded as a directed edge from a to b.

When we need to refer to particular edges of a TileFace we use
    joinE  :: TileFace -> Dedge  -- shown dotted in figure 2
    shortE :: TileFace -> Dedge  -- the non-join short edge
    longE  :: TileFace -> Dedge  -- the non-join long edge
which are all directed clockwise round the TileFace. In contrast, joinOfTile is always directed away from the origin vertex, so is not clockwise for right darts or for left kites:
    joinOfTile:: TileFace -> Dedge
    joinOfTile face = (originV face, oppV face)
In the special case that a list of directed edges is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we can think of this as an edge list rather than just a directed edge list.

For example,
    internalEdges :: Tgraph -> [Dedge]
produces an edge list, whereas
    boundary :: Tgraph -> [Dedge]
produces single directions. Each directed edge in the resulting boundary will have a TileFace on the left and an exterior region on the right. The function
    dedges :: Tgraph -> [Dedge]
produces all the directed edges obtained by going clockwise round each TileFace so not every edge in the list has an inverse in the list.

Note: There is now a class HasFaces (introduced in version 1.4) which includes instances for both Tgraph and [TileFace] and others. This allows some generalisations. In particular the more general types of the above three functions are now
    internalEdges :: HasFaces a => a -> [Dedge]
    boundary      :: HasFaces a => a -> [Dedge] 
    dedges        :: HasFaces a => a -> [Dedge]   
Patches (Scaled and Positioned Tilings)

Behind the scenes, when a Tgraph is drawn, each TileFace is converted to a Piece. A Piece is another specialisation of HalfTile using a two dimensional vector to indicate the length and direction of the join edge of the half-tile (from the originV to the oppV), thus fixing its scale and orientation. The whole Tgraph then becomes a list of located Pieces called a Patch.
    type Piece = HalfTile (V2 Double)
    type Patch = [Located Piece]
Piece drawing functions derive vectors for other edges of a half-tile piece from its join edge vector. In particular (in the TileLib module) we have
    drawPiece :: Piece -> Diagram B
    dashjPiece :: Piece -> Diagram B
    fillPieceDK :: Colour Double -> Colour Double -> Piece -> Diagram B
where the first draws the non-join edges of a Piece, the second does the same but adds a dashed line for the join edge, and the third takes two colours – one for darts and one for kites, which are used to fill the piece as well as using drawPiece.

Patch is an instances of class Transformable so a Patch can be scaled, rotated, and translated.

Vertex Patches

It is useful to have an intermediate form between Tgraphs and Patches, that contains information about both the location of vertices (as 2D points), and the abstract TileFaces. This allows us to introduce labelled drawing functions (to show the vertex labels) which we then extend to Tgraphs. We call the intermediate form a VPatch (short for Vertex Patch).
    type VertexLocMap = IntMap.IntMap (Point V2 Double)
    data VPatch = VPatch {vLocs :: VertexLocMap,  vpFaces::[TileFace]} deriving Show
and
    makeVP :: Tgraph -> VPatch
calculates vertex locations using a default orientation and scale.

VPatch is made an instance of class Transformable so a VPatch can also be scaled and rotated.

One essential use of this intermediate form is to be able to draw a Tgraph with labels, rotated but without the labels themselves being rotated. We can simply convert the Tgraph to a VPatch, and rotate that before drawing with labels.
    labelled draw (rotate someAngle (makeVP g))
We can also align a VPatch using vertex labels.
    alignXaxis :: (Vertex, Vertex) -> VPatch -> VPatch 
So if g is a Tgraph with vertex labels a and b we can align it on the x-axis with a at the origin and b on the positive x-axis (after converting to a VPatch), instead of accepting the default orientation.
    labelled draw (alignXaxis (a,b) (makeVP g))
Another use of VPatches is to share the vertex location map when drawing only subsets of the faces (see Overlaid examples in the next section).

4. Drawing in More Detail

Class Drawable

There is a class Drawable with instances Tgraph, VPatch, Patch. When the token B is in scope standing for a fixed backend then we can assume
    draw   :: Drawable a => a -> Diagram B  -- draws non-join edges
    drawj  :: Drawable a => a -> Diagram B  -- as with draw but also draws dashed join edges
    fillDK :: Drawable a => Colour Double -> Colour Double -> a -> Diagram B -- fills with colours
where fillDK clr1 clr2 will fill darts with colour clr1 and kites with colour clr2 as well as drawing non-join edges.

These are the main drawing tools. However they are actually defined for any suitable backend b so have more general types.

(Update Sept 2024) As of version 1.1 of PenroseKiteDart, these will be
    draw ::   (Drawable a, OKBackend b) =>
              a -> Diagram b
    drawj ::  (Drawable a, OKBackend) b) =>
              a -> Diagram b
    fillDK :: (Drawable a, OKBackend b) =>
              Colour Double -> Colour Double -> a -> Diagram b
where the class OKBackend is a check to ensure a backend is suitable for drawing 2D tilings with or without labels.

In these notes we will generally use the simpler description of types using B for a fixed chosen backend for the sake of clarity.

The drawing tools are each defined via the class function drawWith using Piece drawing functions.
    class Drawable a where
        drawWith :: (Piece -> Diagram B) -> a -> Diagram B
    
    draw = drawWith drawPiece
    drawj = drawWith dashjPiece
    fillDK clr1 clr2 = drawWith (fillPieceDK clr1 clr2)
To design a new drawing function, you only need to implement a function to draw a Piece, (let us call it newPieceDraw)
    newPieceDraw :: Piece -> Diagram B
This can then be elevated to draw any Drawable (including Tgraphs, VPatches, and Patches) by applying the Drawable class function drawWith:
    newDraw :: Drawable a => a -> Diagram B
    newDraw = drawWith newPieceDraw
Class DrawableLabelled

Class DrawableLabelled is defined with instances Tgraph and VPatch, but Patch is not an instance (because this does not retain vertex label information).
    class DrawableLabelled a where
        labelColourSize :: Colour Double -> Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
So labelColourSize c m modifies a Patch drawing function to add labels (of colour c and size measure m). Measure is defined in Diagrams.Prelude with pre-defined measures tiny, verySmall, small, normal, large, veryLarge, huge. For most of our diagrams of Tgraphs, we use red labels and we also find small is a good default size choice, so we define
    labelSize :: DrawableLabelled a => Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
    labelSize = labelColourSize red

    labelled :: DrawableLabelled a => (Patch -> Diagram B) -> a -> Diagram B
    labelled = labelSize small
and then labelled draw, labelled drawj, labelled (fillDK clr1 clr2) can all be used on both Tgraphs and VPatches as well as (for example) labelSize tiny draw, or labelCoulourSize blue normal drawj.

Further drawing functions

There are a few extra drawing functions built on top of the above ones. The function smart is a modifier to add dashed join edges only when they occur on the boundary of a Tgraph
    smart :: (VPatch -> Diagram B) -> Tgraph -> Diagram B
So smart vpdraw g will draw dashed join edges on the boundary of g before applying the drawing function vpdraw to the VPatch for g. For example the following all draw dashed join edges only on the boundary for a Tgraph g
    smart draw g
    smart (labelled draw) g
    smart (labelSize normal draw) g
When using labels, the function rotateBefore allows a Tgraph to be drawn rotated without rotating the labels.
    rotateBefore :: (VPatch -> a) -> Angle Double -> Tgraph -> a
    rotateBefore vpdraw angle = vpdraw . rotate angle . makeVP
So for example,
    rotateBefore (labelled draw) (90@@deg) g
makes sense for a Tgraph g. Of course if there are no labels we can simply use
    rotate (90@@deg) (draw g)
Similarly alignBefore allows a Tgraph to be aligned on the X-axis using a pair of vertex numbers before drawing.
    alignBefore :: (VPatch -> a) -> (Vertex,Vertex) -> Tgraph -> a
    alignBefore vpdraw (a,b) = vpdraw . alignXaxis (a,b) . makeVP
So, for example, if Tgraph g has vertices a and b, both
    alignBefore draw (a,b) g
    alignBefore (labelled draw) (a,b) g
make sense. Note that the following examples are wrong. Even though they type check, they re-orient g without repositioning the boundary joins.
    smart (labelled draw . rotate angle) g      -- WRONG
    smart (labelled draw . alignXaxis (a,b)) g  -- WRONG
Instead use
    smartRotateBefore (labelled draw) angle g
    smartAlignBefore (labelled draw) (a,b) g
where
    smartRotateBefore :: (VPatch -> Diagram B) -> Angle Double -> Tgraph -> Diagram B
    smartAlignBefore  :: (VPatch -> Diagram B) -> (Vertex,Vertex) -> Tgraph -> Diagram B
are defined using
    restrictSmart :: Tgraph -> (VPatch -> Diagram B) -> VPatch -> Diagram B
Here, restrictSmart g vpdraw vp uses the given vp for drawing boundary joins and drawing faces of g (with vpdraw) rather than converting g to a new VPatch. This assumes vp has locations for vertices in g.

Overlaid examples (location map sharing)

The function
    drawForce :: Tgraph -> Diagram B
will (smart) draw a Tgraph g in red overlaid (using <>) on the result of force g as in figure 6. Similarly
    drawPCompose  :: Tgraph -> Diagram B
applied to a Tgraph g will draw the result of a partial composition of g as in figure 7. That is a drawing of compose g but overlaid with a drawing of the remainder faces of g shown in pale green.

Both these functions make use of sharing a vertex location map to get correct alignments of overlaid diagrams. In the case of drawForce g, we know that a VPatch for force g will contain all the vertex locations for g since force only adds to a Tgraph (when it succeeds). So when constructing the diagram for g we can use the VPatch created for force g instead of starting afresh. Similarly for drawPCompose g the VPatch for g contains locations for all the vertices of compose g so compose g is drawn using the the VPatch for g instead of starting afresh.

The location map sharing is done with
    subVP :: VPatch -> [TileFace] -> VPatch
so that subVP vp fcs is a VPatch with the same vertex locations as vp, but replacing the faces of vp with fcs. [Of course, this can go wrong if the new faces have vertices not in the domain of the vertex location map so this needs to be used with care. Any errors would only be discovered when a diagram is created.]

For cases where labels are only going to be drawn for certain faces, we need a version of subVP which also gets rid of vertex locations that are not relevant to the faces. For this situation we have
    restrictVP:: VPatch -> [TileFace] -> VPatch
which filters out un-needed vertex locations from the vertex location map. Unlike subVP, restrictVP checks for missing vertex locations, so restrictVP vp fcs raises an error if a vertex in fcs is missing from the keys of the vertex location map of vp.

5. Forcing in More Detail

The force rules

The rules used by our force algorithm are local and derived from the fact that there are seven possible vertex types as depicted in figure 8.

Figure 8: Seven vertex types

Our rules are shown in figure 9 (omitting mirror symmetric versions). In each case the TileFace shown yellow needs to be added in the presence of the other TileFaces shown.

Figure 9: Rules for forcing

Main Forcing Operations

To make forcing efficient we convert a Tgraph to a BoundaryState to keep track of boundary information of the Tgraph, and then calculate a ForceState which combines the BoundaryState with a record of awaiting boundary edge updates (an update map). Then each face addition is carried out on a ForceState, converting back when all the face additions are complete. It makes sense to apply force (and related functions) to a Tgraph, a BoundaryState, or a ForceState, so we define a class Forcible with instances Tgraph, BoundaryState, and ForceState.

This allows us to define
    force :: Forcible a => a -> a
    tryForce :: Forcible a => a -> Try a
The first will raise an error if a stuck tiling is encountered. The second uses a Try result which produces a Left string for failures and a Right a for successful result a.

There are several other operations related to forcing including
    stepForce :: Forcible a => Int -> a -> a
    tryStepForce  :: Forcible a => Int -> a -> Try a

    addHalfDart, addHalfKite :: Forcible a => Dedge -> a -> a
    tryAddHalfDart, tryAddHalfKite :: Forcible a => Dedge -> a -> Try a
The first two force (up to) a given number of steps (=face additions) and the other four add a half dart/kite on a given boundary edge.

Update Generators

An update generator is used to calculate which boundary edges can have a certain update. There is an update generator for each force rule, but also a combined (all update) generator. The force operations mentioned above all use the default all update generator (defaultAllUGen) but there are more general (with) versions that can be passed an update generator of choice. For example
    forceWith :: Forcible a => UpdateGenerator -> a -> a
    tryForceWith :: Forcible a => UpdateGenerator -> a -> Try a
In fact we defined
    force = forceWith defaultAllUGen
    tryForce = tryForceWith defaultAllUGen
We can also define
    wholeTiles :: Forcible a => a -> a
    wholeTiles = forceWith wholeTileUpdates
where wholeTileUpdates is an update generator that just finds boundary join edges to complete whole tiles.

In addition to defaultAllUGen there is also allUGenerator which does the same thing apart from how failures are reported. The reason for keeping both is that they were constructed differently and so are useful for testing.

In fact UpdateGenerators are functions that take a BoundaryState and a focus (list of boundary directed edges) to produce an update map. Each Update is calculated as either a SafeUpdate (where two of the new face edges are on the existing boundary and no new vertex is needed) or an UnsafeUpdate (where only one edge of the new face is on the boundary and a new vertex needs to be created for a new face).
    type UpdateGenerator = BoundaryState -> [Dedge] -> Try UpdateMap
    type UpdateMap = Map.Map Dedge Update
    data Update = SafeUpdate TileFace 
                | UnsafeUpdate (Vertex -> TileFace)
Completing (executing) an UnsafeUpdate requires a touching vertex check to ensure that the new vertex does not clash with an existing boundary vertex. Using an existing (touching) vertex would create a crossing boundary so such an update has to be blocked.

Forcible Class Operations

The Forcible class operations are higher order and designed to allow for easy additions of further generic operations. They take care of conversions between Tgraphs, BoundaryStates and ForceStates.
    class Forcible a where
      tryFSOpWith :: UpdateGenerator -> (ForceState -> Try ForceState) -> a -> Try a
      tryChangeBoundaryWith :: UpdateGenerator -> (BoundaryState -> Try BoundaryChange) -> a -> Try a
      tryInitFSWith :: UpdateGenerator -> a -> Try ForceState
For example, given an update generator ugen and any f:: ForceState -> Try ForceState , then f can be generalised to work on any Forcible using tryFSOpWith ugen f. This is used to define both tryForceWith and tryStepForceWith.

We also specialize tryFSOpWith to use the default update generator
    tryFSOp :: Forcible a => (ForceState -> Try ForceState) -> a -> Try a
    tryFSOp = tryFSOpWith defaultAllUGen
Similarly given an update generator ugen and any f:: BoundaryState -> Try BoundaryChange , then f can be generalised to work on any Forcible using tryChangeBoundaryWith ugen f. This is used to define tryAddHalfDart and tryAddHalfKite.

We also specialize tryChangeBoundaryWith to use the default update generator
    tryChangeBoundary :: Forcible a => (BoundaryState -> Try BoundaryChange) -> a -> Try a
    tryChangeBoundary = tryChangeBoundaryWith defaultAllUGen
Note that the type BoundaryChange contains a resulting BoundaryState, the single TileFace that has been added, a list of edges removed from the boundary (of the BoundaryState prior to the face addition), and a list of the (3 or 4) boundary edges affected around the change that require checking or re-checking for updates.

The class function tryInitFSWith will use an update generator to create an initial ForceState for any Forcible. If the Forcible is already a ForceState it will do nothing. Otherwise it will calculate updates for the whole boundary. We also have the special case
    tryInitFS :: Forcible a => a -> Try ForceState
    tryInitFS = tryInitFSWith defaultAllUGen
Efficient chains of forcing operations.

Note that (force . force) does the same as force, but we might want to chain other force related steps in a calculation.

For example, consider the following combination which, after decomposing a Tgraph, forces, then adds a half dart on a given boundary edge (d) and then forces again.
    combo :: Dedge -> Tgraph -> Tgraph
    combo d = force . addHalfDart d . force . decompose
Since decompose:: Tgraph -> Tgraph, the instances of force and addHalfDart d will have type Tgraph -> Tgraph so each of these operations, will begin and end with conversions between Tgraph and ForceState. We would do better to avoid these wasted intermediate conversions working only with ForceStates and keeping only those necessary conversions at the beginning and end of the whole sequence.

This can be done using tryFSOp. To see this, let us first re-express the forcing sequence using the Try monad, so
    force . addHalfDart d . force
becomes
    tryForce <=< tryAddHalfDart d <=< tryForce
Note that (<=<) is the Kliesli arrow which replaces composition for Monads (defined in Control.Monad). (We could also have expressed this right to left sequence with a left to right version tryForce >=> tryAddHalfDart d >=> tryForce). The definition of combo becomes
    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose
This has no performance improvement, but now we can pass the sequence to tryFSOp to remove the unnecessary conversions between steps.
    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . tryFSOp (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose
The sequence actually has type Forcible a => a -> Try a but when passed to tryFSOp it specialises to type ForceState -> Try ForseState. This ensures the sequence works on a ForceState and any conversions are confined to the beginning and end of the sequence, avoiding unnecessary intermediate conversions.

A limitation of forcing

To avoid creating touching vertices (or crossing boundaries) a BoundaryState keeps track of locations of boundary vertices. At around 35,000 face additions in a single force operation the calculated positions of boundary vertices can become too inaccurate to prevent touching vertex problems. In such cases it is better to use
    recalibratingForce :: Forcible a => a -> a
    tryRecalibratingForce :: Forcible a => a -> Try a
These work by recalculating all vertex positions at 20,000 step intervals to get more accurate boundary vertex positions. For example, 6 decompositions of the kingGraph has 2,906 faces. Applying force to this should result in 53,574 faces but will go wrong before it reaches that. This can be fixed by calculating either
    recalibratingForce (decompositions kingGraph !!6)
or using an extra force before the decompositions
    force (decompositions (force kingGraph) !!6)
In the latter case, the final force only needs to add 17,864 faces to the 35,710 produced by decompositions (force kingGraph) !!6.

6. Advanced Operations

Guided comparison of Tgraphs

Asking if two Tgraphs are equivalent (the same apart from choice of vertex numbers) is a an np-complete problem. However, we do have an efficient guided way of comparing Tgraphs. In the module Tgraph.Rellabelling we have
    sameGraph :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Bool
The expression sameGraph (g1,d1) (g2,d2) asks if g2 can be relabelled to match g1 assuming that the directed edge d2 in g2 is identified with d1 in g1. Hence the comparison is guided by the assumption that d2 corresponds to d1.

It is implemented using
    tryRelabelToMatch :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph
where tryRelabelToMatch (g1,d1) (g2,d2) will either fail with a Left report if a mismatch is found when relabelling g2 to match g1 or will succeed with Right g3 where g3 is a relabelled version of g2. The successful result g3 will match g1 in a maximal tile-connected collection of faces containing the face with edge d1 and have vertices disjoint from those of g1 elsewhere. The comparison tries to grow a suitable relabelling by comparing faces one at a time starting from the face with edge d1 in g1 and the face with edge d2 in g2. (This relies on the fact that Tgraphs are connected with no crossing boundaries, and hence tile-connected.)

The above function is also used to implement
    tryFullUnion:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph
which tries to find the union of two Tgraphs guided by a directed edge identification. However, there is an extra complexity arising from the fact that Tgraphs might overlap in more than one tile-connected region. After calculating one overlapping region, the full union uses some geometry (calculating vertex locations) to detect further overlaps.

Finally we have
    commonFaces:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> [TileFace]
which will find common regions of overlapping faces of two Tgraphs guided by a directed edge identification. The resulting common faces will be a sub-collection of faces from the first Tgraph. These are returned as a list as they may not be a connected collection of faces and therefore not necessarily a Tgraph.

Empires and SuperForce

In Empires and SuperForce we discussed forced boundary coverings which were used to implement both a superForce operation
    superForce:: Forcible a => a -> a
and operations to calculate empires.

We will not repeat the descriptions here other than to note that
    forcedBoundaryECovering:: Tgraph -> [Tgraph]
finds boundary edge coverings after forcing a Tgraph. That is, forcedBoundaryECovering g will first force g, then (if it succeeds) finds a collection of (forced) extensions to force g such that

each extension has the whole boundary of force g as internal edges.

each possible addition to a boundary edge of force g (kite or dart) has been included in the collection.

(possible here means – not leading to a stuck Tgraph when forced.) There is also
    forcedBoundaryVCovering:: Tgraph -> [Tgraph]
which does the same except that the extensions have all boundary vertices internal rather than just the boundary edges.

Combinations and Explicitly Forced

We introduced a new type Forced (in v 1.3) to enable a forcible to be explictily labelled as being forced. For example
    forceF    :: Forcible a => a -> Forced a 
    tryForceF :: Forcible a => a -> Try (Forced a)
    forgetF   :: Forced a -> a
This allows us to restrict certain functions which expect a forced argument by making this explicit.
    composeF :: Forced Tgraph -> Forced Tgraph
The definition makes use of theorems established in Graphs,Kites and Darts and Theorems that composing a forced Tgraph does not require a check (for connectedness and no crossing boundaries) and the result is also forced. This can then be used to define efficient combinations such as
    compForce:: Tgraph -> Forced Tgraph      -- compose after forcing
    composeForce = composeF . forceF

    allCompForce:: Tgraph -> [Forced Tgraph] -- iterated (compose after force) while not emptyTgraph
    maxCompForce:: Tgraph -> Forced Tgraph   -- last item in allCompForce (or emptyTgraph)
Tracked Tgraphs

The type
    data TrackedTgraph = TrackedTgraph
       { tgraph  :: Tgraph
       , tracked :: [[TileFace]] 
       } deriving Show
has proven useful in experimentation as well as in producing artwork with darts and kites. The idea is to keep a record of sub-collections of faces of a Tgraph when doing both force operations and decompositions. A list of the sub-collections forms the tracked list associated with the Tgraph. We make TrackedTgraph an instance of class Forcible by having force operations only affect the Tgraph and not the tracked list. The significant idea is the implementation of
    decomposeTracked :: TrackedTgraph -> TrackedTgraph
Decomposition of a Tgraph involves introducing a new vertex for each long edge and each kite join. These are then used to construct the decomposed faces. For decomposeTracked we do the same for the Tgraph, but when it comes to the tracked collections, we decompose them re-using the same new vertex numbers calculated for the edges in the Tgraph. This keeps a consistent numbering between the Tgraph and tracked faces, so each item in the tracked list remains a sub-collection of faces in the Tgraph.

The function
    drawTrackedTgraph :: [VPatch -> Diagram B] -> TrackedTgraph -> Diagram B
is used to draw a TrackedTgraph. It uses a list of functions to draw VPatches. The first drawing function is applied to a VPatch for any untracked faces. Subsequent functions are applied to VPatches for the tracked list in order. Each diagram is beneath later ones in the list, with the diagram for the untracked faces at the bottom. The VPatches used are all restrictions of a single VPatch for the Tgraph, so will be consistent in vertex locations. When labels are used, there is also a drawTrackedTgraphRotated and drawTrackedTgraphAligned for rotating or aligning the VPatch prior to applying the drawing functions.

Note that the result of calculating empires (see Empires and SuperForce ) is represented as a TrackedTgraph. The result is actually the common faces of a forced boundary covering, but a particular element of the covering (the first one) is chosen as the background Tgraph with the common faces as a tracked sub-collection of faces. Hence we have
    empire1, empire2 :: Tgraph -> TrackedTgraph
    
    drawEmpire :: TrackedTgraph -> Diagram B
Figure 10 was also created using TrackedTgraphs.

Figure 10: Using a TrackedTgraph for drawing

7. Other Reading

Previous related blogs are:

Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).

Graphs, Kites and Darts intoduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).

Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.

Graphs,Kites and Darts and Theorems established some important results relating force, compose, decompose.
by readerunner at June 15, 2025 03:32 PM

June 11, 2025

Simon Marlow

Browsing Stackage with VS Code and Glean

Browsing Stackage with VS Code and Glean
June 11, 2025
Have you ever wished you could browse all the Haskell packages together in your IDE, with full navigation using go-to-definition and find-references? Here’s a demo of something I hacked together while at ZuriHac 2025 over the weekend:

In the previous post I talked about how to index all of Hackage (actually Stackage, strictly speaking, because it’s not in general possible to build all of Hackage together) using Glean. Since that post I made some more progress on the indexer:

The indexer now indexes types. You can see type-on-hover working in the demo. The types are similar to what you see in the Haddock-generated hyperlinked source, except that here it’s always using the type of the definition and not the type at the usage site, which might be more specific. That’s a TODO for later.

Fixed a bunch of things, enriched the index with details about constructors, fields and class methods, and made indexing more efficient.

The DB size including types is now about 850MB, and it takes just under 8 minutes on my 9-year-old laptop to index the nearly 3000 packages in my stackage LTS 21.21 snapshot. (Note: the figures here were updated on 12-06-2025 when I redid the measurments).

Hooking it up to VS Code

The architecture looks like this:

The LSP server is a modified version of static-ls, which is already designed to provide an LSP service based on static information. I just reimplemented a few of its handlers to make calls to Glass instead of the existing hie/hiedb implementations. You can see the changes on my fork of static-ls. Of course, these changes are still quite hacky and not suitable for upstreaming.

Glass is a “Language-agnostic Symbol Server”. Essentially it provides an API abstraction over Glean with operations that are useful for code navigation and search.

Where to next?

There remain a few issues to solve before this can be useful.

Make Glean more easily installable. There’s a general concensus that cabal install glean would lower the barrier to entry significantly; in order to do this we need to build the folly dependency using Cabal.

Clean up and ship the LSP server, somehow. Once Glean is cabal-installable, we can depend on it from an LSP server package.

Think about continuous integration to build the Glean DB. Perhaps this can piggyback off the stackage CI infra? If we can already build a complete stackage snapshot, and Glean is easily installable, then indexing would be fairly straightforward. I’d love to hear suggestions on how best to do this.

And looking forwards a bit further:

Think about how to handle multiple packages versions. There’s no fundamental problem with indexing multiple package versions, except that Glass’s SymbolID format currently doesn’t include the package version but that’s easily fixable. We could for example build multiple stackage LTS instances and index them all in a single Glean DB. There would be advantages to doing this, if for instance there were packages in common between two Stackage instances then the Glean DB would only contain a single copy. A lot of the type structure would be shared too.

Provide search functionality in the LSP. Glean can provide simple textual search for names, and with some work could also provide Hoogle-like type search.

Think about how to index local projects and local changes. Glean supports stacked and incremental DBs, so we could build a DB for a local project stacked on top of the full Stackage DB. You would be able to go-to-definition directly from a file in your project to the packages it depends on in Stackage. We could re-index new .hie files as they are generated, rather like how static-ls currently handles changes.

Integrate with HLS? Perhaps Glean could be used to handle references outside of the current project, switching seamlessly from GHC-based navigation to Glean-based navigation if you jump into a non-local package.

More use cases?

I talked with a few people at ZuriHac about potential use cases for Glean within the Haskell ecosystem. Using it in haskell.org came up a few times, as a way to power search, navigation and analysis. Also mentioned was the possibility of using it as a Hoogle backend. Potentially we could replace the Haddock-generated hyperlinked sources on haskell.org with a Glean-based browser, which would allow navigating links between packages and find-references.

Another use cases that came up was the possibility of doing impact analysis for core library changes (or any API changes really). Some of this is already possible using find-references, but more complex cases such as finding instances that override certain methods aren’t possible yet until we extend the indexer to capture richer information.

If you’re interested in using Glean for something, why not jump on the Glean discord server and tell us about it!

June 11, 2025 12:00 AM

June 10, 2025

Brent Yorgey

Introduction to competitive programming in Haskell
Introduction to competitive programming in Haskell

Posted on June 10, 2025
Tagged Kattis, competitive programming, haskell
A few days ago I gave a talk at ZuriHac 2025 entitled Haskell for Competitive Programming, a basic introduction to competitive programming in general, and the joy of using Haskell for competitive programming in particular. This is an expanded version of my talk in blog post form. (For an even gentler introduction to competitive programming in Haskell, see this old blog post from 2019.)

Competitive Programming

First of all, what is competitive programming? It’s a broad term, but when I talk about competitive programming I have something in mind along the following lines:

There are well-specified input and output formats, usually with a few examples, and a precise specification of what the output should be for a given input.

Your job is to write a program which transforms input meeting the specification into a correct output.

You submit your program, which is tested on a number of inputs and declared correct if and only if it yields the correct output for all the tested inputs.

There is often time pressure involved—that is, you have a limited amount of time in which to write your program. However, it is also possible to participate “recreationally”, simply for the joy of problem-solving, without time pressure (in fact, the vast majority of the competitive programming I do is of this form, though I have occasionally participated in timed contests).

There are many variations: whether you are allowed to use code libraries prepared ahead of time, or must type everything from scratch; outputs can be scored according to some criteria rather than simply being judged right or wrong; and so on.

There are many sites which allow you to participate in contests and/or solve competitive programming problems recreationally. My favorite is Open Kattis; I mention some others at the end of this post.
Pot: a first example

As an introductory example, let’s look at Pot. As usual, there’s a silly story, but what it boils down to is that we will be given a sequence of numbers, and we should interpret the last digit of each number as an exponent, then sum the results. For example, if given 125, we should interpret it as $12^5$, and so on.
Dealing with I/O via interact

An imperative approach to such a problem would involve doing a sequence of input commands, some computation, and a sequence of output commands—possibly interleaved with one another—and we might immediately think to start using functions like getLine and putStrLn to do the required I/O in Haskell. However, there is a much more fruitful functional perspective: we are simply being asked to implement a particular (partial) function of type String -> String. The fact that the function’s input and output should be hooked up to the program’s standard input and output is just an implementation detail. Competitive programming is functional at heart!

It turns out that Haskell’s standard library already has the perfect built-in function for this scenario:
interact :: (String -> String) -> IO ()
interact takes a pure String -> String function and turns it into an IO action which reads from standard input, passes the input to the given String -> String function, and prints the result to standard output. It even does this using lazy I/O—that is, the input is read lazily, as demanded by the function, so that the output and input can be automatically interleaved depending on which parts of the output depend on which parts of the input. In particular, this means that that the entire input need not be stored in memory at once. If the inputs can be processed into outputs in a streaming fashion—as is the case in the example problem we are currently considering—then the input and output will be interleaved. In general, this kind of lazy I/O is problematic and even unsafe, but it’s perfect for this scenario.
Solving the problem with a pipeline

So interact does all the IO for us, and all we have to do is write a pure String -> String function which transforms the input to the output. In this case, we can split the input into lines, drop the first line (we don’t need to know how many lines of input there are—we just get a list of all of them, since interact will read until EOF), read each number and turn it into the first digits raised to the power of the last digit, then sum them and show the result. The full solution is below. Notice how I use the “backwards composition” operator (>>>), since I find it more convenient to type from left to right as I’m thinking about transforming from input to output.
import Control.Category ((>>>))

main = interact $
  lines >>> drop 1 >>> map (read >>> process) >>> sum >>> show

process :: Integer -> Integer
process n = (n `div` 10) ^ (n `mod` 10)
I use Integer here since raw performance doesn’t matter much for this easy problem, and Integer avoids any potential problems with overflow. However, using Int instead of Integer can make a big difference for some compute-intensive problems. On Kattis, Int will always be 64 bits, but last time I checked Int can be 32 bits on Codeforces.
Shopping List: wholemeal programming and ByteString

Let’s consider Shopping List as a second example. In this problem, we are given a list of shopping lists, where each shopping list consists of a list of space-separated items on a single line. We are asked to find the items which are common to all the shopping lists, and print them in alphabetical order.
Wholemeal programming with standard data structures

This problem is very amenable to a “wholemeal programming” approach, where we work entirely at the level of whole data structure transformations rather than looping over individual elements. We can turn each shopping list into a set, then find the intersection of all the sets. Moreover, if we use Data.Set, which uses an ordering on the elements, we will get the result in alphabetical order “for free” (“free” as in the amount of code we have to write, not necessarily runtime cost). Haskell has a decent collection of data structures in the containers library ((Int)Set, (Int)Map, Seq, Tree, and even Graph) with a large collection of standard methods to construct and manipulate them, which are bread and butter for many competitive programming problems.
{-# LANGUAGE ImportQualifiedPost #-}

import Control.Category ((>>>))
import Data.Set (Set)
import Data.Set qualified as S

main = interact $
  lines >>> drop 1 >>> map (words >>> S.fromList) >>>
  foldr1 S.intersection >>>
  (\s -> show (S.size s) : S.toList s) >>> unlines
ByteString vs String

Unfortunately, when we try submitting this code, we get a Time Limit Exceeded error! What’s wrong?

The issue is our use of String, which is an actual linked list of characters and is very slow, especially when we have many short strings, as in this problem. In the worst case, we could have 100 shopping lists, each with 5000 items of length 10, for a total of up to 5 MB of input; with that much input data to read, any overhead associated with reading and parsing the input can make a significant difference.

Switching to ByteString is much faster. Why not Text, you ask? Well, Text has to do a bunch of extra work to deal properly with Unicode encodings, but in 99.99% of all competitive programming problems I’ve ever seen, the input is guaranteed to be ASCII. So not only do we not need Text, we can get away with a version of ByteString that simply assumes every character is a single 8-bit byte!

Once we import it, all we need to do is replace a bunch of String operations with corresponding ByteString ones.
{-# LANGUAGE ImportQualifiedPost #-}

import Control.Category ((>>>))
import Data.Set (Set)
import Data.Set qualified as S
import Data.ByteString.Lazy.Char8 qualified as BS

main = BS.interact $
  BS.lines >>> drop 1 >>> map (BS.words >>> S.fromList) >>>
  foldr1 S.intersection >>>
  (\s -> BS.pack (show (S.size s)) : S.toList s) >>> BS.unlines
A Favourable Ending: input parsing and lazy recursive structures

As a last example, let’s look at A Favourable Ending. This problem consists of a number of test cases; each test case describes a choose-your-own-adventure book with a number of sections, where each section is either an ending (either good or bad), or allows the reader to choose among three sections to proceed to next. For each test case, we are asked how many distinct stories there are with good endings.

More abstractly, since we are guaranteed that there are no loops, the sections of the book form a DAG, and we are asked to count the number of distinct paths in a DAG from a distinguished start node to any of a distinguished set of “good” leaves.
Parsing with Scanner

Parsing the input for this problem is trickier than the other examples so far. In theory, we could still ignore the first number specifying the number of test cases, and just continue reading test cases until EOF. However, each test case begins with a number specifying the number of sections in the book, and we cannot ignore this number: we need to know how many lines to read before the start of the next test case. Doing this manually involves pattern-matching on a list of lines, using splitAt to split off the lines for each test case, and manually passing around the list of the remaining lines: tedious.

Fortunately, Haskell is great at building abstractions to insulate us from such tedium. I’ve developed a simple Scanner abstraction which works well in this context.

We begin by creating some data types to represent the input in structured form:
type Book = Map Int Section

data Section = End Disposition | Choice [Int]
  deriving (Eq, Show)

data Disposition = Favourably | Catastrophically
  deriving (Eq, Show, Read)
Now we can write a Scanner to read a Book:
book :: Scanner Book
book = do
  s <- int
  M.fromList <$> s >< ((,) <$> int <*> section)

section :: Scanner Section
section = do
  t <- peek
  if isDigit (BS.head t)
    then Choice <$> (3 >< int)
    else End . readLower . BS.unpack <$> str

readLower :: Read a => String -> a
readLower = read . onHead toUpper

onHead :: (a -> a) -> [a] -> [a]
onHead _ [] = []
onHead f (x : xs) = f x : xs
(readLower and onHead are functions in my personal competitive programming template, included here for completeness).

One more piece of boilerplate we can write at this point is the main function, which simply consists of running the Scanner to read all the test cases, solving each test case, and formatting the output.
main = BS.interact $ runScanner (numberOf book) >>> map (solve >>> showB) >>> BS.unlines
DP + topsort with a lazy recursive map

With all that framework out of the way, we can turn to actually solving the problem. And here is where something really fun happens. In a typical imperative language, we would have to first topologically sort the book sections, then use dynamic programming to compute the number of good stories beginning at each section, starting with the leaves and proceeding backwards through the topological sort to the start—dozens of lines of code. However, in Haskell we can get all of this for free, just by defining a lazy, recursive map!
solve :: Book -> Int
solve book = endings ! 1
  where
    endings = M.fromList [(p, endingsFrom (book!p)) | p <- M.keys book]
    endingsFrom (End d) = if d == Favourably then 1 else 0
    endingsFrom (Choice ps) = sum $ map (endings !) ps
endings is a Map from each book section to the number of favorable stories starting with that section. Notice how its values are defined via the endingsFrom function, which is in turn defined, in the Choice case, by looking up the values of the choices in the endings map and summing them. endings is thus defined recursively, which works because it is lazy in the values. When we demand the value of endings ! 1, the runtime system starts evaluating thunks in the map as needed, implicitly doing a topological sort for us.

Here’s another way to think about this: what we really want is the function endingsFrom : Section -> Int, which tells us how many good endings there are starting at a given section. It can be defined via a recurrence; however, if we were to literally implement it as a recursive function, our program would spend a ridiculous amount of time recomputing the same values over and over again. So, we insert a lazy map in the middle to memoize it (there are other data structures that can be used for this purpose as well).
Resources

Here are some resources in case you’re interested in exploring more.

Open Kattis has a collection of thousands of high-quality problems which can be solved in Haskell (or many other languages). If you just want to try solving some problems for fun, it’s a great place to start.

There are also other sites which accept Haskell, such as Codeforces. Check these out if you want to actually participate in timed contests.

My public listing of Kattis problems I have solved, with my own personal rating system.

I’ve written a series of blog posts about competitive programming in Haskell, on a variety of topics.

I also have a repository of modules I’ve developed specifically for competitive programming. Many of the modules are documented in one or more blog posts.

Soumik Sarkar has an even larger collection of Haskell libraries for competitive programming.
<noscript>Javascript needs to be activated to view comments.</noscript>
by Brent Yorgey at June 10, 2025 12:00 AM

June 09, 2025

Monday Morning Haskell

Spatial Reasoning with Zigzag Patterns!
Today we’re continuing our study of Rust and Haskell solutions to basic coding problems. This algorithm is going to be a little harder than the last few we’ve done in this series, and it will get trickier from here!

For a complete study of problem solving techniques in Haskell, make sure to check out Solve.hs. This course runs the gamut from basic solving techniques to advanced data structures and algorithms, so you’ll learn a lot!

The Problem

Today’s problem is Zigzag Conversion. This is an odd problem that stretches your ability to think iteratively and spatially. The idea is that you’re given an input string and a number of “rows”. You need to then imagine the input word written as a zig-zag pattern, where you write the letters in order first going down, and then diagonally up to the right until you get back to the first row. Then it goes down again. Your output must be characters re-ordered in “row-order” after this zig-zag rearrangement.

This makes the most sense looking at examples. Let’s go through several variations with the string MONDAYMORNINGHASKELL. Here’s what it looks like with 3 rows.
M   A   R   G   K
O D Y O N N H S E L
N   M   I   A   L
So to get the answer, we read along the top line first (MARGK), then the second (ODYONNHSEL), and then the third (NMIAL). So the final answer is MARGKODYONNHSELNMIAL.

Now let’s look at the same string in 4 rows:
M     M     G     L
O   Y O   N H   E L
N A   R I   A K
D     N     S
The answer here is MMGLOYONHELNARIAKDNS.

Here’s 5 rows:
M       R       K
O     O N     S E
N   M   I   A   L
D Y     N H     L
A       G
The answer here is MRKOONSENMIALDYNHLAG.

And now that we have the pattern, we can also consider 2 rows, which doesn’t visually look like a zig-zag as much:
M N A M R I G A K L
O D Y O N N H S E L
This gives the answer MNAMRIGAKLODYONNHSEL.

Finally, if there’s only 1 row, you can simply return the original string.

The Algorithm

So how do we go about solving this? The algorithm here is a bit more involved than the last few weeks!

Our output order is row-by-row, so for our solution we should think in a row-by-row fashion. If we can devise a function that will determine the indices of the original string that belong in each row, then we can simply loop over the rows and append these results!

In order to create this function, we have to think about the zig-zag in terms of “cycles”. Each cycle begins at the top row, goes down to the bottom row, and then up diagonally to the second row. The next element to go at the top row starts a new cycle. By thinking about cycles, we’ll discover a few key facts:

With n rows (n >= 2), a complete cycle has 2n - 2 letters.

The top and bottom row get one letter per cycle.

All other rows get two letters per cycle.

Now we can start to think mathematically about the indices that belong in each row. It’s easiest to think about the top and bottom rows, since they only get one letter each cycle. Each of these has a starting index (0 and n - 1, respectively), and then we add the cycle length 2n - 2 to these starting indices until it exceeds the length.

The middle rows have this same pattern, only now they have 2 starting indices. They have the starting index from the “down” direction and then their first index going up and to the right. The first index for row i is obviously i - 1, but the second index is harder to see.

The easiest way to find the second index is backwards! The next cycle starts at 2n - 2. So row index 1 has its second index at 2n - 2 - 1, and row index 2 has its second index at 2n - 2 - 2, and so on! The pattern of adding the “cycle number” will work for all starting indices.

Once we have the indices for each row, our task is simple. We build a string for each row and combine them together in order.

So suppose we have our 4-row example.
M     M     G     L
O   Y O   N H   E L
N A   R I   A K
D     N     S
The “cycle num” is 6 (2 * 4 - 2). So the first row has indices [0, 6, 12, 18]. The fourth row starts with index 3, and so its indices also go up by 6 each time: [3, 9, 15].

The second row (index 1) has starting indices 1 and 5 (6 - 1). So its indices are [1, 5, 7, 11, 13, 17, 19]. Then the third row has indices [2, 4, 8, 10, 14, 16].

A vector input will allow us to efficiently use and combine these indices.

As a final note, the “cycle num” logic doesn’t end up working with only 1 row. The cycle length using our calculation would be 0, not 1 as it should. The discrepancy is because our “cycle num” logic really depends on having a “first” and “last” row. So if we only have 1 row, we’ll hardcode that case and return the input string.

Rust Solution

In our rust solution, we’ll accumulate our result string in place. To accomplish this we’ll do a few setup steps:

Handle our base case (1 row)

Get the string length and cycle number

Make a vector of the input chars for easy indexing (Rust doesn’t allow string indexing)

Initialize our mutable result string
pub fn convert(s: String, num_rows: i32) -> String {
    if (num_rows == 1) {
        return s;
    }
    let n = s.len();
    let nr = num_rows as usize; // Convenience for comparison
    let cycleLen: usize = (2 * nr - 2);
    let sChars: Vec<char> = s.chars().collect();
    let mut result = String::new();
    ...
}
Now we have to add the rows in order. Since the logic differs for the first and last rows, we have 3 sections: first row, middle rows, and last row. The first and last row are straightforward using our algorithm. Each is a simple while loop.
pub fn convert(s: String, num_rows: i32) -> String {
   if (num_rows == 1) {
       return s;
   }
   let n = s.len();
   let nr = num_rows as usize; // Convenience for comparison
   let cycleLen: usize = (2 * nr - 2);
   let sChars: Vec<char> = s.chars().collect();
   let mut result = String::new();
   
   // First Row
   let mut i = 0;
   while i < n {
       result.push(sChars[i]);
       i += cycleLen;
   }

   // Middle Rows
   ...

   // Last Row
   i = (nr - 1);
   while i < n {
       result.push(sChars[i]);
       i += cycleLen;
   }
   return result;
}
Now the middle rows section is similar. We loop through each of the possible rows in the middle. For each of these, we’ll do a while loop similar to the first and last row. These loops are different though, because we have to track two possible values, the “first” and “second” of each cycle.

If the “first” is already past the end of the vector, then we’re already done and can skip the loop. But even if not, we still need an “if check” on the “second” value as well. Each time through the loop, we increase both values by cycleLen.
pub fn convert(s: String, num_rows: i32) -> String {
   if (num_rows == 1) {
       return s;
   }
   let n = s.len();
   let nr = num_rows as usize; // Convenience for comparison
   let cycleLen: usize = (2 * nr - 2);
   let sChars: Vec<char> = s.chars().collect();
   let mut result = String::new();
   
   // First Row
   let mut i = 0;
   while i < n {
       result.push(sChars[i]);
       i += cycleLen;
   }

   // Middle Rows
   for row in 1..(nr - 1) {
       let mut first = row;
       let mut second = cycleLen - row;
       while first < n {
           result.push(sChars[first]);
           if second < n {
               result.push(sChars[second]);
           }
           first += cycleLen;
           second += cycleLen;
       }
   }

   // Last Row
   i = (nr - 1);
   while i < n {
       result.push(sChars[i]);
       i += cycleLen;
   }
   return result;
}
And that’s our complete solution!

Haskell Solution

The Haskell solution follows the same algorithm, but we’ll make a few stylistic changes compared to Rust. In Haskell, we’ll go ahead and define specific lists of indices for each row. That way, we can combine these lists and make our final string all at once using concatMap. This approach will let us demonstrate the power of ranges in Haskell.

We start our defining our base case and core parameters:
zigzagConversion :: String -> Int -> String
zigzagConversion input numRows = if numRows == 1 then input
  else ...
  where
    n = length input
    cycleLen = 2 * numRows - 2

    ...
Now we can define index-lists for the first and last rows. These are just ranges! We have the starting element, and we know to increment it by cycleLen. The range should go no higher than n - 1. Funny enough, the range can figure out that it should be empty in the edge case that our input is too small to fill all the rows!
zigzagConversion :: String -> Int -> String
zigzagConversion input numRows = if numRows == 1 then input
  else ...
  where
    n = length input
    cycleLen = 2 * numRows - 2

    firstRow :: [Int]
    firstRow = [0,cycleLen..n - 1]

    lastRow :: [Int]
    lastRow = [numRows - 1, numRows - 1 + cycleLen..n - 1]

    ...
In Rust, we used a while-loop with two state values to calculate the middle rows. Hopefully you know from this series now that this while loop translates into a recursive function in Haskell. We’ll accumulate our list of indices as a tail argument, and keep the two stateful values as our other input parameters. We’ll combine all our lists together into one big list of int-lists, allRows.
zigzagConversion :: String -> Int -> String
zigzagConversion input numRows = if numRows == 1 then input
  else ...
  where
    n = length input
    cycleLen = 2 * numRows - 2

    firstRow :: [Int]
    firstRow = [0,cycleLen..n - 1]

    lastRow :: [Int]
    lastRow = [numRows - 1, numRows - 1 + cycleLen..n - 1]

    middleRow :: Int -> Int -> [Int] -> [Int]
    middleRow first second acc = if first >= n then reverse acc
      else if second >= n then reverse (first : acc)
      else middleRow (first + cycleLen) (second + cycleLen) (second : first : acc)

    middleRows :: [[Int]]
    middleRows = map (\i -> middleRow i (cycleLen - i) []) [1..numRows-2]

    allRows :: [[Int]]
    allRows = firstRow : middleRows <> [lastRow]

    ...
Now we bring it all together with one final step. We make a vector from our input, and define a function to turn a single int-list into a single String. Then at the top level of our function (the original else branch), we use concatMap to bring these together into our final result String.
zigzagConversion :: String -> Int -> String
zigzagConversion input numRows = if numRows == 1 then input
  else concatMap rowIndicesToString  allRows
  where
    n = length input
    cycleLen = 2 * numRows - 2

    firstRow :: [Int]
    firstRow = [0,cycleLen..n - 1]

    lastRow :: [Int]
    lastRow = [numRows - 1, numRows - 1 + cycleLen..n - 1]

    middleRow :: Int -> Int -> [Int] -> [Int]
    middleRow first second acc = if first >= n then reverse acc
      else if second >= n then reverse (first : acc)
      else middleRow (first + cycleLen) (second + cycleLen) (second : first : acc)

    middleRows :: [[Int]]
    middleRows = map (\i -> middleRow i (cycleLen - i) []) [1..numRows-2]

    allRows :: [[Int]]
    allRows = firstRow : middleRows <> [lastRow]

    inputV :: V.Vector Char
    inputV = V.fromList input

    rowIndicesToString :: [Int] -> String
    rowIndicesToString = map (inputV V.!)
Conclusion

This comparison once again showed how while loops in Rust track with recursive functions in Haskell. We also saw some nifty Haskell features like ranges and tail recursion. Most of all, we saw that even with a trickier algorithm, we can still keep the same basic shape of our algorithm in a functional or imperative style.

To learn more about these problem solving concepts, take a look at Solve.hs, our comprehensive course on problem solving in Haskell. You’ll learn about recursion, list manipulation, data structures, graph algorithms, and so much more!
by James Bowen at June 09, 2025 08:30 AM

June 03, 2025

Well-Typed.Com

Funding the Haskell toolchain with Ecosystem Support Packages

Well-Typed are delighted to be working in partnership with the Haskell Foundation to offer Haskell Ecosystem Support Packages, which include:

Investment in open-source maintenance work for core Haskell tools (GHC, Cabal, HLS, …)

Access to Well-Typed’s team of Haskell experts for private development or technical support

Funding the Haskell Foundation to sustain key community infrastructure¹

Acknowledgement of the sponsor’s contribution to the Haskell community

We would like to gratefully acknowledge our first Haskell Ecosystem Supporters:

Channable

QBayLogic

They, along with our other clients who contribute to funding our open-source work, are making a real difference to Haskell’s sustainability. If your company uses Haskell and is not already funding its development, why not join them?

Introduction

The Haskell language has just turned 35 years old, and the vibrant Haskell community continues to thrive thanks to the efforts of dedicated volunteers across many different roles and open source projects. Haskell has always been a collective, community-driven project without a controlling corporate sponsor, which has many benefits. But it leads to challenges funding some activities that are crucial to the sustainability of the ecosystem:

Maintenance of core parts of the Haskell toolchain: large, complex open-source projects need full-time paid maintainers to function effectively and avoid the “Nebraska problem”.

Community-building and infrastructure support: from running servers for crucial services, to coordinating volunteers and assisting open-source projects, to organizing events.

Well-Typed has a well-established track record maintaining GHC and other core Haskell tools alongside our commercial consulting work; read more in our GHC activities reports. The Haskell Foundation undertakes a wide range of activities to support the community and broaden the adoption of Haskell.

Both Well-Typed and the Haskell Foundation rely on funding from commercial users of Haskell to make this possible. Thus we are happy to be working in partnership to make it easy for companies to contribute back and support all this important work, while receiving tangible benefits themselves. Haskell Ecosystem Supporter companies make a single contract and in return get support with their specific needs, work to sustain the Haskell tools on which they rely, and sponsor the Haskell Foundation. You can find out more about the approach we’ve agreed in the Foundation’s announcement of Ecosystem Partnerships.

Well-Typed have been keen supporters and sponsors of the Haskell Foundation since it was established five years ago, and Andres Löh from Well-Typed is currently Chair of the Foundation Board. We would be delighted to see other companies supporting the Haskell Foundation and the wider Haskell community through similar partnerships.

Our offer

Well-Typed offer multiple tiers of support, to meet the needs of clients of different sizes:

Tier Bronze Silver Gold

Open-source maintenance for core Haskell toolchain ✅ ✅✅ ✅✅✅

Support for Haskell Foundation ✅ ✅✅ ✅✅✅

Engineering meetings Quarterly Monthly Fortnightly

Technical support channel - Email Chat

Developer time included 10 hours/month 1/5 FTE 1/2 FTE

Time reserved for private support or development - 4 hours/month 16 hours/month

Publicity Name Small logo + link Medium logo + link

Monthly (ex VAT) $1,500 $3,850 $11,000

Annual (ex VAT) $16,500 $42,000 $120,000

Next steps

If this sounds like something you or your company would be interested in, and you would like to help support the Haskell ecosystem, then:

Read about our offer in more detail

Talk to us in person at ZuriHac

Contact us by email to [email protected]

More specifically: for these packages, 20% of the purchase price will be donated to Haskell.org for the Haskell Foundation.↩︎

by adam at June 03, 2025 12:00 AM

Tier	Bronze	Silver	Gold
Open-source maintenance for core Haskell toolchain	✅	✅✅	✅✅✅
Support for Haskell Foundation	✅	✅✅	✅✅✅
Engineering meetings	Quarterly	Monthly	Fortnightly
Technical support channel	-	Email	Chat
Developer time included	10 hours/month	1/5 FTE	1/2 FTE
Time reserved for private support or development	-	4 hours/month	16 hours/month
Publicity	Name	Small logo + link	Medium logo + link
Monthly (ex VAT)	$1,500	$3,850	$11,000
Annual (ex VAT)	$16,500	$42,000	$120,000

June 02, 2025

Edward Z. Yang

Vibe coding case study: ScubaDuck
A lot of strong engineers that I know haven't really taken a serious look at AI coding; they've used LLMs to ask questions or write simple scripts and appreciate that it is a useful tool, but haven't actually tried building a nontrivial application entirely from scratch in vibe coding style (here, I use the term in its original meaning: when you do AI coding without carefully reviewing the output). This is understandable: if you're not working on a green field project, there aren't that many opportunities to write code in this style--standard practice for established projects is that someone else needs to review all of the code you write: this is a bad match for vibe coding! So in this post, I want to give a concrete case study of a nontrivial system that was entirely vibe coded (ScubaDuck), to argue the following claims:

AI coding can be done on a manager's schedule: you don't need continuous blocks of coding time and context-switching is considerably less harmful. ScubaDuck was implemented in three days of part time work, where all of the work happened when the baby was napping.

AI coding substantially lowers the cost of doing projects in tech stacks you are less familiar with. ScubaDuck is mostly JavaScript UI code, which is not something I write on a day-to-day basis.

AI coding is an unlock for "sidequests": support software that's ancillary to your main task that is nice to have, but not essential. If previously you would have decided the cost outweighed the benefit, AI coding reducing the cost means you should redo these calculations.

Vibe coding works and can produce working software. ScubaDuck is an existence proof that vibe coding is a viable strategy for generating JavaScript UI code (NB: I don't claim vibe coding will work for all domains, nor do I claim this is the only domain for it works. Hopefully you can also build some intuition for where it is more or less likely to work). You will not one shot it (ScubaDuck was 150 prompts in the end) but if you are prompting the LLM to also generate tests, you can reliably fix issues without causing regressions to existing code.

Vibe coding is good for situations where buggy software is low impact; be on the lookout for ways to engineer this sort of situation. ScubaDuck is a read-only interface, where the only downside to being buggy is you can't issue the queries you want to issue.

Update: You can see all of my prompts and the resulting agent trajectories at scubaduck-prompts.

What is ScubaDuck?

ScubaDuck is a discount implementation of Meta's internal Scuba realtime database system. You can read more about what exactly this is on GitHub, but it's not so important for the purposes of this post: the key details you need to know about ScubaDuck is that it consists of a Python server that exposes an API to perform queries against a DuckDB database, and an HTML and JavaScript frontend application which implements the forms for building these queries and rendering of the output data. Both the forms and output data rendering have nontrivial JavaScript enhancements: some form inputs are chip inputs and support autocomplete, and the time series view is an SVG chart. All of these components were coded from scratch, so the project has no third-party JavaScript dependencies.

So on the one hand, this project is pretty simple. There are no stringent performance or uptime requirements, it's a pretty standard server-client program that the LLM has seen millions of times before (this is good!) On the other hand, the exact behavior of the frontend UI is quite intricate and would be very difficult to one-shot in a single prompt. Indeed, as I was coding and testing the application, I frequently ran into situations that I didn't anticipate in my original specification, and that I had to ask Codex to refine. Another way to put it is that ScubaDuck is a relatively simple functional specification (although this too was not one shot), but I did a lot of polishing of small behaviors so that the interface behaved in the way that I expected Scuba to behave. Here, it was helpful that I had a very clear idea of what I wanted (since I've used Scuba quite a lot at work).

Going into ScubaDuck, I had a pretty good sense that this project should be a good fit for LLMs. HTML, JavaScript and Python are all extremely high resource languages, and I'd heard lots of people raving about how good LLMs were at transforming wireframes and mockups into fully functional websites. It is also fully self contained and straightforward-ish to test (only "ish" because you do have to use something like Playwright to actually test the frontend UI, which honestly is a slog. But fortunately, the LLM can write the tests for you!) One design decision I made, which I didn't originally anticipate but worked out in the end, was the decision to not use any third-party JavaScript libraries. This was by accident: Python has no native of bundling third party JavaScript, but I wanted the tool to work offline. I wasn't sure if you could vibe code an SVG charting library from scratch, but apparently you can and it's quite easy!

Agent setup

ScubaDuck was implemented with OpenAI Codex in the cloud (not the CLI tool). Codex's cloud offering requires you to initialize a hermetic environment which the coding agent can execute commands in. It's pretty well known now that AI coding agents work much better if they are able to run the code they write and see if it worked or not, so this is quite an important part of the process. Unfortunately, this was somewhat time consuming trial and error to setup. I had a fairly detailed initial prompt, and what I would do was submit it to Codex, watch it fail, read over the trajectory (the agent logs) to see what happened (Codex wanted to use npm! Codex couldn't download something from the internet! Codex tried to use a package that wasn't available!) and then fixed whatever environment misconfiguration had caused it to fail, or edited AGENTS.md to instruct it to not do some behavior. According to my history, the first day of the project was spent unsuccessfully trying to get the project setup, and my first successful Codex PR only happened on May 19.

At the end of setup, I had the following:

A pyproject.toml with exactly the dependencies I wanted to be used (duckdb, flask and python-dateutil), a lockfile for it (since I was using uv) and my preferred configuration for various tools (pytest, ruff). I'm a big fan of pytest-xdist for vibe coded projects, since you can prompt the LLM to write tests that will work when run in parallel and it does a pretty good job at this. Later I'd also add a pyright configuration, though initially I left it out because I saw Codex doing some strange things on account of duckdb being untyped, and I didn't want to debug it at the time (the fix, by the way, is instructing the LLM to define stubs as necessary in this case.)

An AGENTS.md file with some basic instructions to try to get Codex to stop doing things I saw it doing in the initial trajectories that I didn't want it to do. Nothing fancy, just if you see Codex do something bad, tell it not to do it in AGENTS.md. A good example of this is the "There are no nested AGENTS.md files, this is the only agents file": Codex is post-trained to look for nested AGENTS.md files, but you can save a few tool calls if you tell it there aren't any. (Note: folklore for Claude 3.7 is that instruction following for this sort of rules following was not great. Word on the street is that both Codex and Claude 4 are substantially better at this. Extra note: For uv users, another notable instruction in AGENTS.md is how to activate the venv, since at time of writing I couldn't get Codex to make this happen automatically.)

A setup script for the environment. This took the most debugging, because Codex runs all Internet access through a proxy and sometimes it works imperfectly.

After I got my initial prompt to generate a first draft of the application, I was able to begin vibe coding in earnest.
The Human-Agent loop

The basic vibe coding loop works like this:

Interact with the application and find things that are broken

Prompt the LLM to fix them

Repeat

For example, after the very first PR, some very mild poking around immediately revealed the bugs fixed in #2:

There's a race condition in the current test logic for matching against table contents in run_query. Specifically, if there were previously valid results in lastResults, and for some reason Dive doesn't do anything, then we will still see the old results. The testing framework should explicitly clear lastResults before attempting an interaction.

...and #3:

Filter functionality does not work. We will first add a failing test, and then fix it. The failing test should click "Add Filter", then select "user" as the field, and then add an "alice" chip (by typing alice in the text box and pressing ENTER). Then when we dive, we should see two alice rows. Right now, NO request is issued at all when we click Dive. Diagnose and then fix the problem.

Prompt the agent to write tests. It's very helpful to prompt the agent to generate tests for whatever bugs its fixing. For frontend code, I decided to use playwright to write these tests. An example in #11:
def test_header_and_tabs(page: Any, server_url: str) -> None:
    page.goto(server_url)
    page.wait_for_selector("#order_by option", state="attached")

    header = page.text_content("#header")
    assert "sample.csv" in header
    assert "events" in header

    assert page.is_visible("#settings")
    assert page.is_hidden("#columns")
    page.click("text=Columns")
    assert page.is_visible("#columns")
    cols = page.locator("#column_list li").all_inner_texts()
    assert "timestamp" in cols
    assert "event" in cols
    page.click("text=View Settings")
    assert page.is_visible("#settings")

    btn_color = page.evaluate("getComputedStyle(document.querySelector('#dive')).backgroundColor")
    assert "rgb(0, 128, 0)" == btn_color

    sidebar_overflow = page.evaluate("getComputedStyle(document.querySelector('#sidebar')).overflowY")
    view_overflow = page.evaluate("getComputedStyle(document.querySelector('#view')).overflowY")
    assert sidebar_overflow == 'auto'
    assert view_overflow == 'auto'
This is kind of a very manual and somewhat brittle test. I doubt I would have had the patience to manually write and debug this. But the LLM will happily do it. There is also a fear that the test might not actually be testing anything for real. But because these tests are all generated from bug reports, you can prompt the model to write the test first, check that it's failing, and then fix the problem. I didn't instruct this in the above prompt, but I did instruct it for #155, for example. The trajectory shows that the LLM demonstrates it can repro the bug before fixing the issue:
I’ll add a case-insensitive test for "samples" to check server behavior with lowercase input.
(scubaduck) root@5b69743466e4:/workspace/scubaduck# python - <<'EOF'
> import json
> from scubaduck import server
>
> app = server.app
> client = app.test_client()
> payload = {
>     "table": "events",
>     "start": "2024-01-01 00:00:00",
>     "end": "2024-01-03 00:00:00",
>     "graph_type": "table",
>     "order_by": "samples",  # lowercase
>     "order_dir": "DESC",
>     "limit": 10,
>     "columns": [],
>     "group_by": ["user"],
> }
> rv = client.post("/api/query", data=json.dumps(payload), content_type="application/json")
> print('status', rv.status_code)
> print('data', rv.get_json())
> EOF
status 400
data {'error': 'Unknown column: samples'}
(scubaduck) root@5b69743466e4:/workspace/scubaduck#
I’ve found the bug happens when "Samples" is lowercase. I’ll adjust the query to handle case insensitivity and add a test for it. Time to modify server.py to fix the issue!
Beware BC-breaking changes. The LLM generated tests worked well when I didn't need to change behavior in a BC-breaking way. Example prompt:

Change time column to default to none unconditionally. (I want to default it to a time column if we find a good candidate but this threshold needs tuning and I don't have the tuning right now.)

This hit a Codex timeout:

This attempt to update the default time column led to extensive frontend test failures that require deeper changes. The environment made it difficult to fully diagnose the Playwright test behaviors within the time available, so the work could not be completed.

In standard software engineering practice, when this happens, decouple the BC compatible and BC breaking changes!

Make it so that Time Column can be set to (none). When it is set this way, the Start/End fields are hidden and we don't apply a filter on time range. (#115)

and then later instead of defaulting the time column to none, I added a heuristic to pick a column that looked like time, which picked the same column that all of the existing tests had also expected to be called with.

Refactors have to be split up. Codex's timeout means that you can't ask it to do too much in one go. Here's a prompt that timed out:

scubaduck/index.html has gotten a bit long. Let's split out some of the JS code into dedicated JS files for their functionality. Also setup the necessary Flask scaffolding to serve these JS files. I think splitting out these specific components would be good:

Dropdown implementation

Sidebar resizing

JS controlling the View Settings (e.g., updateDisplayTypeUI, as well as one off interactions on form elements, columns handling, filter handling, the actual Dive implementation (including query updating), reading in defaults from query string)

Table rendering (e.g., formatNumber, sorting)

Chip input implementation

Chart rendering (showTimeSeries)

Make changes to AGENTS.md or README.md describing the structure so you can quickly find where the components you need are

I eventually did manage the refactor by prompting Codex to individually move out the pieces I wanted to extract one-by-one. This is a place where I think Claude Code probably would have performed better.

Parallelizing tasks. As you can see from the lengths of my prompts, it does take a while to write a good prompt; you're basically writing a bug report with enough detail that the LLM can repro it and then fix it. So sometimes I would be bottlenecked on prompt writing. However, sometimes the prompts were quite short. In those cases, Codex encourages you to submit more tasks that can run in parallel. I found this worked well, and I'd sometimes have as many as five instances going (once again, rate limited by discovering problems, making designs and typing prompts!) One irritation is when the tasks end up conflicting with each other. Sometimes the conflicts are easy to fix, but if it feels nontrivial, it's often better to just ask Codex to redo one of the PRs on latest main after the other has landed. To avoid merge conflicts, it helps to have only one "main feature" agent going at any time, and then ask the agent to do random bugfixes in parallel with it. Once you have no more tasks to get running, you can go do something else while you wait for the agents to finish (manager schedule!)
Prompting

As a reminder, I've posted all of my prompts (including the ones that failed) at scubaduck-prompts, and I think it's helpful to skim through them to get a flavor of what I was asking the LLM. But to summarize, what did I spend most of my time on prompting Codex to do? My general vibe (ahem) is that I spent most of my time doing minor enhancements, where I instructed Codex to make some part of the program work slightly differently, in a way that was previously unspecified from the previous prompt. The metaphor I had in my head while I was working on the project was like that of a sculptor chiseling away marble: in the beginning, anything is possible, but as I kept prompting, I continuously narrowed down the space of possible programs I had until I had exactly the one I wanted. One big thing I want to note is that Codex rarely needed to make updates to my tests; for the most part, tests that were added never got taken away, because I never "changed my mind". I suspect that the vibe coding process would have been rockier if I was having to change behavior frequently.

One of the things that surprised me the most about the process was how easy it was to implement a line chart in SVG with Codex. My first prompt resulted in a chart that looked broken on the test data:
We're going to add a new View type, to go along with Samples and Table: Time Series. Time Series supports all the fields that Table supports, and a few more:

X-axis: Main group by dimension, e.g., the x-axis on time series view. This is our custom dropdown selector, but only time columns are populated here. It should prefer a default setting from the following list, most preferred first: "time", "timestamp"

Granularity: Choose the time interval between data points on the chart. For example, a granularity of 1 hour means there will be a data point every 60 minutes that is aggregated with the chosen Aggregate function over the data for the granularity period before point. This is a plain drop down. The valid values are: Auto, Fine, 1 second, 5 seconds, 10 seconds, 30 seconds, 1 minute, 4 minutes, 5 minutes, 10 minutes, 15 minutes, 30 minutes, 1 hour, 3 hours, 6 hours, 1 day, 1 week, 30 days. The semantics of the Auto setting is that it sets the interval to whatever would result in maximum 100 buckets (if there are not enough data points for that many buckets, it just picks the finest time interval that makes sense), and Fine which sets the interval to 500 buckets.

Fill Missing Buckets: This is a dropdown. For now, it has the settings "Fill with 0 (Per Series)" (default), "Connect (Per Series)" and "Leave blank".

Additionally, the default setting of Limit is 7, as it controls how many elements from group by will be plotted (the actual number of lines plotted could be a multiple of this, as we will plot every selected Column).

Unlike Samples and Table, we will instead display a line chart in the right panel. To plot the line chart, we will implement it by hand with JS and SVG, similar to how highcharts implements it. We will not use any third party dependencies. Lines will be plotted as paths, no smoothing, no dots for individual data points. Each series (as generated by group by) should be plotted with a different color, assigned using a best practices color palette for graph design. There should be a rendering of x-axis and y-axis; the x-axis should have slanted labels to aid readability. When we mouse over the chart, a vertical line should snap to the center of the time bucket that we are closest to. We should also display a crosshair on all of the series showing us their values at that data point, and highlight the closest point we are on, and increase the thickness of the series that point is on. To the left of the graph (still in the right panel), there should be a legend. The legend looks like this:
[GROUP BY VALUE] [AGGREGATE]
[First Column name, with series color]
[Number of samples for the first column]
[Second Column name, with series color]
[Number of samples for the second column]
... for all columns
----
... for all group by values (up to the limit)
So for example, if I group by user, I might see:
Alice AVG
value
4 (samples)
The highlighted series (which has a thicker line) should also be highlighted in the legend).
This was kind of terrifying, because I initially thought I didn't have a good way to test the SVG outputs. But after doing some regular old-fashioned debugging and reading the code (yes, this part not vibe coded), I figured out the problem, and also realized that Playwright can test that an SVG path is not just entirely straight. After the initial bugs were fixed, I mostly had to add missing features like x-axis/y-axis and interactivity features (amusingly, Codex ignored most of the instructions in the latter half of the prompt, giving only the barest bones legend. I suspect this was because I had some files which were too long). My general take after this was that JS chart libraries are going to become obsolete: it's much easier to vibe code a bespoke implementation and then customize the heck out of it.
Conclusion

ScubaDuck was implemented in about 150 Codex prompts. As you can see from the sample prompts above, the prompts are recognizably programming, they just happen to be in plain English language. This is a big help, because I never had to keep track of the nest of callbacks and state machines for implementing complex UI elements in JavaScript. I had to be fluent in what I wanted my program to do, and a good QA tester for the application to discover new problems that needed to be fixed, but I did not have to worry at all about the vagaries of SVG DOM elements or pixel position computation minutiae. It's hard to say how long it would have taken to code this by hand, but I think reproducing a UI that's been in production for years at Meta in three (part-time) days is pretty good!

Despite having done a bit of AI coding before, I also learned a bit from working on Codex. Codex made it blindingly clear that the parallel modality (and subsequent conflict resolution) is important. It made me adjust up my estimation of the capability of LLMs to write raw HTML/JS and evoked a future where people vibe code components in place of taking on a third party dependency. I was very appreciative of no rate limit Codex (though I doubt it's going to last.) It also reminded me how difficult it will be to setup agent environments for "real" projects (like PyTorch).

Hopefully, this case study has given you some ideas for things to try. Go forth and vibe code, responsibly!
by Edward Z. Yang at June 02, 2025 04:31 AM

Chris Penner

Building Industrial Strength Software without Unit Tests
I don't know about you, but testing isn't my favourite part of software development.

It's usually the last thing standing between me and shipping a shiny new feature, and writing tests is often an annoying process with a lot of boilerplate and fighting against your system to get your app into a good start starting for the test or mocking out whichever services your app depends on.

Much ink has been spilled about how to organize your code in order to make this easier, but the fact that so many blog posts and frameworks exist for this express purpose suggests to me that we as a community of software developers haven't quite solved this issue yet.

Keep reading to see how I've solved this problem for myself by simply avoiding unit testing altogether.

An alternative testing method

When I first started at Unison Computing I was submitting my first feature when I learned there were precious few unit tests. I found it rather surprising for a codebase for a compiler for a programming language! How do you prevent regressions without unit tests?

The answer is what the Unison team has dubbed transcript tests. These are a variation on the concept of golden-file tests.

A Unison transcript is a markdown file which explains in standard what behaviour it is going to test, then intersperses code-blocks which outline the steps involved in testing that feature using a mix of Unison code and UCM commands (UCM is Unison's CLI tool). After that comes the magic trick; UCM itself can understand and run these transcript files directly and record the results of each block.

When running a transcript file with the ucm transcript command UCM produces a deterministic output file containing the result of processing each code block. Unless the behaviour of UCM has changed since the last time it was run the resulting file will always be the same.

Each block in the markdown file is either a command, which is sent to the UCM shell tool, or it represents an update to a file on the (virtual) file-system, in which case it will be typechecked against the state of the codebase.

Here's a quick example of a transcript for testing UCM's view command so you can get a feel for it.
# Testing the `view` command

First, let's write a simple definition to view:

``` unison
isZero = cases
  0 -> true
  _ -> false
```

Now we add the definition to the codebase, and view it.

``` ucm
scratch/main> update
scratch/main> view isZero
```
We run this transcript file with ucm transcript my-transcript.md which produces the my-transcript.output.md file.

Notice how compiler output is added inline, ignore the hashed names, It's because I'm skipping the step which adds names for Unison's builtins.
# Testing the `view` command

First, let's write a simple definition to view:

``` unison
isZero = cases
  0 -> true
  _ -> false
```

``` ucm :added-by-ucm
  Loading changes detected in scratch.u.

  I found and typechecked these definitions in scratch.u. If you
  do an `add` or `update`, here's how your codebase would
  change:

    â�Ÿ These new definitions are ok to `add`:
    
      isZero : ##Nat -> ##Boolean
```

Now we add the definition to the codebase, and view it.

``` ucm
scratch/main> update

  Done.

scratch/main> view isZero

  isZero : ##Nat -> ##Boolean
  isZero = cases
    0 -> true
    _ -> false
```
Feel free to browse through the collection of transcripts we test in CI to keep UCM working as expected.

Testing in CI

Running transcript tests in CI is pretty trivial; we discover all markdown files within our transcript directory and run them all. After the outputs have been written we can use git diff --exit-code which will then fail with a non-zero code if anything of the outputs have changed from what was committed. Conveniently, git will also report exactly what changed, and what the old output was.

This failure method allows the developer to know exactly which file has unexpected behaviour so they can easily re-run that file or recreate the state in their own codebase if they desire.

Transcript tests in other domains

I liked the transcript tests in UCM so much that when I was tasked with building out the Unison Share webapp I decided to use transcript-style testing for that too. Fast forward a few years and Unison Share is now a fully-featured package repository and code collaboration platform running in production without a single unit test.

If you're interested in how I've adapted transcript tests to work well for a webapp, I'll leave a few notes at the end of the post.

Benefits of transcript tests

Here's a shortlist of benefits I've found working with transcript tests over alternatives like unit tests.

You write a transcript using the same syntax as you'd interact with UCM itself.

This allows all your users to codify any buggy behaviour they've encountered into a deterministic transcript. Knowing exactly how to reproduce the behaviour your users are seeing is a huge boon, and having a single standardized format for accepting bug reports helps reduce a lot of the mental work that usually goes into reproducing bug reports from a variety of sources. This also means that the bug report itself can go directly into the test suite if we so desire.

All tests are written against the tool's external interface.

The tests use the same interface that the users of your software will employ, which means that internal refactors won't ever break tests unless there's a change in behaviour that's externally observable.

This has been a huge benefit for me personally. I'd often find myself hesitant to re-work code because I knew that at the end I'd be rewriting thousands of lines of tests. If you always have to rewrite your tests at the same time you've rewritten your code, how do you have any confidence that the tests still work as intended?

Updating tests is trivial

In the common case where transcripts are mismatched because some help message was altered, or perhaps the behaviour has changed but the change is intended, you don't need to rewrite any complex assertions, or mock out any new dependencies. You can simply look at the new output, and if it's reasonable you commit the changed transcript output files.

It can't be understated how convenient this is when making sweeping changes; e.g. making changes to Unison's pretty printer. We don't need to manually update test-cases, we just run the transcripts locally and commit the output if it all looks good!

Transcript changes appear in PR reviews

Since all transcript outputs are committed, any change in behaviour will show up in the PR diff in an easy-to-read form. This allows reviewers to trivially see the old and new behaviour for each relevant feature.

Transcript tests are documentation

Each transcript shows how a feature is intended to be used by end-users.

Transcripts as a collaboration tool

When I'm implementing new features in Unison Share I need to communicate the shape of a JSON API with our Frontend designer Simon. Typically I'll just write a transcript test which exercises all possible variants of the new feature, then I can just point at the transcript output as the interface for those APIs.

It's beneficial for both of us since I don't need to keep an example up-to-date for him, and he knows that the output is actually accurate since it's generated from an execution of the service itself.

Transcript testing for Webapps

I've adapted transcript testing a bit for the Unison Share webapp. I run the standard Share executable locally with its dependencies mocked out via docker-compose. I've got a SQL file which resets the database with a known set of test fixtures, then use a zsh script to reset my application state in between running each transcript.

Each transcript file is just a zsh script that interacts with the running server using a few bash functions which wrap curl commands, but save the output to json files, which serve as the transcript output.

I've also got helpers for capturing specific fields from an API call into local variables which I can then interpolate into future queries, this is handy if you need to, for example, create a project then switch it from private to public, then fetch that project via API.

Here's a small snippet from one of my transcripts for testing Unison Share's project APIs:
#!/usr/bin/env zsh

# Fail the transcript if any command fails
set -e

# Load utility functions and variables for user credentials
source "../../transcript_helpers.sh"

# Run a UCM transcript to upload some code to load in projects.
transcript_ucm transcript prelude.md

# I should be able to see the fixture project as an unauthenticated user.
fetch "$unauthenticated_user" GET project-get-simple '/users/test/projects/publictestproject'

# I should be able to create a new project as an authenticated user.
fetch "$transcripts_user" POST project-create '/users/transcripts/projects/containers' '{
    "summary": "This is my project",
    "visibility": "private",
    "tags": []
}'

fetch "$transcripts_user" GET project-list '/users/transcripts/projects'
You can see the output files generated by the full transcript in this directory.

Requirements of a good transcript testing tool

After working with two different transcript testing tools across two different apps I've got a few criteria for what makes a good transcript testing tool, if you're thinking of adding transcript tests to your app consider the following:

Transcripts should be deterministic

This is critical. Transcripts are only useful if they produce the same result on every run, on every operating system, at every time of day.

You may need to make a few changes in your app to adapt or remove randomness, at least when in the context of a transcript test.

In Share there were a lot of timestamps, random IDs, and JWTs (which contain a timestamp). The actual values of these weren't important for the tests themselves, so I solved the issue by piping the curl output through a sed script before writing to disk. The script matches timestamps, UUIDs, and JWTs and replaces them with placeholders like <TIMESTAMP>, <UUID>, and <JWT> accordingly.

A special mode in your app for transcript testing which avoids randomness can be useful, but use custom modes sparingly lest your app's behaviour differ too much during transcripts and you can't test the real thing.

I also make sure that the data returned by APIs is always sorted by something other than randomized IDs, it's a small price to pay, and reduces randomness and heisenbugs in the app as a helpful byproduct.

Transcripts should be isolated

Each individual transcript should be run in its own pristine environment. Databases should be reset to known state, if the file-system is used, it should be cleared or even better, a virtual file-system should be used.

Transcripts should be self-contained

Everything that pertains to a given test-case's state or configuration should be evident from within the transcript file itself. I've found that changes in behaviour from the file's location or name can just end up being confusing.

Difficulties working with Transcripts

Transcripts often require custom tooling

In UCM's case the transcript tooling has evolved slowly over many years, it has it's own parser, and you can even test UCM's API server by using special code blocks for that.

Share has a variety of zsh utility scripts which provide helpers for fetching endpoints using curl, and filtering output to capture data for future calls. It also has a few tools for making database calls and assertions.

Don't shy away from investing a bit of time into making transcript testing sustainable and pleasant, it will pay dividends down the road.

Intensive Setup**

As opposed to unit tests which are generally pretty lightweight; transcript tests are full integration tests, and require setting up data, and sometimes executing entire flows so that we can get the system into a good state for testing each feature.

You can mitigate the setup time by testing multiple features with each transcript.

I haven't personally found transcript tests to take too much time in CI, largely because I think transcript testing tends to produce fewer tests, but of higher value than unit testing. I've seen many unit test suites bogged down by particular unit tests which generate hundreds of test cases that aren't actually providing real value. Also, any setup/teardown is going to be more costly on thousands of unit-tests as compared to dozens or hundreds of transcript tests.

Service Mocking

Since transcript tests run against the system-under-test's external interface, you won't have traditional mocking/stubbing frameworks available to you. Instead, you'll mock out the system's dependencies by specifying custom services using environment variables, or wiring things up in docker-compose.

Most systems have a setup for local development anyways, so integrating transcript tests against it has the added benefit that they'll ensure your local development setup is tested in CI, is consistent for all members of your team, and continues to work as expected.

In Summary

Hopefully this post has helped you to consider your relationship with unit tests and perhaps think about whether other testing techniques may work better for your app.

Transcript tests surely aren't ideal for all possible apps or teams, but my last few years at Unison have proven to me that tests can be more helpful, efficient, and readable than I'd previously thought possible.

Let me know how it works out for you!

Hopefully you learned something ğŸ¤�! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one by pledging on my Patreon page! It takes quite a bit of work to put these things together, if I managed to teach your something or even just entertain you for a minute or two maybe send a few bucks my way for a coffee? Cheers! ğŸ�»
June 02, 2025 12:00 AM

May 30, 2025

Haskell Interlude

65: Andy Gordon

Andy Gordon from Cogna is interviewed by Sam and Matti. We learn about Andyâ€™s influential work including the origins of the bind symbol in haskell, and the introduction of lambdas in Excel. We go onto discuss his current work at Cogna on using AI to allow non-programmers to write apps using natural language. We delve deeper into the ethics of AI and consider the most likely AI apocalypse.

by Haskell Podcast at May 30, 2025 02:00 PM

May 29, 2025

Tweag I/O

Quantifying the impact of an AI tool in our company

As software engineers at Modus Create, we are always on the lookout for tools that can enhance our productivity and code quality. The advent of AI-powered coding assistants such as GitHub Copilot has sparked excitement in the development community. Copilot code completions propose snippets at the current cursor that the user can quickly insert, while Copilot Chat allows users to discuss their code with an AI.

These tools promise to revolutionize software development, allowing engineers to focus on higher-level tasks while delegating implementation details to machines. However, their adoption also raises questions:

Do they genuinely improve developer productivity?

How do they affect code quality and maintainability?

Which users and tasks benefit the most from these AI-driven coding assistants?

This blog post explores the challenges of measuring the impact of AI tools in our software engineering practices, with a focus on GitHub Copilot. Note that the data discussed in the post was collected in Q2 2024. We expect that GitHub Copilot has improved since then; we have also not yet had the opportunity to quantitatively investigate newer interfaces to AI development, like Cursor or Windsurf.

“Developer Productivity”

At Modus Create, we’re passionate about improving the experience of developers, both for our own teams and those at clients. We have been working for years on tools that we think improve developer productivity, for instance with Nix, Bazel, Python, and many more. But measuring developer productivity is a notoriously difficult task.

At the heart of this question lies the nature of software development itself. Is it a productive activity that can fit scientific management, be objectively measured, and be optimized? Part of the research on developer productivity goes down this path, trying to measure things like the time it takes to complete standardized tasks. Another trend suggests that developers themselves can be their own assessors of productivity, where frameworks like SPACE are used to guide self-assessment. Each of these angles has strengths and weaknesses. To get as broad a picture as possible, we tried to use a bit of both. We found, though, that data collection issues made our task timings unusable (more on this below). Therefore, all our conclusions are drawn from self-assessments.

Our in-house experiment

To gain a deeper understanding of the impact of GitHub Copilot at Modus Create, we designed and conducted an in-house experiment.

We managed to recruit 22 participants in total, ranging from Junior to Principal software engineers. They had a wide range of programming experience.

The experiment consisted of four coding tasks that participants needed to complete using Python within an existing codebase. The tasks were designed to evaluate different aspects of software development:

Data ingestion: Loading and parsing data from a file into a Pandas DataFrame

Data analysis: Performing statistical computations and aggregations using Pandas’ groupby operations

Test development: Writing tests using Python’s unittest framework

Data visualization: Creating interactive plots using the Streamlit library

Participants had varied levels of experience with the required tools. Most participants had at least a little bit of Python experience, but Pandas experience was less common and hardly anyone had used Streamlit before.¹

Upon completion of the assigned tasks, all participants completed a comprehensive survey to provide detailed feedback on their experience. The survey consisted of approximately 50 questions designed to assess multiple dimensions of the development process, including:

Assessment of participant expertise levels regarding task requirements, AI tooling and GitHub Copilot proficiency

Evaluation of task-specific perceived productivity

Analysis of the impact on learning and knowledge acquisition

Insights into potential future GitHub Copilot adoption

Perceived productivity gains

We asked participants the following questions.

Question Choices

If you didn't have Copilot, reaching
the answer for task X would have taken...

Less time

About the same time

More time

This question was core to our study, as it allowed us to directly measure the perceived productivity gain of using Copilot versus not using it.

The result was clear: almost every Copilot user felt more productive using Copilot on every task.

We also broke out the same data by Python experience level, and found that more experienced Python users found less productivity gain than less experienced users. In this plot, we grouped the “no Python experience” and “beginner” users into the “less experienced” group, with the rest of the users in the “more experienced group”.

To better understand how participants tackled these tasks, we collected information by asking for each task:

Question Choices

Which of the following have you
used to complete task X?

Copilot code completions

Copilot Chat

Google search

Library documentation

My knowledge

We were also interested in comparing these usages across profiles of developers, so we asked this question as well:

Question Choices

How would you describe your Python level?

No Python experience

Beginner

Intermediate

Advanced

We could then visualize how participants who felt more productive with Copilot solved each problem, and see if there were variations depending on their profile. Since each participant could choose multiple options, sometimes there are more responses than participants.

Apparently, people don’t like library documentation. Also, we thought it was strange that the most experienced Python users never reported using their own knowledge. It would be interesting to dig more into this, but we don’t have the data available. One theory is that when reviewing AI suggestions everyone relied on their own Python knowledge, but experienced users took that knowledge for granted and so didn’t report using it.

Among people who felt more productive on tasks “Write unit tests” and “Plot with Streamlit”, we really see more usage of Copilot Chat than other sources.

Our hypothesis is that these tasks typically require making more global changes to the code or adding code in places that are not obvious at first. In these scenarios, Copilot Chat is more useful because it will tell you where and what code to add. In other tasks, it was clearer where to add code, so participants could likely place their cursor and prompt Copilot for a suggestion.

This is supported by the questions we asked:

Question Choices

Which of the following do you think is true?

Copilot is better with acceleration:
it helps accelerate work that I already know how to do.

Copilot is better with exploration:
it helps me explore the problem and how to solve it when I am not sure how to proceed.

This question uses checkboxes, so respondents were not restricted to a single answer.

On average, participants thought Copilot was suited for both acceleration and exploration, but with some notable differences depending on experience level: experienced Pythonistas strongly favored Copilot for acceleration, while less experienced users thought it was better for exploration.

We also found that the participants’ perspective on acceleration versus exploration seems related to the usage of Copilot Chat.

The most interesting part of this chart is that participants who think Copilot is good for exploration or bad for acceleration relied most heavily on Copilot Chat. This suggests that users find the autocomplete features more useful for acceleration, while the chat features — which allow general questions, divorced from a specific code location — are useful for exploration. But it is interesting to note how usage of Copilot Chat versus autocomplete is correlated with how users perceive Copilot as a whole.

For more on acceleration versus exploration with Copilot, this OOPSLA23 talk which inspired to ask this question is worth watching.

Copilot will make code flow

The SPACE framework mentions “flow” as an important aspect of productivity.

Some research associates productivity with the ability to get complex tasks done with minimal distractions or interruptions. This conceptualization of productivity is echoed by many developers when they talk about “getting into the flow” […].

This concept of flow is really interesting, because it is a way to measure productivity that is not based on outputs, but rather on the experience of the developers themselves. And although “flow” might be subjective and perceptual, studies have linked it to higher productivity and reduced stress; see this open-access book chapter for a readable overview of the research.

To get an idea of Copilot’s impact on flow, we asked the following questions:

Question Choices

Did Copilot decrease your need to switch out of your IDE
(for example to search for answers or check the documentation)?

Significantly

A bit

No

Did Copilot enhance your capacity to stay in your development flow?

Significantly

A bit

No

The results were unambiguous: most users found that Copilot helped significantly, and a strong majority found that it helped at least a little.

Learnings from organizing the experiment

Although the experiment went well overall, we noted a few challenges worth sharing.

First, ensuring active participation in the experiment required a collective effort within the company. Spreading the word internally about the experiment and looking for participants is an effort not to be underestimated. In our case, we benefited from great support from internal leaders and managers who helped communicate with and recruit participants. Even so, we would have liked to have more participants. It turns out that engineers are sometimes just too busy!

Second, keeping participants focused on the experiment was harder than expected. We had asked participants to make a git commit at the end of each task, thinking that we could use this data to quantify the time it took for each participant to complete their tasks. When looking at the data, we were surprised to see that the time between commits varied widely and was often much longer than expected. When asked, several participants reported that they had to interrupt our experiment to deal with higher-priority tasks. In the end, we discarded the timing data: they were too limited and too heavily influenced by external factors to provide useful conclusions. For the same reason, we haven’t even mentioned yet that our study had a control group: since the timing data wasn’t useful, we’ve omitted the control group entirely from the data presented here.

The ideal scenario of securing dedicated, uninterrupted time from a large pool of engineers proved impractical within our organizational context. Nevertheless, despite these limitations, we successfully gathered a meaningful dataset that contributes valuable perspectives to the existing body of research on AI-assisted development.

Further references

Speaking of other work out there, there’s a lot of it! It turns out that many people are excited by the potential of code assistants and want to understand them better. Who knew? Here is some further reading that we found particularly interesting:

Experiments at Microsoft and Accenture introduced Copilot into engineers’ day-to-day workflow and measured the impact on various productivity metrics, like the number of opened pull requests; they found that Copilot usage significantly increased the number of successful builds. They had a much larger sample size than we did — Microsoft and Accenture have a lot of engineers — but unlike us they didn’t specifically consider the uptake of unfamiliar tools and libraries.

A research team from Microsoft and MIT recruited developers from Upwork, gave them a task, and measured the time it took with and without Copilot’s help; they found that Copilot users were about 50% faster. They did a better job than we did at measuring completion time (they used GitHub Classroom), but we think our exit survey asked more interesting questions.

The Pragmatic Engineer ran a survey about how engineers are using AI tooling, covering popular tools and their perceived impact on development.

Conclusion

Our experiment provided valuable insights into the impact of GitHub Copilot on developer experiences at Modus Create. Overall, developers reported increased productivity and a more seamless workflow. Participants used Copilot extensively in specific coding scenarios, such as automated testing and modifying code that used libraries they were unfamiliar with, and they felt more productive in those cases.

It was particularly interesting to see how the interface to the AI assistant (chat vs. completion) affected participants’ opinions on what the assistant was useful for, with chat-heavy users prioritizing exploration over acceleration and completion-heavy users the other way around. As interfaces and tooling continue to evolve — faster than we can design and run experiments to test them — we expect them to play a huge role in the success of AI-powered code assistants.

We made a small mistake with the wording in Pandas and Streamlit questions: we gave them the options “I have never used it”, “I have heard of it”, “I have used it before in a limited way”, “I am comfortable with it”, and “I am an advanced user”. The problem, of course, is that these responses aren’t mutually exclusive. Given the order the responses were presented in, we think it’s reasonable to interpret “I have never used it” responses to mean that they’d heard of it but never used it. For the plot, we’ve combined “I have never used it” and “I have heard of it” into “Never used it”.↩

May 29, 2025 12:00 AM

Question	Choices
If you didn't have Copilot, reaching the answer for task X would have taken...	Less time About the same time More time

Question	Choices
Which of the following have you used to complete task X?	Copilot code completions Copilot Chat Google search Library documentation My knowledge

Question	Choices
How would you describe your Python level?	No Python experience Beginner Intermediate Advanced

Question	Choices
Which of the following do you think is true?	Copilot is better with acceleration: it helps accelerate work that I already know how to do. Copilot is better with exploration: it helps me explore the problem and how to solve it when I am not sure how to proceed.

Question	Choices
Did Copilot decrease your need to switch out of your IDE (for example to search for answers or check the documentation)?	Significantly A bit No
Did Copilot enhance your capacity to stay in your development flow?	Significantly A bit No

May 28, 2025

Chris Smith 2

Threshold Strategy in Approval and Range Voting

How to turn polling insight into an optimal ballotâ€Šâ€”â€Šand why anything else isÂ wasted.
â€œapprove ofâ€�? What does that meanÂ anyway?
I have written previously about how approval and range voting methods are intrinsically tactical. This doesnâ€™t mean that they are more tactical than other election systems (nearly all of which are shown to sometimes be tactical by Gibbardâ€™s Theorem when there are three or more options). Rather, it means that tactical voting is unavoidable. Voting in such a system requires answering the question of where to set your approval threshold or how to map your preferences to a ranged voting scale. These questions donâ€™t have more or less â€œhonestâ€� answers. They are always tacticalÂ choices.
But I havenâ€™t dug deeper into what these tactics look like. Here, Iâ€™ll do the mathematical analysis to show what effective voting looks like in these systems, and make some surprising observations along theÂ way.
Mathematical formalism for approvalÂ voting
Weâ€™ll start by assuming an approval election, so the question is where to put your threshold. At what level of approval do you switch from voting not to approve a candidate to approving them?
Weâ€™ll keep the notationÂ minimal:
As is standard in probability, Iâ€™ll write â„™[X] for the probability of an event X, and ğ�”¼[X] for the expected value of a (numerical) random variableÂ X.
I will use B to refer to a random collection (multiset) of ballots, drawn from some probability distribution reflecting what we know from polling and other information sources on other voters. B will usually not include the approval vote that youâ€™re considering casting, and to include that approval, weâ€™ll write B âˆª {c}, where c is the candidate you contemplate approving.
Iâ€™ll write W(Â·) to indicate the winner of an election with a given set of ballots. This is the candidate with the most approvals. Weâ€™ll assume some tiebreaker is in place thatâ€™s independent of individual voting decisions; for instance, candidates could be shuffled into a random order before votes are cast, in in the event of a tie for number of approvals, weâ€™ll pick the candidate who comes first in that shuffledÂ order.
U(Â·) will be your utility function, so U(c) is the utility (i.e., happiness, satisfaction, or perceived social welfare) that you personally will get from candidate c winning the election. This doesnâ€™t mean you have to be selfish, per se, as accomplishing some altruistic goal is still a form of utility, but we evaluate that utility from your point of view even though other voters may disagree.
With this notation established, we can clearly state, almost tautologically, when you should approve of a candidate c. You should approve of c whenever:
ğ�”¼[U(W(B âˆª {c}))] > ğ�”¼[U(W(B))]
Thatâ€™s just saying you should approve of c if your expected utility from the election with your approval of c is more than your utility withoutÂ it.
The role of pivotal votes and exactÂ strategy
This inequality can be made more useful by isolating the circumstances in which your vote makes a difference in the outcome. That is, W(B âˆª {c}) â‰ W(B). Non-pivotal votes contribute zero to the net expectation, and can beÂ ignored.
In approval voting, approving a candidate can only change the outcome by making that candidate the winner. This means a pivotal vote is equivalent to bothÂ of:
W(B âˆª {c}) =Â c
W(B) â‰ Â c
Itâ€™s useful to have notation for this, so weâ€™ll define V(B, c) to mean that W(B âˆª {c}) â‰ W(B), or equivalently, that W(B âˆª {c}) = c and W(B) â‰ c. To remember this notation, recall that V is the pivotal letter in the word â€œpivotâ€�, and also visually resembles aÂ pivot.
With this in mind, the expected gain in utility from approving cÂ is:
ğ�”¼[U(W(B âˆª {c}))] - ğ�”¼[U(W(B))]. But since the utility gain is zero except for pivotal votes, this is the sameÂ as
â„™[V(B, c)] Â· (ğ�”¼[U(W(B âˆª {c})) | V(B, c)] - ğ�”¼[U(W(B)) | V(B, c)]). But since V(B, c) implies that W(B âˆª {c}) = c, so this simplifies to
â„™[V(B, c)] Â· (U(c) - ğ�”¼[U(W(B)) | V(B,Â c)])
Therefore, you ought to approve of a candidate cÂ whenever
U(c) > ğ�”¼[U(W(B)) | V(B,Â c)]
This is much easier to interpret. You should approve of a candidate c precisely when the utility you obtain from c winning is greater than the expected utility in cases where c is right on the verge of winning (but someone else wins instead).
There are a few observations worth making aboutÂ this:
The expectation clarifies why the threshold setting part of approval voting is intrinsically tactical. It involves evaluating how likely each other candidate is to win, and using that information to compute an expectation. That means advice to vote only based on internal feelings like whether you consider a candidate acceptable is always wrong. An effective vote takes into account external information about how others are likely to vote, including polling and understanding of public opinion andÂ mood.
The conditional expectation, assuming V(B, c), tells us that the optimal strategy for whether to approve of some candidate c depends on the very specific situation where c is right on the verge of winning the election. If c is a frontrunner in the election, this scenario isnâ€™t likely to be too different from the general case, and the conditional probability doesnâ€™t change much. However, if c is a long-shot candidate from some minor party, but somehow nearly ties for a win, weâ€™re in a strange situation indeed: perhaps a major last-minute scandal, a drastic polling error, or a fundamental misunderstanding of the public mood. Here, the conditonal expected utility of an alternate winner might be quite different from your unconditional expectation. If, say, voters prove to have an unexpected appetite for extremism, this can affect the runner-ups, asÂ well.
Counter-intuitively, an optimal strategy might even involve approving some candidates that you like less than some that you donâ€™t approve! This can happen because different candidates are evaluated against different thresholds. Therefore, a single voterâ€™s best approval ballot isnâ€™t necessarily monotonic in their utility rankings. This adds a level of strategic complexity I hadnâ€™t anticipated in my earlier writings on strategy in approvalÂ voting.
Approximate strategy
The strategy described above is rigorously optimal, but not at all easy to apply. Imagining the bizarre scenarios in which each candidate, no matter how minor, might tie for a win, is challenging to do well. Weâ€™re fortunate, then, that thereâ€™s a good approximation. Remember that the utility gain from approving a candidate was equalÂ to
â„™[V(B, c)] Â· (U(c) - ğ�”¼[U(W(B)) | V(B,Â c)])
In precisely the cases where V(B, c) is a bizarre assumption thatâ€™s difficult to imagine, weâ€™re also multiplying by â„™[V(B, c)], which is vanishingly small, so this vote is very unlikely to make a difference in the outcome. For front-runners, who are relatively much more likely to be in a tie for the win, the conditional probability changes a lot less: scenarios that end in a near-tie are not too different from the baseline expectation.
This happens because â„™[V(B, c)] falls off quite quickly indeed as the popularity of c decreases, especially for large numbers of voters. For a national scale election (say, about 10 million voters), if c expects around 45% of approvals, then â„™[V(B, c)] is around one in a million. Thatâ€™s a small number, telling us that very large elections arenâ€™t likely to be decided by a one-vote margin anyway. But itâ€™s gargantuan compared to the number if c expects only 5% of approvals. Then â„™[V(B, c)] is around one in 10^70. Thatâ€™s about one in a quadrillion-vigintillion, if you want to know, and near the scale of possibly picking one atom at random from the entire universe! The probability of casting a pivotal vote drops off exponentially, and by this point itâ€™s effectively zero.
With that in mind, we can drop the condition on the probability in the second term, giving us a new rule: Approve of a candidate c any timeÂ that:
U(c) > ğ�”¼[U(W(B))]
That is, approve of any candidate whose win you would like better than you expect to like the outcome of the election. In other words, imagine you have no other information on election night, and hear that this candidate has won. If this would be good news, approve of the candidate on your ballot. If it would be bad news,Â donâ€™t.
This rule is still tactical. To determine how much you expect to like the outcome of the election, you need to have beliefs about who else is likely to win, which still requires an understanding of polling and public opinion andÂ mood.
However, there is one threshold, derived from real polling data in realistic scenarios, and you can cast your approval ballot monotonically based on that single threshold.
This is no longer a true optimal strategy, but with enough voters, the exponential falloff in â„™[V(B, c)] as c becomes less popular is a pretty good assurance that the incorrect votes you might cast by using this strategy instead of the optimal ones are extremely unlikely to matter. In practice, this is probably the best rule to communicate to voters in an approval election with moderate to large numbers ofÂ voters.
We can get closer with the following hypothetical: Imagine that on election night, you have no information on the results except for a headline that proclaims: Election Too Close To Call. With that as your prior, you ask of each candidate, is it good or bad news to hear now that this candidate has won. If it would be good news, then you approve of them. This still leaves one threshold, but weâ€™re no longer making the leap that the pivotal condition for front-runners is unnecessary; weâ€™re imagining a world in which at least some candidates, almost surely the front-runners, are tied. If this changes your decision (which it likely would only in very marginal cases), you can use this more accurate approximation.
Reducing range to approvalÂ voting
I promised to look at strategy for range voting, as well. Armed with an appreciation of approval strategy, itâ€™s easy to extend this to an optimal range strategy, as well, for large-scale elections.
The key is to recognize that a range voting election with options 0, 1, 2,Â â€¦, n is mathematically equivalent to an approval election where everyone is just allowed to vote n times. The number you mark on the range ballot can be interpreted as saying how many of your approval ballots you want to mark as approving that candidate.
Looking at it this way presents the obvious question: why would you vote differently on some ballots than others? In what situation could that possibly be the rightÂ choice?
For small elections, say if youâ€™re voting on places to go out and eat with your friends or coworkers, itâ€™s possible that adding in a handful of approvals materially changes the election so that the optimal vote is different. Then it may well be optimal to cast a range ballot using some intermediate number.
For large elections, though, youâ€™re presented with pretty much exactly the same question each time, and you may as well give the same answer. Therefore, in large-scale elections, the optimal way to vote with a range ballot is always to rate everyone either the minimum or maximum possible score. This reduces a range election exactly to an approval election. The additional expressiveness of a range ballot is a siren call: by using it, you always vote less effectively than you would have by ignoring it and using only the two extremeÂ choices.
Since weâ€™re discussing political elections, which have relatively large numbers of voters, this answers the question for range elections, as well: Rate a candidate the maximum score if you like them better than you expect to like the outcome of the election. Otherwise, rate them the minimumÂ score.
Summing itÂ up
What weâ€™ve learned, then, is that optimal voting in approval or range systems boils down to two nestedÂ rules.
Exact rule (for the mathematically fearless): approve c iff U(c) > ğ�”¼[ U(W(B)) | your extra vote for c is pivotal ]. This Bayesian test weighs each candidate against the expected utility in the razor-thin worlds where they tie forÂ first.
Large-electorate shortcut (for everyone else): because those pivotal worlds become astronomically rare as the field grows, the condition shrinks to a single cutoff: approve (or give a maximum score) to every candidate whose victory you expect to enjoy more than you expected to like the result. (If you can, imagine only cases where you know the election isÂ close.)
Weâ€™ve seen why the first rule is the gold standard; but the second captures virtually all of its benefit when millions are voting. Either way, strategy is inseparable from sincerity: you must translate beliefs about polling into a utility threshold, and then measure every candidate against it. Weâ€™ve also seen by a clear mathematical equivalence why range ballots add no real leverage in large-scale elections, instead only offering false choices that are alwaysÂ wrong.
The entire playbook fits on a sticky note: compute the threshold, vote all-or-nothing, and let the math do theÂ rest.

by Chris Smith at May 28, 2025 08:25 PM

Philip Wadler

The Provocateurs: Brave New Bullshit

Following two sell-out shows at the Fringe last year, I'm on at the Fringe again:

11.25 Monday 4 August, Stand 2 w/Lucy Remnant and Susan Morrison
17.40 Sunday 17 August, Stand 4 w/Smita Kheria and Sarah-Jane Judge
17.40 Tuesday 19 August, Stand 4 w/Cameron Wyatt and Susan Morrison

Shows are under the banner of The Provocateurs (formerly Cabaret of Dangerous Ideas). Tickets go on sale Wednesday 7 May, around noon. The official blurb is brief:

Professor Philip Wadler (The University of Edinburgh) separates the hopes and threats of AI from the chatbot bullshit.

by Philip Wadler ([email protected]) at May 28, 2025 11:16 AM

May 25, 2025

Mark Jason Dominus

Mystery of the quincunx's missing quincunx

A quincunx is the X-shaped pattern of pips on the #5 face of a die.

It's so-called because the Romans had a common copper coin called an as, and it was divided (monetarily, not physically) into twelve uncia. There was a bronze coin worth five uncia called a quīncunx, which is a contraction of quīnque (“five”) + uncia, and the coin had that pattern of dots on it to indicate its value.

Uncia generally meant a twelfth of something. It was not just a twelfth of an as, but also a twelfth of a pound , which is where we get the word “ounce”, and a twelfth of a foot, which is where we get the word “inch”.

The story I always heard about the connection between the coin and the X-shaped pattern of dots was the one that is told by Wikipedia:

Its value was sometimes represented by a pattern of five dots arranged at the corners and the center of a square, like the pips of a die. So, this pattern also came to be called quincunx.

Or the Big Dictionary:

… [from a] coin of this value (occasionally marked with a pattern resembling the five spots on a dice cube),…

But today I did Google image search for qunicunxes. And while most had five dots, I found not even one that had the dots arranged in an X pattern.

(I believe the heads here are Minerva, goddess of wisdom. The owl is also associated with Minerva.)

Where's the quincunx that actually has a quincuncial arrangement of dots? Nowhere to be found, it seems. But everyone says it, so it must be true.

Addenda

The first common use of “quincunx” as an English word was to refer to trees that were planted in a quincuncial pattern, although not necessarily in groups of exactly five, in which each square of four trees had a fifth at its center.

Similarly, the Galton Box, has a quincuncial arrangement of little pegs. Galton himself called it a “quincunx”.

The OED also offers this fascinating aside:

Latin quincunx occurs earlier in an English context. Compare the following use apparently with reference to a v-shaped figure:

1545 Decusis, tenne hole partes or ten Asses...It is also a fourme in any thynge representyng the letter, X, whiche parted in the middel, maketh an other figure called Quincunx, V.

which shows that for someone, a quincuncial shape was a V and not an X, presumably because V is the Roman numeral for five.

A decussis was a coin worth not ten uncia but ten asses, and it did indeed have an X on the front. A five-as coin was a quincussis and it had a V. I wonder if the author was confused?

The source is Bibliotheca Eliotæ. The OED does not provide a page number.

It wasn't until after I published this that I realized that today's date was the extremely quincuncial 2025-05-25. I thank the gods of chance and fortune for this little gift.

by Mark Dominus ([email protected]) at May 25, 2025 11:00 PM

May 24, 2025

Mark Jason Dominus

The fivefold symmetry of the quince

The quince is so-named because, like other fruits in the apple family, it has a natural fivefold symmetry:

This is because their fruits develop from five-petaled flowers, and the symmetry persists through development. These are pear blossoms:

You can see this in most apples if you cut them into equatorial slices:

The fivefold symmetry isn't usually apparent from the outside once the structure leaves the flowering stage. But perfect Red Delicious specimens do have five little feet:

P.S.: I was just kidding about the name of the quince, which actually has nothing to do with any of this. It is a coincidence.

by Mark Dominus ([email protected]) at May 24, 2025 03:29 AM

May 22, 2025

Simon Marlow

Indexing Hackage: Glean vs. hiedb
Indexing Hackage: Glean vs. hiedb
May 22, 2025
I thought it might be fun to try to use Glean to index as much of Hackage as I could, and then do some rough comparisons against hiedb and also play around to see what interesting queries we could run against a database of all the code in Hackage.

This project was mostly just for fun: Glean is not going to replace hiedb any time soon, for reasons that will become clear. Neither are we ready (yet) to build an HLS plugin that can use Glean, but hopefully this at least demonstrates that such a thing should be possible, and Glean might offer some advantages over hiedb in performance and flexibility.

A bit of background:

Glean is a code-indexing system that we developed at Meta. It’s used internally at Meta for a wide range of use cases, including code browsing, documentation generation and code analysis. You can read about the ways in which Glean is used at Meta in Indexing Code At Scale with Glean.

hiedb is a code-indexing system for Haskell. It takes the .hie files that GHC produces when given the option -fwrite-ide-info and writes the information to a SQLite database in various tables. The idea is that putting the information in a DB allows certain operations that an IDE needs to do, such as go-to-definition, to be fast.

You can think of Glean as a general-purpose system that does the same job as hiedb, but for multiple languages and with a more flexible data model. The open-source version of Glean comes with indexers for ten languages or so, and moreover Glean supports SCIP which has indexers for various languages available from SourceGraph.

Since a hiedb is just a SQLite DB with a few tables, if you want you can query it directly using SQL. However, most users will access the data through either the command-line hiedb tool or through the API, which provide the higher-level operations such as go-to-definition and find-references. Glean has a similar setup: you can make raw queries using Glean’s query language (Angle) using the Glean shell or the command-line tool, while the higher-level operations that know about symbols and references are provided by a separate system called Glass which also has a command-line tool and API. In Glean the raw data is language-specific, while the Glass interface provides a language-agnostic view of the data in a way that’s useful for tools that need to navigate or search code.

An ulterior motive

In part all of this was an excuse to rewrite Glean’s Haskell indexer. We built a Haskell indexer a while ago but it’s pretty limited in what information it stores, only capturing enough information to do go-to-definition and find-references and only for a subset of identifiers. Furthermore the old indexer works by first producing a hiedb and consuming that, which is both unnecessary and limits the information we can collect. By processing the .hie files directly we have access to richer information, and we don’t have the intermediate step of creating the hiedb which can be slow.

The rest of this post

The rest of the post is organised as follows, feel free to jump around:

Performance: a few results comparing hiedb with Glean on an index of all of Hackage

Queries: A couple of examples of queries we can do with a Glean index of Hackage: searching by name, and finding dead code.

Apparatus: more details on how I set everything up and how it all works.

What’s next: some thoughts on what we still need to add to the indexer.

Performance

All of this was perfomed on a build of 2900+ packages from Hackage, for more details see Building all of Hackage below.

Indexing performance

I used this hiedb command:
hiedb index -D /tmp/hiedb . --skip-types
I’m using --skip-types because at the time of writing I haven’t implemented type indexing in Glean’s Haskell indexer, so this should hopefully give a more realistic comparison.

This was the Glean command:
glean --service localhost:1234 \
  index haskell-hie --db stackage/0 \
  --hie-indexer $(cabal list-bin hie-indexer) \
  ~/code/stackage/dist-newstyle/build/x86_64-linux/ghc-9.4.7 \
  --src '$PACKAGE'
Time to index:

hiedb: 1021s

Glean: 470s

I should note that in the case of Glean the only parallelism is between the indexer and the server that is writing to the DB. We didn’t try to index multiple .hie files in parallel, although that would be fairly trivial to do. I suspect hiedb is also single-threaded just going by the CPU load during indexing.

Size of the resulting DB

hiedb: 5.2GB

Glean: 0.8GB

It’s quite possible that hiedb is simply storing more information, but Glean does have a rather efficient storage system based on RocksDB.

Performance of find-references

Let’s look up all the references of Data.Aeson.encode:
hiedb -D /tmp/hiedb name-refs encode Data.Aeson
This is the query using Glass:
cabal run glass-democlient -- --service localhost:12345 \
  references stackage/hs/aeson/Data/Aeson/var/encode
This is the raw query using Glean:
glean --service localhost:1234 --db stackage/0 \
  '{ Refs.file, Refs.uses[..] } where Refs : hs.NameRefs; Refs.target.occ.name = "encode"; Refs.target.mod.name = "Data.Aeson"'
hiedb: 2.3s

glean (via Glass): 0.39s

glean (raw query): 0.03s

(side note: hiedb found 416 references while Glean found 415. I haven’t yet checked where this discrepancy comes from.)

But these results don’t really tell the whole story.

In the case of hiedb, name-refs does a full table scan so it’s going to take time proportional to the number of refs in the DB. Glean meanwhile has indexed the references by name, so it can serve this query very efficiently. The actual query takes a few milliseconds, the main overhead is encoding and decoding the results.

The reason the Glass query takes longer than the raw Glean query is because Glass also fetches additional information about each reference, so it performs a lot more queries.

We can also do the raw hiedb query using the sqlite shell:
sqlite> select count(*) from refs where occ = "v:encode" AND mod = "Data.Aeson";
417
Run Time: real 2.038 user 1.213905 sys 0.823001
Of course hiedb could index the refs table to make this query much faster, but it’s interesting to note that Glean has already done that and it was still quicker to index and produced a smaller DB.

Performance of find-definition

Let’s find the definition of Data.Aeson.encode, first with hiedb:
$ hiedb -D /tmp/hiedb name-def encode Data.Aeson
Data.Aeson:181:1-181:7
Now with Glass:
$ cabal run glass-democlient -- --service localhost:12345 \
  describe stackage/hs/aeson/Data/Aeson/var/encode
[email protected]/src/Data/Aeson.hs:181:1-181:47
(worth noting that hiedb is giving the span of the identifier only, while Glass is giving the span of the whole definition. This is just a different choice; the .hie file contains both.)

And the raw query using Glean:
$ glean --service localhost:1234 query --db stackage/0 --recursive \
  '{ Loc.file, Loc.span } where Loc : hs.DeclarationLocation; N : hs.Name; N.occ.name = "encode"; N.mod.name = "Data.Aeson"; Loc.name = N' | jq
{
  "id": 18328391,
  "key": {
    "tuplefield0": {
      "id": 9781189,
      "key": "aeson-2.1.2.1/src/Data/Aeson.hs"
    },
    "tuplefield1": {
      "start": 4136,
      "length": 46
    }
  }
}
Times:

hiedb: 0.18s

Glean (via Glass): 0.05s

Glean (raw query): 0.01s

In fact there’s a bit of overhead when using the Glean CLI, we can get a better picture of the real query time using the shell:
stackage> { Loc.file, Loc.span } where Loc : hs.DeclarationLocation; N : hs.Name; N.occ.name = "encode"; N.mod.name = "Data.Aeson"; Loc.name = N
{
  "id": 18328391,
  "key": {
    "tuplefield0": { "id": 9781189, "key": "aeson-2.1.2.1/src/Data/Aeson.hs" },
    "tuplefield1": { "start": 4136, "length": 46 }
  }
}

1 results, 2 facts, 0.89ms, 696176 bytes, 2435 compiled bytes
The query itself takes less than 1ms.

Again, the issue with hiedb is that its data is not indexed in a way that makes this query efficient: the defs table is indexed by the pair (hieFile,occ) not occ alone. Interestingly, when the module is known it ought to be possible to do a more efficient query with hiedb by first looking up the hieFile and then using that to query defs.

What other queries can we do with Glean?

I’ll look at a couple of examples here, but really the possibilities are endless. We can collect whatever data we like from the .hie file, and design the schema around whatever efficient queries we want to support.

Search by case-insensitive prefix

Let’s search for all identifiers that start with the case-insensitive prefix "withasync":
$ glass-democlient --service localhost:12345 \
  search stackage/withasync -i | wc -l
55
In less than 0.1 seconds we find 55 such identifiers in Hackage. (the output isn’t very readable so I didn’t include it here, but for example this finds results not just in async but in a bunch of packages that wrap async too).

Case-insensitive prefix search is supported by an index that Glean produces when the DB is created. It works in the same way as efficient find-references, more details on that below.

Why only prefix and not suffix or infix? What about fuzzy search? We could certainly provide a suffix search too; infix gets more tricky and it’s not clear that Glean is the best tool to use for infix or fuzzy text search: there are better data representations for that kind of thing. Still, case-insensitive prefix search is a useful thing to have.

Could we support Hoogle using Glean? Absolutely. That said, Hoogle doesn’t seem too slow. Also we need to index types in Glean before it could be used for type search.

Identify dead code

Dead code is, by definition, code that isn’t used anywhere. We have a handy way to find that: any identifier with no references isn’t used. But it’s not quite that simple: we want to ignore references in imports and exports, and from the type signature.

Admittedly finding unreferenced code within Hackage isn’t all that useful, because the libraries in Hackage are consumed by end-user code that we haven’t indexed so we can’t see all the references. But you could index your own project using Glean and use it to find dead code. In fact, I did that for Glean itself and identified one entire module that was dead, amongst a handful of other dead things.

Here’s a query to find dead code:
N where
  N = hs.Name _;
  N.sort.external?;
  hs.ModuleSource { mod = N.mod, file = F };
  !(
    hs.NameRefs { target = N, file = RefFile, uses = R };
    RefFile != F;
    coderef = (R[..]).kind
  )
Without going into all the details, here’s roughly how it works:

N = hs.Name _; declares N to be a fact of hs.Name

N.sort.external?; requires N to be external (i.e. exported), as opposed to a local variable

hs.ModuleSource { mod = N.mod, file = F }; finds the file F corresponding to this name’s module

The last part is checking to see that there are no references to this name that are (a) in a different file and (b) are in code, i.e. not import/export references. Restricting to other files isn’t exactly what we want, but it’s enough to exclude references from the type signature. Ideally we would be able to identify those more precisely (that’s on the TODO list).

You can try this on Hackage and it will find a lot of stuff. It might be useful to focus on particular modules to find things that aren’t used anywhere, for example I was interested in which identifiers in Control.Concurrent.Async aren’t used:
N where
  N = hs.Name _;
  N.mod.name = "Control.Concurrent.Async";
  N.mod.unit = "async-2.2.4-inplace";
  N.sort.external?;
  hs.ModuleSource { mod = N.mod, file = F };
  !(
    hs.NameRefs { target = N, file = RefFile, uses = R };
    RefFile != F;
    coderef = (R[..]).kind
  )
This finds 21 identifiers, which I can use to decide what to deprecate!

Apparatus

Building all of Hackage

The goal was to build as much of Hackage as possible and then to index it using both hiedb and Glean, and see how they differ.

To avoid problems with dependency resolution, I used a Stackage LTS snapshot of package versions. Using LTS-21.21 and GHC 9.4.7, I was able to build 2922 packages. About 50 failed for some reason or other.

I used this cabal.project file:
packages: */*.cabal
import: https://www.stackage.org/lts-21.21/cabal.config

package *
    ghc-options: -fwrite-ide-info

tests: False
benchmarks: False

allow-newer: *
And did a large cabal get to fetch all the packages in LTS-21.21.

Then
cabal build all --keep-going
After a few retries to install any required RPMs to get the dependency resolution phase to pass, and to delete a few packages that weren’t going to configure successfully, I went away for a few hours to let the build complete.

It’s entirely possible there’s a better way to do this that I don’t know about - please let me know!

Building Glean

The Haskell indexer I’m using is in this pull request which at the time of writing isn’t merged yet. (Since I’ve left Meta I’m just a regular open-source contributor and have to wait for my PRs to be merged just like everyone else!).

Admittedly Glean is not the easiest thing in the world to build, mainly because it has a couple of troublesome dependencies: folly (Meta’s library of highly-optimised C++ utilities) and RocksDB. Glean depends on a very up to date version of these libraries so we can’t use any distro packaged versions.

Full instructions for building Glean are here but roughly it goes like this on Linux:

Install a bunch of dependencies with apt or yum

Build the C++ dependencies with ./install-deps.sh and set some env vars

make

The Makefile is needed because there are some codegen steps that would be awkward to incorporate into the Cabal setup. After the first make you can usually just switch to cabal for rebuilding stuff unless you change something (e.g. a schema) that requires re-running the codegen.

Running Glean

I’ve done everything here with a running Glean server, which was started like this:
cabal run exe:glean-server -- \
  --db-root /tmp/db \
  --port 1234 \
  --schema glean/schema/source
While it’s possible to run Glean queries directly on the DB without a server, running a server is the normal way because it avoids the latency from opening the DB each time, and it keeps an in-memory cache which significantly speeds up repeated queries.

The examples that use Glass were done using a running Glass server, started like this:
cabal run glass-server -- --service localhost:1234 --port 12345
How does it work?

The interesting part of the Haskell indexer is the schema in hs.angle. Every language that Glean indexes needs a schema, which describes the data that the indexer will store in the DB. Unlike an SQL schema, a Glean schema looks more like a set of datatype declarations, and it really does correspond to a set of (code-generated) types that you can work with when programmatically writing data, making queries, or inspecting results. For more about Glean schemas, see the documentation.

Being able to design your own schema means that you can design something that is a close match for the requirements of the language you’re indexing. In our Glean schema for Haskell, we use a Name, OccName, and Module structure that’s similar to the one GHC uses internally and is stored in the .hie files.

The indexer itself just reads the .hie files and produces Glean data using datatypes that are generated from the schema. For example, here’s a fragment of the indexer that produces Module facts, which contain a ModuleName and a UnitName:
mkModule :: Glean.NewFact m => GHC.Module -> m Hs.Module
mkModule mod = do
  modname <- Glean.makeFact @Hs.ModuleName $
    fsToText (GHC.moduleNameFS (GHC.moduleName mod))
  unitname <- Glean.makeFact @Hs.UnitName $
    fsToText (unitFS (GHC.moduleUnit mod))
  Glean.makeFact @Hs.Module $
    Hs.Module_key modname unitname
Also interesting is how we support fast find-references. This is done using a stored derived predicate in the schema:
predicate NameRefs:
  {
    target: Name,
    file: src.File,
    uses: [src.ByteSpan]
  } stored {Name, File, Uses} where
  FileXRefs {file = File, refs = Refs};
  {name = Name, spans = Uses} = Refs[..];
here NameRefs is a predicate—which you can think of as a datatype, or a table in SQL—defined in terms of another predicate, FileXRefs. The facts of the predicate NameRefs (rows of the table) are derived automatically using this definition when the DB is created. If you’re familiar with SQL, a stored derived predicate in Glean is rather like a materialized view in SQL.

What’s next?

As I mentioned earlier, the indexer doesn’t yet index types, so that would be an obvious next step. There are a handful of weird corner cases that aren’t handled correctly, particularly around record selectors, and it would be good to iron those out.

Longer term ideally the Glean data would be rich enough to produce the Haddock docs. In fact Meta’s internal code browser does produce documentation on the fly from Glean data for some languages - Hack and C++ in particular. Doing it for Haskell is a bit tricky because while I believe the .hie file does contain enough information to do this, it’s not easy to reconstruct the full ASTs for declarations. Doing it by running the compiler—perhaps using the Haddock API—would be an option, but that involves a deeper integration with Cabal so it’s somewhat more awkward to go that route.

Could HLS use Glean? Perhaps it would be useful to have a full Hackage index to be able to go-to-definition from library references? As a plugin this might make sense, but there are a lot of things to fix and polish before it’s really practical.

Longer term should we be thinking about replacing hiedb with Glean? Again, we’re some way off from that. The issue of incremental updates is an interesting one - Glean does support incremental indexing but so far it’s been aimed at speeding up whole-repository indexing rather than supporting IDE features.
May 22, 2025 12:00 AM

May 15, 2025

Tweag I/O

Why we blog

I was the chief editor for this blog for the past 8 years or so, and I’ve just recently passed the mantle to Chris Harrison. I thought I’d take the opportunity to write a little bit about this blog, how it’s operated and what it means to us. Besides, we do like when things get meta here, so this is a blog post about the blog.

A little bit of history

One of the tenets under which Mathieu Boespflug founded Tweag was that software engineers naturally don’t write enough. Writing is an essential part of the engineering job. We write issues, pull requests, code comments, documentation of various sorts. We discuss and debate online, we have to arbitrate trade-offs. Most of this is common, in fact, to every engineering profession. And although we typically become engineers because of our taste for the technical part of our job (I certainly did, I don’t know for sure how ubiquitous it is); writing is still a big part of our job.

So Mathieu reasoned, if writing is to be such a big part of our job, but we’re technical people at heart, not writers, for the company to be at its best it needs to make it clear that precise and clear writing is important to our job. To that effect, there ought to be venues for us to write, where it was made an expectation, an actual requirement, for us to write. The blog is one of them.

At first, the blog was just there. After all, our blog was, and still is, just (part of) a Github repository. So we’d make pull requests and merge blog posts. But, of course, people seek review. When you’re posting in the name of a company, you tend to be a little more careful about what you write; besides, we’re all so used to having our PRs reviewed. This was all very ad hoc, there was no process for it.

Whatever my reason at the time (I honestly don’t remember after all this time), I ended up participating in the review of most blog posts. It’s a poorly kept, but surprisingly little discussed secret that the most common way to get a responsibility in a company is to just assume said responsibility. Do the thing, and it will become your charge. Anyway, I became editor in chief.

Why we blog

Besides being a venue to exercise our technical writing, this blog has been tremendously useful to us. Keeping a high-quality, serious, technical blog helped establish trust in our work, build bridges with community, and attract clients and employees.

It’s very rewarding for us, as individuals, to be recognised by our peers, and to work with a company which is likewise recognised. This is certainly a great motivation for us to write.

Besides, it fits very well with our open-source values, both as a company and as individuals. We strongly believe in open-source, and the value it has to the world. So we participate. And open-source software isn’t just about putting software’s sources out there. Open-source is also about sharing knowledge. Which is another important element of the company’s ethos. We do have formal and informal venues for sharing knowledge internally, all very important stuff, but it’s a topic for another time. The blog is one of our main venues for sharing knowledge with the rest of the world.

So we do share, we talk about our open-source journey, the software we build, or the software we use. But we also share what we learn through working with clients, not necessarily on open-source projects.

So this blog is something that operates at the sweet spot of any company’s activities: it’s something we do because we believe that it’s the right thing to do, and it’s also something which is valuable to the company. In fact the blog brought so many clients that it allowed Tweag to operate without a marketing team for quite a few years. Nowadays, the company’s bigger and we can’t rely on a blog alone, but it’s still a great way to connect.

How we blog

Writing a blog post is making a pull request against the Github repository for the tweag.io website. A post is a Markdown file, possibly accompanied by resources (such as images). It’s a workflow which is very familiar to software engineers. All the review process is, in fact, a Github pull request review.

The author of a blog post is first tasked with obtaining a “technical review” from their peers: people close to the topic review the blog post for accuracy and relevance. The blog post editors are all engineers (the entire process, in fact, is owned by engineers), but aren’t necessarily very familiar with the blog post’s topic, so this first round of review couldn’t be performed by the editing team, it’s also much better for scalability as technical review uses a little time from a lot of people, rather than a lot of time from a few.

When the author and technical reviewers are satisfied with the blog post, they send it to the editing team, which will do two rounds of review (which we cleverly call “first editorial review” and “second editorial review”). The editing team reviews for clarity and writing quality. Something that isn’t a concern of reviewers (or really of the process as a whole), on the other hand, is SEO. Certainly our blog posts make our website easier to find, and of course this is important to us. But this is the result, we hope, of making blog posts that you like.

We can propose edits in three different ways: we can ask question in the review thread, propose “suggestions” in the Github interface which the author is free to accept or not, or we can push directly to the blog post (in which case we try our best to preserve the author’s voice, and always leave time for the author to check that the edits are correct). The choice between the three is primarily driven by our degree of confidence in the edit. Really we do what makes sense to reduce the amount of back-and-forth.

Reviewing blog posts on Github with a team distributed all around the world is essentially a distributed process. As with any distributed process, we try to minimise the amount of synchronisation. Our target is to be able to do a review in two weeks or less (one week for each round). Sometimes it takes more though, when the blog post presents more difficult editing challenges, or when the author doesn’t have enough availability outside of their client work to respond to our reviews quickly enough. But we hit the target more often than not.

The two rounds of review are functionally identical. We just found that, in practice, having two rounds of review helps raise the quality a lot. The first reviewer often gets involved deeply in the writing of the blog post, to the point that they stop seeing the blog post as a reader. The second reviewer receives a blog post in almost publishable condition and brings fresh eyes to conduct the finishing touch.

What blogging means to us

When Tweag was a standalone company, blogging was almost our entire marketing strategy. Now that we’re part of the substantially larger Modus Create, we aren’t in a position to rely solely on technical blogging for marketing. Yet, this blog is still an important part of our strategy.

All of what I wrote above still applies, but I’d like to bring up something else. See, a company is a little bit faceless. It doesn’t have a drive, it doesn’t have a personality: people do. This is truer the larger a company is. We don’t aspire to be a faceless abstract entity, though. Modus Create is made of people, people that we believe to be interesting, and that we hope you’ll find interesting too. In fact our business is largely for clients (maybe you!) to find our people interesting and hire us. This blog is one of the ways we use to promote individuals. Who they are, what they like, what they know, what they can do. This is why our writing recommendations encourage authors to use the pronoun “I” where many of us would be tempted to say “we”.

A more collectivist consideration is communities. We take part in a number of communities (Haskell, Nix, Bazel, Typescript, …), and it’s all too easy for a company to say it’s part of a community but really meaning that they just use Haskell, Nix, Bazel, Typescript. But it’s not really what being part of a community means, does it? It also entails taking part in community life. There’s a bunch of things we do: we write libraries, contribute upstream, help with governance, sponsor and sometimes organise events, …. And we blog, which also plays its part in community life. This is why we don’t shy away from highly specialised blog posts. Sure they are addressed to a pretty narrow, sometimes quite niche audience, but they are relevant to one of our communities. Of course it helps establish our technical chops. Hopefully it also builds trust.

I should note that what I’m writing about is what works for us. It’s not a recipe that will automatically work for you. There are many excellent technical blogs out there, which, I assume, probably have different approaches to us, such as CockroachDB’s, Netflix’s, fly.io’s and Trail of Bits’s (the latter two I wasn’t aware of, by the way, they were brought to my attention by this blog post’s technical review). If you like this blog, you’ll probably like theirs, go check them out! Dan Luu shares some more general considerations. But at the end of the day, what makes a strong communication strategy is to build around and empower the people you already have. Capitalise on your strengths, don’t go against the grain just to imitate what someone else does, however successful they appear to be.

Sign-off

A paradox of the work of software engineers (and probably all engineering disciplines really) is that writing is such an integral part of our job, but most of us receive virtually no training in our studies.

I’ve never had technical writing classes, myself. The way I learned writing was in part by gleaning some of the popular wisdom taught among my peers, but mostly by co-authoring scientific articles with better technical writers than me and seeing them do their magic. Honestly, I found no better learning experience than seeing an entire paragraph of mine rewritten in a mere few words that were also more precise.

I don’t know how to teach technical writing formally. So a lot of the process we’ve converged on reflects my attempt to replicate what worked on me for the company: teaching technical writing by example. By showing how a blog post can be improved, hopefully the authors will learn to write better next time. Because the blog is this as well: a tool to teach ourselves, collectively, how to be technical writers.

This is a very soft target, it’s hard to measure the degree to which we’ve succeeded. But, because I don’t really know how to close this post without getting a touch emotional (it is, after all, no small moment for me), I’ve witnessed many of our engineers mature as writers, and I can’t help but feeling some parent-like pride at their growth.

May 15, 2025 12:00 AM

May 08, 2025

Mark Jason Dominus

A descriptive theory of seasons in the Mid-Atlantic

[ I started thinking about this about twenty years ago, and then writing it down in 2019, but it seems to be obsolete. I am publishing it anyway. ]

The canonical division of the year into seasons in the northern temperate zone goes something like this:

Spring: March 21 – June 21

Summer: June 21 – September 21

Autumn: September 21 – December 21

Winter: December 21 – March 21

Living in the mid-Atlantic region of the northeast U.S., I have never been happy with this. It is just not a good description of the climate.

I begin by observing that the year is not equally partitioned between the four seasons. The summer and winter are longer, and spring and autumn are brief and happy interludes in between.

I have no problem with spring beginning in the middle of March. I think that is just right. March famously comes in like a lion and goes out like a lamb. The beginning of March is crappy, like February, and frequently has snowstorms and freezes. By the end of March, spring is usually skipping along, with singing birds and not just the early flowers (snowdrops, crocuses, daffodil) but many of the later ones also.

By the middle of May the spring flowers are over and the weather is getting warm, often uncomfortably so. Summer continues through the beginning of September, which is still good for swimming and lightweight clothes. In late September it finally gives way to autumn.

Autumn is jacket weather but not overcoat weather. Its last gasp is in the middle of November. By this time all the leaves have changed, and the ones that are going to fall off the trees have done so. The cool autumn mist has become a chilly winter mist. The cold winter rains begin at the end of November.

So my first cut would look something like this:

Months

Seasons

January

February

March

April

May

June

July

August

September

October

November

December

Winter

Spring

Summer

Autumn

Winter

Note that this puts Thanksgiving where it belongs at the boundary between autumn (harvest season) and winter (did we harvest enough to survive?). Also, it puts the winter solstice (December 21) about one quarter of the way through the winter. This is correct. By the solstice the days have gotten short, and after that the cold starts to kick in. (“As the days begin to lengthen, the cold begins to strengthen”.) The conventional division takes the solstice as the beginning of winter, which I just find perplexing. December 1 is not the very coldest part of winter, but it certainly isn't autumn.

There is something to be said for it though. I think I can distinguish several subseasons — ten in fact:

Dominus Seasonal Calendar

Months

Seasons

Sub-seasons

January

February

March

April

May

June

July

August

September

October

November

December

Winter

Spring

Summer

Autumn

Winter

Midwinter

Late Winter

Early spring

Late spring

Early Summer

Midsummer

Late Summer

Early autumn

Late autumn

Early winter

Midwinter

Midwinter, beginning around the solstice, is when the really crappy weather arrives, day after day of bitter cold. In contrast, early and late winter are typically much milder. By late February the snow is usually starting to melt. (March, of course, is always unpredictable, and usually has one nasty practical joke hiding up its sleeve. Often, March is pleasant and springy in the second week, and then mocks you by turning back into January for the third week. This takes people by surprise almost every year and I wonder why they never seem to catch on.)

Similarly, the really hot weather is mostly confined to midsummer. Early and late summer may be warm but you do not get blazing sun and you have to fry your eggs indoors, not on the pavement.

Why the seasons seem to turn in the middle of each month, and not at the beginning, I can't say. Someone messed up, but who? Probably the Romans. I hear that the Persians and the Baha’i start their year on the vernal equinox. Smart!

Weather in other places is very different, even in the temperate zones. For example, in southern California they don't have any of the traditional seasons. They have a period of cooler damp weather in the winter months, and then instead of summer they have a period of gloomy haze from June through August.

However

I may have waited too long to publish this article, as climate change seems to have rendered it obsolete. In recent years, we have barely had midwinter, and instead of the usual two to three annual snows we have zero. Midsummer has grown from two to four months, and summer now lasts into October.

by Mark Dominus ([email protected]) at May 08, 2025 10:39 PM

May 05, 2025

Matthew Sackman

Payslips and tax: calculating your own
In the UK, it’s very common that your employer pays you once a month. When this happens, they give you a document called a payslip, that has some numbers on it, such as how much your salary is, how much they paid you this month, how much went to HMRC in tax, how much went to your pension, and a few other numbers. But they never show any workings, so you really have no way to check whether any of these numbers are correct. There are plenty of online take-home-pay calculators, but these all focus on the full year; they have no facility to calculate your next payslip.

About half way through April 2024, I stopped working for one company. Everything was wrapped up – I received my final payslip from them, along with my P45. I then had a few months off, and started a new job in July 2024. When you start a new job it always takes a while for money things to get sorted out, for example pension enrolment and sorting out pension contributions, so it’s really worthwhile to keep a close eye on your payslips particularly for these first few months. Mine were arriving and some numbers looked right, but other numbers, such as the amount of tax I was paying, were changing dramatically, month to month. I had no idea why; whether they should be changing like that; whether they were going to keep changing or would eventually settle down. I had no way to check any of these numbers. Was I going to get in trouble with HMRC and get investigated?

I was also a little on edge because this was the first job where my pension contributions were using a thing called Qualifying Earnings. In all my previous jobs, if I chose for 10% of my salary to go into my pension, then that’s what would happen. But now there was this thing called Qualifying Earnings, which is (numbers correct at time of writing) a band from £6240 to £50,270. If you’re earning, say £30k, then your x% contribution is actually x% of £30,000-£6240. If you’re earning above £50,270, then any further increase to your salary will not result in any extra contributions to your pension because you’re above the band. The 2008 Pensions Act, which created the legal requirement for all employees to have workplace pensions and for automatic enrolment (with a minimum 8% combined contribution from the employer and employee), also created this concept of Qualifying Earnings. I consider this is a pretty scummy way of reducing employer pension contributions for large firms. It complicates the maths and no doubt adds confusion for people trying to check their own payslips. Given that 74% of the population have pensions that are too small to retire on, this whole concept of Qualifying Earnings seems amoral at best.

These days, a lot of smaller companies outsource their payroll processing. In my case, I was officially working for an international Employer of Record and they were then outsourcing payroll processing to local firms with country-specific expertise. So when I started asking questions, there was no ability to go and sit with someone and work through it. Or have a call. It was all messages passed across multiple different systems, and partial answers at best would come back several days later. Even if your payroll is done in-house, I strongly suspect that a lot of the time, some software package will be being used that does all the calculations and quite likely no one will actually understand or be able to explain the maths that’s going on.

After a while of getting no-where, and after uncovering some substantial mistakes that had been made that affected me, I decided to spend some weekends actually figuring out how PAYE works, and writing some code that can calculate my next payslip. This library is available for anyone to use. There’s a README that hopefully explains the basic principles of how the calculations are done. This only works if your tax-code ends in an L, and it only works if you’re in National Insurance category A. All the code can do is use some details you provide to predict your next payslips. Also, I’m not a trained accountant or financial adviser, and even for my own payslips, every month, the numbers don’t quite match up (but they’re within £1). So please treat this as a toy, rather than the basis for building a payroll processor!

Getting started

The library is written in Go so you’ll need Go installed. Then, in a terminal do:
$ mkdir payslips
$ cd payslips
$ go mod init mypayslips
$ go get wellquite.org/tax@latest
Now we need to write a tiny amount of code. In your new payslips directory, create a main.go file, and open it in your editor. You want something like this:
package main

import (
   "fmt"
   "wellquite.org/tax"
)

func main() {
   payslips := tax.Payslips{
      {
         Year:                            2024,
         TaxCode:                         "1257L",
         Salary:                          tax.Yearly(50000),
         PensionType:                     tax.Salary,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(50000),
         PensionType:                     tax.Salary,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {},
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         EmployeePensionContributionRate: 0.15,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         Expenses:                        116.08,
         EmployeePensionContributionRate: 0.15,
         EmployerPensionContributionRate: 0.03,
      },
   }

   payslips.Complete()
   fmt.Println(payslips)
}
We create a list of Payslips. The first payslip must specify a year, and your tax-code. These details are automatically applied to the payslips that follow, if not explicitly provided. Many of the calculations rely on year-to-date totals, and so we must have a complete record of your payslips from the start of the tax year. So that means the first payslip is month 1 (in this example, April 2024), then month 2 (May 2024) and so on. If you have no income for a month then you can just put in an empty payslip ({}). The above example describes being paid in April and May 2024, then nothing in June, and then being paid (with a higher salary) in July, August and September.

Save this main.go file. Then, back in your terminal, in your payslips directory, just do:
go run main.go
You should get some output showing all sorts of calculations, including income tax, and personal allowance. With a little luck, if you change the numbers to match your own salary and other details, the numbers produced should match quite closely your own payslips, provided nothing you’re doing is too exotic.

There is documentation for all the different fields that you can provide in each payslip. In general, the code will try to fill in missing values. It should be able to cope with things like salary-sacrifice, or, if you change job within a month and have several payslips for the same month, this should work too. Everything is run locally on your computer: please feel free to check the source – there are no 3rd party libraries at all, and nothing imports the net package. It’ll work just the same if you yank out your network cable or disable your WiFi.

Note however, this code is lightly tested. Whilst it works for me (and one or two friends), I make no claims that it correctly models the entirety of PAYE, so it may very well not work for you. Feedback, contributions, corrections, and patches are all very welcome!
May 05, 2025 02:30 PM

Sandy Maguire

Analyzing API Design via Algebraic Laws
The other day, someone asked:

Why doesn’t [the Data.Map function] unionWith :: (a -> a -> a) -> Map k a -> Map k a -> Map k a allow for different value types the way intersectionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c does?

This is a very reasonable question, and it lead down an interesting rabbit hole of at the intersection of API design and efficient implementation.

To answer the original question, what would the type of a different value type of unionWith look like? It would be something in the flavor of:
unionWith :: (Maybe a -> Maybe b -> c) -> Map k a -> Map k b -> Map k c
But this new Maybe a -> Maybe b -> c parameter is somewhat lossy, in that it gives the impression that it could be called with Nothing Nothing as parameters, which doesn’t fit into the vibe of being a “union.”

So instead we could restrict that possibility by using These a b:
data These a b = This a | That b | These a b

unionWith :: (These a b -> c) -> Map k a -> Map k b -> Map k c
which seems reasonable enough.

But let’s take reasonableness out of the picture and start again from first principles. Instead let’s ask ourselves the deep philsophical question of what even IS a map?

A Map k v is a particularly efficient implementation of functions with type k -> Maybe v. But why is this Maybe here? It’s really only to encode the “default” value of performing a lookup. Nothing goes wrong if we generalize this to be Monoid v => k -> v. In fact, it helps us make sense of the right bias present in Data.Map, where we see:
lookup k (singleton k v1 <> singleton k v2) = Just v2
This equality is hard to justify under the normal understanding of Map k v being an encoding of a function k -> Maybe v. But under the general monoid interpretation, we get a nice semigroup homomorphism:
lookup k (m1 <> m2) = lookup k m1 <> lookup k m2
where the monoid in question has been specialized to be Last.

Of course, we also have a monoid homomorphism:
lookup k mempty = mempty
Let’s re-evaluate the original question in terms of this newly-generalized Map. Now that we’ve removed all of the unnecessary baggage of Maybe, we can again think about the desired type of unionWith:
unionWith
    :: (a -> b -> c)
    -> Map k a
    -> Map k b
    -> Map k c
which looks awfully familiar. This new type signature automatically resolves our original concerns about “what should we do if the key isn’t present?”—just call the function with mempty as a parameter!

We can give some semantics as to what unionWith ought to do again by relating it to the observation lookup. The relevant law here seems like it ought to be:
lookup k (unionWith f m n) = f (lookup k m) (lookup k n)
By choosing a degenerate function f, say, \_ _ -> nontrivial, where nontrivial is some value that is not mempty, we can see the beginnings of a problem:
  lookup k (unionWith f m n)
=
  f (lookup k m) (lookup k n)
= <let f = \_ _ -> nontrivial>
  nontrivial
Regardless of the key we lookup in our unionWithed Map, we need to get back nontrivial. How can we implement such a thing? I see only two ways:

explicitly associate every key in the map with nontrivial, or

keep nontrivial around as a default value in the map

#1 is clearly a non-starter, given that we want our Maps to be efficient encodings of functions, which leaves us with only #2. This is actually a pretty common construction, which stems immediately from the fact that a pair of monoids is itself a monoid. The construction would look something like this:
data Map k v = Map
  { defaultValue :: v
  , implementation :: Data.Map.Map k v
  }
  deriving stock Generic
  deriving (Semigroup, Monoid) via (Generically (Map k v))

unionWith
    :: (a -> b -> c)
    -> Map k a
    -> Map k b
    -> Map k c
unionWith f (Map def1 imp1) (Map def2 imp2) =
  Map (f def1 def2) (liftA2 f imp1 imp2)
Seems fine, right? The nail in the coffin comes from when we reintroduce our semigroup homomorphism:
lookup k (m1 <> m2) = lookup k m1 <> lookup k m2
Without loss of generalization, take m2 = pure nontrivial (where pure is just unionWith with a constant function.) This gives us:
lookup k (m1 <> pure nontrivial) = lookup k m1 <> nontrivial
Making this thing efficient is a further complication! We again have two options:

modify the value at every key by multiplying in nontrivial, or

finding a way of suspending this computation

#1 clearly requires $O(n)$ work, which again forces us to look at #2. But #2 seems very challenging, because the monoidal values we need to suspend need not span the entire Map. For example, consider a Map constructed a la:
((pure prefix1 <> ((pure prefix2 <> m) <> n)) <> (p <> pure suffix)
Representing this thing efficiently certainly isn’t impossible, but you’re not going to be able to do it on the balanced binary search trees that underlie the implementation of Data.Map.Map.

I find this quite an interesting result. I always assumed that Data.Map.Map (or at least, Data.Map.Monoidal.MonoidalMap) didn’t have an Applicative instance because it would require a Monoid constraint on its output—but that’s not the sort of thing we can express in Haskell.

But the analysis above says that’s not actually the reason! It’s that there can be no efficient implementation of Applicative, even if we could constrain the result.

What I find so cool about this style of analysis is that we didn’t actually write any code, nor did we peek into the implementation of Data.Map (except to know that it’s implemented as a balanced BST.) All we did was look at the obvious laws, instantiate them with degenerate inputs, and think about what would be required to to efficiently get the right answer.
May 05, 2025 07:33 AM

May 04, 2025

Sandy Maguire

Using Obscure Graph Theory to solve PL Problems
Usually I write about solutions to problems I’ve worked out, but I’ve found myself increasingly becoming interesting in where solutions come from. Maybe it’s because I’ve been reading Boorstin’s excellent The Discoverers, which I’d strongly recommend.

Regardless of why, I thought I’d switch up the usual dance step today, and discuss what solving my most-recent-big-problem actually looked like, in terms of what I tried, where I looked, and what the timeline was.

The Problem

The problem is to serialize a program graph into a series of let-bindings. For example, given the following graph:
      +
    /   \
  f ---> g
  |     / \
  a     \ /
      expensive
which represents the program:
f a (g expensive expensive) + g expensive expensive
Unfortunately, this is a naive representation of the program, since it duplicates the work required to compute expensive four times, and g expensive expensive twice. Instead, we would prefer to generate the equivalent-but-more-efficient program:
let $0 = expensive
    $1 = g $0 $0
 in f a $1 + $1
This transformation is affectionately known as sharing, since it shares the computed answer whenever there is repeated work to be done.

So this is what we’re trying to do. Given the original graph, determine the best place to insert these let-bindings, for some reasonable definition of “best.” We can assume there are no side effects involved, so any place that an expression is well-scoped is an acceptable solution.

In order to understand some of my attempted solutions, it’s worth noting that our final solution should build something of type Expr, and the original graph is represented as a IntMap (ExprF Int). ExprF is the Base functor of Expr, with all of its self-references replaced by some type variable, in this case Int. Thus, the graph above looks much more like:
_ : IntMap (ExprF Int)
_ = IM.fromList
  [ (0, Apply "+" [1, 3])
  , (1, Apply "f" [2, 3]
  , (2, ...)  -- a
  , (3, Apply "g" [4, 4])
  , (4, ...)  -- expensive
  ]
The Original Solution

I spent over a year trying to solve this problem, with various mostly-working solutions during that time. My strategy here was to think really hard, write up some algorithm that seemed plausible, and then run it against our (small) battery of integration tests to make sure it got the same answer as before.

Why not property test it? I tried, but found it very challenging to implement well-typed generators that would reliably introduce shared thunks. But maybe there’s a different lesson to be learned here about writing good generators.

Anyway. For eight months, one of these think-really-hard algorithms fit the bill and didn’t give us any problems. It was a weird, bespoke solution to the problem that independetly kept track of all of the free variables in every graph fragment, and tried to let-bind a fragment as soon as we landed in a context where all of the free variables were in scope. It seemed to work, but it was extremely messy and unmaintainable.

At the time of writing, this sharing algorithm was the only source of let-binds in our entire language, which meant that it didn’t need to account for let-binds in the program.

Of course, that invariant eventually changed. We added a way in the source langauge to introduce lets, which meant my algorithm was wrong. And I had written it sufficiently long ago that I no longer remembered exactly why it worked. Which meant the theory of my program was lost, and thus that we ought to rewrite it.

Unfolding a Solution

I went back to the problem statement, and stared at it for a long time (back to the think-really-hard algorithm!) Upon staring at the problem, I realized that what I was really trying to do was determine where diamond patterns arose in the propgram graph.

Recall our original graph:
      +
    /   \
  f ---> g
  |     / \
  a     \ /
      expensive
If we redraw it such that g is on a different rank than f, then the two diamond patterns become much clearer:
      +
    /  \
  f     |
  | \   |
  a  \ /
      g
     / \
     \ /
   expensive
The insight I came up with is that if a node n is the source of a diamond, then we must let-bind the sink of the diamond immediately before inlining the definition of n.

This gives rise to the question of “how do we identify a diamond?” What we can do is give a mapping from each node to its reachable set of nodes. For example, in the above, we’d compute the map:
+         -> {+, f, a, g, expensive}
f         -> {f, a, g, expensive}
a         -> {a}
g         -> {g, expensive}
expensive -> {expensive}
Then when we go to inline a node, say, +, we can look for any nodes that are reachable via more than one of its immediate subterms. Since the immediate subterms of + are f and g, we can take the intersections of their reachable sets:
{f, a, g, expensive} union {g, expensive}
giving us
{g, expensive}
which is exactly the set of nodes that we need to perform sharing on. If you topologically sort this set, it gives you the order that you should perform your let bindings.

EXCEPT there’s a kink in the whole thing. What happens if one of the terms in this diamond contains free variables? In particular, we might have something like this:
      +
    /  \
  f     |
  | \   |
  a  \ /
      λx
     / \
     \ /
   expensive
      |
      x
This gives us an analogous set of reachable nodes when we look at +, but we obviously can’t lift expensive x above the lambda.

Resolving this problem required giving up on the notion of memoizing the entire reachable set of nodes, and to instead crawl the graph ensuring that everything is well-scoped.

Performance Woes

My algorithm looked fine, and, importantly, got the right answer in a reasonable amount of time on our (small) battery of integration tests. So I shipped it, commended myself on a job well done, and thought nothing more about it. For about a week, until a bug report came in saying that our compiler now seemed to hang on big programs.

Which was something I hadn’t noticed, since we didn’t have any big programs in our integration tests.

Damn!

Upon digging in to what exactly was so slow, I noticed that my algorithm was accidentally quadratic. I needed to fold over every node in the graph, and that required looking at the entire reachable set underneath it. I had put in some of the obvious safeguards, hoping that they would prune the search tree early, but it wasn’t enough sacrifice for the Great God of Asymptotes.

Did I mention that at this point in the story, having this algorithm working fast was on the critical path of the company? Everybody else was blocked on me figuring this out. Talk about pressure!

Anyway. You’ll notice above that in my description of the algorithm, everything sounds fine. But the juice is in the details, as the common saying goes. Computing reachability isn’t quite the right thing to be using here, as it gave us the wrong answer for the lambda example above. Which is unfortunate because reachability is something we can do in linear time.

And then when reachability didn’t work, I just threw away the fast performance and hoped my bespoke algorithm would do the job. My only redemption comes from the fact that at least it got the right answer, even if it did so very slowly.

Finding the Kernel

Back to the drawing board.

Whenever I have graph theory problems, I call up my boy Vikrem. He’s good at nerd stuff like this.

We rubberducked the problem, and tried to reframe the problem in the language of graph theory. We had a Merkiv–Maguire moment where we indepdently realized that the goal was somehow related to finding the lowest common ancestor (LCA) of a node.

Which is to say, roughly, that we are looking for forks in the diamond diagram. Which we already knew, but it was nice to have some language for.

Our new problem is that LCA is defined only over trees. There are some extensions to DAGs, but none of them seem to be particularly well founded. However, searching for exactly that brought me to this stackoverflow question, where nestled in the comments is someone suggesting that the poster isn’t looking for LCA, but instead for a related notion the lowest single common ancestor. LSCA is defined in a 2010 paper New common ancestor problems in trees and directed acyclic graphs.

The standard definition of LCA(x, y) = l is that “l is an ancestor of x and of y, and that no descendent of l has this property.”

But the definition of LSCA(x, y) = l is that “l lies on all root-to-x paths, and that l lies on all root-to-y paths, and that no descendent of l has this property.”

The distinction between the two is easily seen in the following graph:
  0
 / \
1   2
| X |
3   4
Under the standard definition, LCA is not uniquely defined for DAGs. That is, LCA(3, 4) = {1, 2}. But neither 1 nor 2 lies on all paths from the root. Under LSCA therefore we get LSCA(3, 4) = 0, which is the obviously-correct place to let-bind 3 and 4.

The paper gives a preprocessing scheme for computing LSCA by building a “lowest single ancestor” (LSA) tree. The LSA of a node is the LSCA of all of its in-edges. This definition cashes out to mean “the most immediate diamond above any node.” Finally! This is exactly what we’re looking for, since this is where we must insert our let-bindings! Even better, the paper gives us an algorithm for computing the LSA tree in linear time!

The First Implementer

Of course, I’m lazy and would prefer not to implement this thing. So instead I searched on hackage for lsca, and found nothing. But then I searched for lca and found that, like always, Ed Kmett was 13 years ahead of me.

The lca package implements an $O(log n)$ algorithm for computing the LCA of any two nodes in a graph. Which is very convenient for me, since the LSCA algorithm requires being able to do this.

Time to roll up the sleeves and get cracking I suppose.

The paper was surprisingly straightforward, and my first attempt implemented the (imperative) algorithms as given (imperatively.) The first step is to do a topological sort on the DAG in order to know in which order one ought to unfold the LSA tree.

But as is so often the case, this topological sort isn’t actually relevant to the algorithm; it’s just an encoding detail of expressing the algorithm imperatively. But you don’t need that when you’ve got laziness on your side! Instead you can just tie the know and do something cool like this:
lsaTree :: Ord v => Map v (Set v) -> Map v (Path v)
lsaTree input = fix $ \result -> M.fromList $ do
  (node, parents) <- M.toList input
  let parentResults = fmap (result M.!) parents
  ...
Notice how we use fix to bind the eventual result of the final computation. Then we can chase pointers by looking them up in result—even though it’s not yet “computed.” Who cares what order the computer does it in. Why is that a thing I should need to specify?

Anyway. The exact details of implementing LSA are not particularly important for the remainder of this blog post. If you’re interested, you can peep the PR, which is delightfully small.

Tying It All Back Together

Equipped with my LSA tree, I was now ready to go back and solve the original problem of figuring out where to stick let-bindings. It’s easy now. Given the original program graph, find the LSA for each node. The LSA is the place you should insert the let binding.

So given the map of nodes to their LSAs, invert that map and get back a map of nodes to descendents who have this node as an LSA. Now when you go to inline a node, just look up everything in this map and inline it first.

It turns out to be a very elegant solution. It’s one third of the length of my horrible ad-hoc implementations, and it runs in linear time of the number of nodes in the graph. All in all, very good.

More often than I’m comfortable about, people will ask me how I can have so many good ideas. And what I like about this story is that it’s pretty typical of how I actually “have” “good” ideas. I’m reminded of the fact that luck favors the prepared mind. Attentive readers will notice that none of this process was due to brilliance on my part. I happened to know Vikrem who’s a genius. Together we pulled at some ancient graph theory strings and remembered a fact that someone else had thought important to teach us. That wasn’t actually the right path, but it lead us to stackoverflow where someone had linked to a relevant paper. I implemented the paper using a library that someone else had done the heavy lifting on, and simplified the implementation using this knot-tying trick I picked up somewhere along the way.

Also, I’m just really pleased that the solution came from trying to reverse engineer the relevant graph-theory search terms. Maybe that’s the actual takeaway here.
May 04, 2025 08:05 AM

May 03, 2025

Mark Jason Dominus

Claude and I write a utility program
Then I had two problems…

A few days ago I got angry at xargs for the hundredth time, because for me xargs is one of those "then he had two problems" technologies. It never does what I want by default and I can never remember how to use it. This time what I wanted wasn't complicated: I had a bunch of PDF documents in /tmp and I wanted to use GPG to encrypt some of them, something like this:
    gpg -ac $(ls *.pdf | menupick)
menupick is a lovely little utility that reads lines from standard input, presents a menu, prompts on the terminal for a selection from the items, and then prints the selection to standard output. Anyway, this didn't work because some of the filenames I wanted had spaces in them, and the shell sucks. Also because gpg probably only does one file at a time.

I could have done it this way:
    ls *.pdf | menupick | while read f; do gpg -ac "$f"; done
but that's a lot to type. I thought “aha, I'll use xargs.” Then I had two problems.
    ls *.pdf | menupick | xargs gpg -ac
This doesn't work because xargs wants to batch up the inputs to run as few instances of gpg as possible, and gpg only does one file at a time. I glanced at the xargs manual looking for the "one at a time please" option (which should have been the default) but I didn't see it amongst the forest of other options.

I think now that I needed -n 1 but I didn't find it immediately, and I was tired of looking it up every time when it was what I wanted every time. After many years of not remembering how to get xargs to do what I wanted, I decided the time had come to write a stripped-down replacement that just did what I wanted and nothing else.

(In hindsight I should perhaps have looked to see if gpg's --multifile option did what I wanted, but it's okay that I didn't, this solution is more general and I will use it over and over in coming years.)

xar is a worse version of xargs, but worse is better (for me)

First I wrote a comment that specified the scope of the project:
    # Version of xargs that will be easier to use
    #
    # 1. Replace each % with the filename, if there are any
    # 2. Otherwise put the filename at the end of the line
    # 3. Run one command per argument unless there is (some flag)
    # 4. On error, continue anyway
    # 5. Need -0 flag to allow NUL-termination
There! It will do one thing well, as Brian and Rob commanded us in the Beginning Times.

I wrote a draft implementation that did not even do all those things, just items 2 and 4, then I fleshed it out with item 1. I decided that I would postpone 3 and 5 until I needed them. (5 at least isn't a YAGNI, because I know I have needed it in the past.)

The result was this:
    import subprocess
    import sys

    def command_has_percent(command):
        for word in command:
            if "%" in word:
                return True
        return False

    def substitute_percents(target, replacement):
        return [ s.replace("%", replacement) for s in target ]

    def run_command_with_filename(command_template, filename):
        command = command_template.copy()
        if not command_has_percent(command):
            command.append("%")

        res = subprocess.run(substitute_percents(command, filename), check=False)
        return res.returncode == 0

    if __name__ == '__main__':
        template = sys.argv[1:]
        ok = True
        for line in sys.stdin:
            if line.endswith("\n"):
                line = line[:-1]
            if not run_command_with_filename(template, line):
                ok = False
        exit(0 if ok else 1)
Short, clean, simple, easy to use. I called it xar, ran
    ls *.pdf | menupick | xar gpg -ac
and was content.

Now again, with Claude

The following day I thought this would be the perfect opportunity to try getting some LLM help with programming. I already had a baseline version of xar working, and had thought through the problem specification. Now I could do it over with the LLM and compare the two results. The program I wanted was small and self-contained. If the LLM sabotaged me somehow, I would probably notice, and if I didn't, it wouldn't matter, because I had a working version already.

So I asked Claude about it. I knew Claude was not the right choice, because Claude is not really designed for this task, and has no integration with the coding editor, but Claude is what I am familiar with so I thought I'd give it a shot.

On the whole it went extremely well. The complete transcript is here. I imagine this was pretty much a maximally good experience, that all the terrible problems of LLM coding arise in larger, more complicated systems. But this program is small and self-contained, with no subtleties, so the LLM could be at its best.

Phase 1

First I just told Claude what I wanted and let it write the program from scratch.

Let's write a program in Python called xar. It will be a stripped-down version of the standard Unix xargs utility, but it will do less and be simpler to use.

It will get command-line arguments that specify a command template. Then it will read lines from stdin. Each line will be appended to the command template to produce a command. The program will run the resulting command, one command for each line of input.

Any questions?

Default Claude is very chatty and I am going to omit most of its chatty comments, which make me impatient.

The program it wrote it was not what I would have written, but it was good enough. If I had just used it right off the bat, instead of writing my own, it would have been enough, and it would have taken somewhere between 2% and 10% as long to produce.

So the one-line summary of this article is: I should stop writing simple command-line utilities, and just have Claude write them instead, because Claude's are good enough, and definitely better than yak-shaving.

Here's Claude's initial version:
    #!/usr/bin/env python3

    import sys
    import subprocess
    import shlex

    def main():
        if len(sys.argv) < 2:
            print("Usage: xar command [args...]", file=sys.stderr)
            sys.exit(1)

        command_template = sys.argv[1:]

        for line in sys.stdin:
            line = line.rstrip('\n')

            full_command = command_template + [line]

            try:
                result = subprocess.run(full_command, check=True)
            except subprocess.CalledProcessError as e:
                print(f"Command failed with exit code {e.returncode}: {' '.join(full_command)}", 
                      file=sys.stderr)
            except FileNotFoundError:
                print(f"Command not found: {full_command[0]}", file=sys.stderr)
                sys.exit(1)

    if __name__ == "__main__":
        main()
Claude's version had numerous comments, which I have omitted. I later told it to stop putting in comments, which it did.

Claude's use of check here was not what I wanted, because that makes subprocess.run raise an exception when the subcommand fails, and then the exception has to be immediately caught and handled. My original control flow had been simpler:
        res = subprocess.run(substitute_percents(command, filename), check=False)
        return res.returncode == 0

        …

        if not run_command_with_filename(template, line):
            ok = False
Claude's program pulled in shlex without noticing that it was unused. But Claude did teach me about str.rstrip('\n') which I had not known about before (or maybe had forgotten), so that was one small win already.

Argument parsing

The next step was a big win. Python's library for command-line argument handling is called argparse and it is really nice. If I were still writing programs in Perl, I would implement a Perl version of argparse because Perl has 29 available argument parsing libraries and they are all loathsome. The one problem with argparse is I never remember off the top of my head how to use it. I think the module is called argparse but it provides a class called Argparser but I often get these backward and try to use argparser and Argparse.

Instead of figuring it out every time I usually dig up some previous Python program and then copy-paste the argument parser from there, amending it to suit the purpose.

But this time I didn't have to do that. Instead, I just said to Claude:

This is good, now please add code at the top to handle argument parsing with the standard Argparse library, even though there are no options yet.

Claude handed me back pretty much the same program, but with the argument parser at the top:
    def parse_arguments():
        """Parse command-line arguments using argparse."""
        parser = argparse.ArgumentParser(
            description="A simplified version of xargs that appends each line from stdin to a command template.",
            prog="xar"
        )

        # Store all positional arguments as the command to execute
        parser.add_argument(
            "command", 
            nargs="+", 
            help="The command template to execute for each input line"
        )

        return parser.parse_args()
Let's pause for a moment. Maybe you kids are unimpressed by this. But if someone had sent this interaction back in time and showed it to me even as late as five years ago, I would have been stunned. It would have been completely science-fictional. The code that it produced is not so impressive, maybe, but that I was able to get it in response to an unstructured request in plain English is like seeing a nuclear-powered jetpack in action.

I make some changes

It bothered me that the command-line arguments were being taken implicitly from sys.argv; that makes the program hard to test. A better way to do this is to have the list of command-line arguments be a function to the main function, and then for the runtime to call main(sys.argv[1:]) instead of just main(). That way you can test the program without having to fake up sys.argv. So I changed that, and while I was making changes I got rid of Claude's comments and the unnecessary exception handling. I pasted the result back at Claude, saying:

Good so far. I have adjusted your source code in several ways:

I removed the comments.

I removed the exception handling for the subcommand. If one of the subcommands fails, that no longer terminates the main loop

Command-line arguments are passed to the main() function rather than being accessed directly

Here is the code. Tell me if you see any problems or if you have any questions about it

I had also sabotaged the code with at least one likely error. Python's subprocess.run returns a “result” structure that includes the subcommand's output and exit status, and I can never remember the name of the component that records the exit status. I either look it up or get it by trial-and-error. This time I didn't bother to either, I just guessed that it was result.exitcode, which I handed off to Claude even though I thought was probably wrong.

Claude's reply said:

Here are a couple of issues I noticed:

There's an error in the error handling code - you're using result.exitcode but in the subprocess.run() function, the attribute is called returncode not exitcode

Awesome, now I don't have to remember that name of that component.

Claude continued:

The parse_arguments() function definition is missing in the pasted code, but it would need to be updated to accept the argv parameter

I had passed sys.argv[1:] to main, and then had main call parse_arguments(argv). But I hadn't changed parse_arguments to expect an argument list. (I hadn't included it in my paste at all, in fact.) Claude was exactly right, and continued with the corrected version:
    #!/usr/bin/env python3

    import sys
    import subprocess
    import argparse

    def parse_arguments(argv):
        …
        return parser.parse_args(argv)

    def main(argv):
        # Parse command-line arguments
        args = parse_arguments(argv)
        …

    if __name__ == "__main__":
        ok = main(sys.argv[1:])
        exit(0 if ok else 1)
Handling NUL-separated values

I said

It's great, thanks for catching those errors. Now let's add a -0 command-line option that tells the program to split the input on "\0" characters instead of at line breaks.

Recall that my handwritten xar did not do this yet; I had left that feature for another day. Claude just added it for me, adding
    parser.add_argument(
        "-0", "--null",
        action="http://planet.haskell.org/store_true",
        help="Input items are terminated by a null character instead of by newline"
    )
to the argument parser and changing the main loop to:
    # Split input based on the delimiter option
    if args.null:
        # Split on null characters
        items = stdin_data.split('\0')
    else:
        # Split on newlines
        items = stdin_data.splitlines()

    for item in items:
        if not item:
            continue

        …
I was tired of Claude's comments, so I said

Give me the same thing, with no comments

which it did, so I said

From now on leave out the comments. I'm a Python expert and I don't need them. If there is something I don't understand I will ask you.

Claude complied. If I were going to do use Claude again in the future I would include that in the canned instructions that Claude is given up front. Instead I will probably use a tool better-suited to programming, and do whatever the analogous thing is.

Template filling

Now I told Claude to add the % feature:

I changed if not item to if item == "". Now let's make the following change:

If the command template includes any % characters, each of these should be replaced with the input item.

Otherwise, if there were no % characters, the input item should be appended to the end of the command as in the current version

Claude did this. It used an explicit loop instead of the list comprehension that I had used (and preferred), but it did do it correctly:
    for arg in command_template:
        if '%' in arg:
            has_placeholder = True
            full_command.append(arg.replace('%', item))
        else:
            full_command.append(arg)

    if not has_placeholder:
        full_command.append(item)
Even without the list comprehension, I would have factored out the common code:
    for arg in command_template:
        if '%' in arg:
            has_placeholder = True
        full_command.append(arg.replace('%', item))

    if not has_placeholder:
        full_command.append(item)
But I am not going to complain, my code is simpler but is doing unnecessary work.

Claude also took my hint to change item == "" even though I didn't explicitly tell it to change that.

At this point the main loop of the main function was 15 lines long, because Claude had stuck all the %-processing inline. So I said:

Good, let's extract the command template processing into a subroutine.

It did this right, understanding correctly what code I was referring to and extracting it into a subroutine called process_command_template. More science fiction: I can say "command template processing" and it guesses what I had in mind!

This cut the main loop to 7 lines. That worked so well I tried it again:

Good, now let's extract the part of main that processes stdin into a subroutine that returns the items array

It pulled the correct code into a function called process_stdin. It did not make the novice mistake of passing the entire args structure to this function. In the caller it had process_stdin(args.null) and inside of process_stdin this parameter was named use_null_delimiter,

YAGNI?

At this point I was satisfied but I thought I might as well ask if it should do something else before we concluded:

Can you think of any features I left out that would be useful enough to warrant inclusion? Remember this program is supposed to be small and easy to use, in contrast to the existing xargs which is very complicated.

Claude had four suggestions:

A -p or --parallel option to run commands in parallel

A -n or --max-args option to specify the maximum number of items to pass per command

A simple -v or --verbose flag to show commands as they're executed

A way to replace the command's standard input with the item instead of adding it as an argument

All reasonable suggestions, nothing stupid. (It also supplied code for #3, which I had not asked for and did not want, but as I said before, default Claude is very chatty.)

Parallelization

I didn't want any of these, and I knew that #2–4 would be easy to add if I did want any of them later. But #1 was harder. I've done code like this in the past, where the program has a worker pool and runs a new process whenever the worker pool isn't at capacity. It's not even that hard. In Perl you can play a cute trick and use something like
    $workers{spawn()} = 1 while delete $workers{wait()};
where the workers hash maps process IDs to dummy values. A child exits, wait() awakens and returns the process ID of the completed child, which is then deleted from the map, and the loop starts another worker.

I wanted to see how Claude would do it, and the result was an even bigger win than I had had previously, because Claude wrote this:
   with concurrent.futures.ProcessPoolExecutor(max_workers=args.parallel) as executor:
        futures = [executor.submit(execute_command, cmd, args.verbose) for cmd in commands]

        for future in concurrent.futures.as_completed(futures):
            success = future.result()
            if not success:
                ok = False
What's so great about this? What's great is that I hadn't known about concurrent.futures or ProcessPoolExecutor. And while I might have suspected that something like them existed, I didn't know what they were called. But now I do know about them.

If someone had asked me to write the --parallel option, I would have had to have this conversation with myself:

Python probably has something like this already. But how long will it take me to track it down? And once I do, will the API documentation be any good, or will it be spotty and incorrect? And will there be only one module, or will there be three and I will have to pick the right one? And having picked module F6, will I find out an hour later that F6 is old and unmaintained and that people will tell me “Oh, you should have used A1, it is the new hotness, everyone knows that.”

When I put all that uncertainty on a balance, and weigh it against the known costs of doing it myself, which one wins?

The right choice is: I should do the research, find the good module (A1, not F6), and figure out how to use it.

But one of my biggest weaknesses as a programmer is that I too often make the wrong choice in this situation. I think “oh, I've done this before, it will be quicker to just do it myself”, and then I do and it is.

Let me repeat, it is quicker to do it myself. But that is still the wrong choice.

Maybe the thing I wrote would be sooner or smaller or faster or more technically suitable to the project than the canned module would have been. But it would only have been more be technically suitable today. If it needed a new feature in the future it might have to be changed by someone who had never seen it before, whereas the canned module could well already have the needed feature ready to go, already documented, and perhaps already familiar to whoever had to make the change. My bespoke version would certainly be unfamiliar to every other programmer on the project — including perhaps myself six months later — so would be the wrong thing to use.

I'm really good at hacking this stuff up. Which is a problem. It makes me want to hack stuff up, even when I shouldn't.

Claude tips the balance strongly toward the correct side, which is that I should use the prepackaged module that someone else wrote and not hack something up.

And now I know about concurrent.futures.ProcessPoolExecutor! The world is full of manuals, how can I decide which ones I should read? One way is to read them all, which used to work back when I was younger, but now I have more responsibilities and I don't have time to read the entire Python library reference including all the useless bits I will never use. But here's Claude pointing out to me that this is something I need to know about, now, today, and I should read this one. That is valuable knowledge.

Claude flubs

While testing the program I did
    xar --parallel=2 bash -c '...'
which didn't work because the argparse library tried to parse the -c as a flag to xar, when I wanted it to be a flag to bash. This was my fault, and there is a standard way around it, which is that I should have said
    xar --parallel=2 -- bash -c '...'
which tells argparse to drop the -- and take everything after it literally with no interpretation.

I thought I'd see what Claude had to say about this, and it dropped the ball in a very LLM sort of way.

I asked:

When I did xar --parallel=2 bash -c '...' it misunderstood the arguments. It thought I was trying to pass -c to xar and failed with unrecognized arguments. What should I do about that?

Claude replied:

There are two main approaches to handle this:

Use the -- separator to indicate the end of arguments for xar: …

Modify your code to handle this case specifically by looking for a command terminator in the argument parser.

Here's how you could modify your code to implement the first approach:

And then it followed with the same, unmodified code…

Which was correct, because -- is already handled by argparse, so no change was actually required to implement the first approach! But Claude's characterization of it as a modification was misleading, wrong, and potentially very confusing.

I said:

I don't see what the difference is in that argument parsing code as compared to what I had before.

and Claude got into the LLM tail-chasing mode where it got more and more confused:
You're right, I apologize for the confusion. There's no difference in the argument parsing code I provided compared to what you had before. The issue is with how argparse handles arguments by default.

To fix the issue with xar --parallel=2 bash -c '...', you need to actually implement support for the -- separator in your code…
   # Find the index of '--' if it exists
   try:
       separator_index = argv.index('--')
       xar_args = argv[:separator_index]
       command_args = argv[separator_index+1:]
   except ValueError: …
Ugh, all wrong in the worst way. The code probably works, but it is completely unnecessary. Claude's claim that “you need to actually implement support for the -- separator” is flat wrong. I pointed this out and Claude got more confused. Oh well, nobody is perfect!

Lessons learned

A long time ago, when syntax-coloring editors were still new, I tried one and didn't like it, then tried again a few years later and discovered that I liked it better than I had before, and not for the reasons that anyone had predicted or that I would have been able to predict. (I wrote an article about the surprising reasons to use the syntax coloring.)

This time also. As usual, an actual experiment produced unexpected results, because the world is complicated and interesting. Some of the results were unsurprising, but some were not anything I would have thought of beforehand.

Claude's code is good enough, but it is not a magic oracle

Getting Claude to write most of the code was a lot faster and easier than writing it myself. This is good! But I was dangerously tempted to just take Claude's code at face value instead of checking it carefully. I quickly got used to flying along at great speed, and it was tough to force myself to slow down and be methodical, looking over everything as carefully as I would if Claude were a real junior programmer. It would be easy for me to lapse into bad habits, especially if I were tired or ill. I will have to be wary.

Fortunately there is already a part of my brain trained to deal with bright kids who lack experience, and I think perhaps that part of my brain will be able to deal effectively with Claude.

I did not notice any mistakes on Claude's part — at least this time.

At one point my testing turned up what appeared to be a bug, but it was not. The testing was still time well-spent.

Claude remembers the manual better than I do

Having Claude remember stuff for me, instead of rummaging the manual, is great. Having Claude stub out an argument parser, instead of copying one from somewhere else, was pure win.

Partway along I was writing a test script and I wanted to use that Bash flag that tells Bash to quit early if any of the subcommands fails. I can never remember what that flag is called. Normally I would have hunted for it in one of my own shell scripts, or groveled over the 378 options in the bash manual. This time I just asked in plain English “What's the bash option that tells the script to abort if a command fails?” Claude told me, and we went back to what we were doing.

Claude can talk about code with me, at least small pieces

Claude easily does simple refactors. At least at this scale, it got them right. I was not expecting this to work as well as it did.

When I told Claude to stop commenting every line, it did. I wonder, if I had told it to use if not expr only for Boolean expressions, would it have complied? Perhaps, at least for a while.

When Claude wrote code I wasn't sure about, I asked it what it was doing and at least once it explained correctly. Claude had written
     parser.add_argument(
        "-p", "--parallel",
        nargs="?",
        const=5,
        type=int,
        default=1,
        help="Run up to N commands in parallel (default: 5)"
    )
Wait, I said, I know what the const=5 is doing, that's so that if you have --parallel with no number it defaults to 5. But what is the --default doing here? I just asked Claude and it told me: that's used if there is no --parallel flag at all.

This was much easier than it would have been for me to pick over the argparse manual to figure out how to do this in the first place.

More thoughts

On a different project, Claude might have done much worse. It might have given wrong explanations, or written wrong code. I think that's okay though. When I work with human programmers, they give wrong explanations and write wrong code all the time. I'm used to it.

I don't know how well it will work for larger systems. Possibly pretty well if I can keep the project sufficiently modular that it doesn't get confused about cross-module interactions. But if the criticism is “that LLM stuff doesn't work unless you keep the code extremely modular” that's not much of a criticism. We all need more encouragement to keep the code modular.

Programmers often write closely-coupled modules knowing that it is bad and it will cause maintenance headaches down the line, knowing that the problems will most likely be someone else's to deal with. But what if writing closely-coupled modules had an immediate cost today, the cost being that the LLM would be less helpful and more likely to mess up today's code? Maybe programmers would be more careful about letting that happen!

Will my programming skill atrophy?

Folks at Recurse Center were discussing this question.

I don't think it will. It will only atrophy if I let it. And I have a pretty good track record of not letting it. The essence of engineering is to pay attention to what I am doing and why, to try to produce a solid product that satisifes complex constraints, to try to spot problems and correct them. I am not going to stop doing this. Perhaps the problems will be different ones than they were before. That is all right.

Starting decades ago I have repeatedly told people

You cannot just paste code with no understanding of what is going on and expect it to work.

That was true then without Claude and it is true now with Claude. Why would I change my mind about this? How could Claude change it?

Will I lose anything from having Claude write that complex parser.add_argument call for me? Perhaps if I had figured it out on my own, on future occasions I would have remembered the const=5 and default=1 specifications and how they interacted. Perhaps.

But I suspect that I have figured it out on my own in the past, more than once, and it didn't stick. I am happy with how it went this time. After I got Claude's explanation, I checked its claimed behavior pretty carefully with a stub program, as if I had been reviewing a colleague's code that I wasn't sure about.

The biggest win Claude gave me was that I didn't know about this ProcessPoolExecutor thing before, and now I do. That is going to make me a better programmer. Now I know something about useful that I didn't know before, and I have a pointer to documentation I know I should study.

My skill at writing ad-hoc process pool managers might atrophy, but if it does, that is good. I have already written too many ad-hoc process pool managers. It was a bad habit, I should have stopped long ago, and this will help me stop.

Conclusion

This works.

Perfectly? No, it's technology, technology never works perfectly. Have you ever used a computer?

Will it introduce new problems? Probably, it's new technology, and new technology always introduces new problems.

But is it better than what we had before? Definitely.

I still see some programmers turning up their noses at this technology as if they were sure it was a silly fad that would burn itself out once people came to their senses and saw what a terrible idea it was.

I think that is not going to happen, and those nose-turning-up people, like the people who pointed out all the drawbacks and unknown-unknowns of automobiles as compared to horse-drawn wagons, are going to look increasingly foolish.

Because it works.
by Mark Dominus ([email protected]) at May 03, 2025 04:46 PM

A puzzle about balancing test tubes in a centrifuge

$\def\nk#1#2{\left\langle{#1 \atop #2}\right\rangle} \def\dd#1{\nk{12}{#1}}$

Suppose a centrifuge has $n$ slots, arranged in a circle around the center, and we have $k$ test tubes we wish to place into the slots. If the tubes are not arranged symmetrically around the center, the centrifuge will explode.

(By "arranged symmetrically around the center, I mean that if the center is at $(0,0)$ , then the sum of the positions of the tubes must also be at $(0,0)$ .)

Let's consider the example of $n=12$ . Clearly we can arrange $2$ , $3$ , $4$ , or $6$ tubes symmetrically:

$twelve small circles arranged around a central point, like a clock; circles 5 and 11 are filled in$ $cirles 1, 5, and 9 are filled in$ $circles 0, 3, 6, and 9 are filled$ $circles 1, 3, 5, 7, 9, and 11 are filled$

Equally clearly we can't arrange only $1$ . Also it's easy to see we can do $k$ tubes if and only if we can also do $n-k$ tubes, which rules out $n=12, k=11$ .

From now on I will write $\nk nk$ to mean the problem of balancing $k$ tubes in a centrifuge with $n$ slots. So $\dd 2, \dd 3, \dd 4,$ and $\dd 6$ are possible, and $\dd 1$ and $\dd{11}$ are not. And $\nk nk$ is solvable if and only if $\nk n{n-k}$ is.

It's perhaps a little surprising that $\dd7$ is possible. If you just ask this to someone out of nowhere they might have a happy inspiration: “Oh, I'll just combine the solutions for $\dd3$ and $\dd4$ , easy.” But that doesn't work because two groups of the form $3i+j$ and $4i+j$ always overlap.

For example, if your group of $4$ is the slots $0, 3, 6, 9$ then you can't also have your group of $3$ be $1, 5, 9$ , because slot $9$ already has a tube in it.

$The k=3 and k=4 diagrams from before, superimposed; there is a small explosion symbol at 9 o'clock where they interfere$

The other balanced groups of $3$ are blocked in the same way. You cannot solve the puzzle with $7=3+4$ ; you have to do $7=3+2+2$ as below left. The best way to approach this is to do $\dd5$ , as below right. This is easy, since the triangle only blocks three of the six symmetric pairs. Then you replace the holes with tubes and the tubes with holes to turn $\dd5$ into $\dd{12-5}=\dd7$ .

$a triangle filling slots 1, 5, and 9, plus a pair at 0, 6 and another pair at 2, 8$ $the opposite of the previous diagram, consisting of a triangle filling slots 3, 7, 11 and a pair at 4, 10$

Given $n$ and $k$ , how can we decide whether the centrifuge can be safely packed?

Clearly you can solve $\nk nk$ when $n$ is a multiple of $k>1$ , but the example of $\dd5$ (or $\dd7$ ) shows this isn't a necessary condition.

A generalization of this is that $\nk nk$ is always solvable if $\gcd(n,k) > 1$ since you can easily balance $g = \gcd(n, k)$ tubes at positions $0, \frac ng, \frac{2n}g, \dots, \frac {(g-1)n}g$ , then do another $g$ tubes one position over, and so on. For example, to do $\dd8$ you just put first four tubes in slots $0, 3, 6, 9$ and the next four one position over, in slots $1, 4, 7, 10$ .

$Two squares this time, as described in the previous paragraph$

An interesting counterexample is that the strategy for $\dd7$ , where we did $7=3+2+2$ , cannot be extended to $\nk{14}9$ . One would want to do $k=7+2$ , but there is no way to arrange the tubes so that the group of $2$ doesn't conflict with the group of $7$ , which blocks one slot from every pair.

But we can see that this must be true without even considering the geometry. $\nk{14}9$ is the reverse of $\nk{14}{14-9} = \nk{14}5$ , which impossible: the only nontrivial divisors of $n=14$ are $2$ and $7$ , so $k$ must be a sum of $2$ s and $7$ s, and $5$ is not.

You can't fit $k=3+5=8$ tubes when $n=15$ , but again the reason is a bit tricky. When I looked at $8$ directly, I did a case analysis to make sure that the $3$ -group and the $5$ -group would always conflict. But again there was an easier was to see this: $8=15-7$ and $7$ clearly won't work, as $7$ is not a sum of $3$ s and $5$ s. I wonder if there's an example where both $k$ and $n-k$ are not obvious?

For $n=20$ , every $k$ works except $k=3,17$ and the always-impossible $k=1,19$ .

What's the answer in general? I don't know.

Addenda

20250502

Now I am amusing myself thinking about the perversity of a centrifuge with a prime number of slots, say $13$ . If you use it at all, you must fill every slot. I hope you like explosions!

While I did not explode any centrifuges in university chemistry, I did once explode an expensive Liebig condenser.

$A chemistry lab apparatus. In the middle thje the Liebig condenser, a pair of long concentric glass tubes, one inside the other, supported by a metal ring stand. At left a heater heats a flash whose top is connected to the upper end of the inner tube of the condenser. The condensate collects in a flask at right. Two rubber tubes connect to the top and bottom of the outer tube, carrying water through it.$

Condenser setup by Mario Link from an original image by Arlen on Flickr. Licensed cc-by-2.0, provided via Wikimedia Commons.

20250503

Michael Lugo informs me that a complete solution may be found on Matt Baker's math blog. I have not yet looked at this myself.

Omar Antolín points out an important consideration I missed: it may be necessary to subtract polygons. Consider $\nk{30}6$ . This is obviously possible since $6\mid 30$ . But there is a more interesting solution. We can add the pentagon $\{0, 6, 12, 18, 24\}$ to the digons $\{5, 20\}$ and $\{10, 25\}$ to obtain the solution $${0,5,6,10,12,18, 20, 24, 25}.$$

Then from this we can subtract the triangle $\{0, 10, 20\}$ to obtain $${5, 6, 12, 18, 24, 25},$$ a solution to $\nk{30}6$ which is not a sum of regular polygons:

Thanks to Dave Long for pointing out a small but significant error, which I have corrected.

20250505

Robin Houston points out this video, The centrifuge Problem with Holly Krieger, on the Numberphile channel.

by Mark Dominus ([email protected]) at May 03, 2025 03:25 PM

GHC Developer Blog

GHC 9.10.2 is now available

GHC 9.10.2 is now available

Zubin Duggal - 2025-05-03

The GHC developers are very pleased to announce the availability of the final release for GHC 9.10.2. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of severities and scopes, including:

Significantly improved performance when dynamically loading Haskell symbols (#23415).

Fixing a bug where the simplifier sometimes destroyed join points during float out, which could impact performance (#24768).

Reduced memory fragmentation in the non-moving GC’s segment allocator, improving resident set size by up to 26% for some applications (#24150).

Added new flags to control speculative evaluation (-fspec-eval and -fspec-eval-dictfun) to work around performance regressions (#25606).

Fixed several platform-specific issues, including segfaults with FFI on PowerPC (#23034) and improved code generation for AArch64 with multiway branches now using jump tables (#19912)

And many more!

A full accounting of these fixes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at May 03, 2025 12:00 AM

May 02, 2025

Gabriella Gonzalez

Prompt chaining reimagined with type inference
Prompt chaining reimagined with type inference
At work I’ve been researching how to improve the ergonomics of prompt engineering and I wanted to share and open source some of what I’ve done. This initial post is about how I’ve been experimenting with using bidirectional type inference to streamline prompt chaining.

“Prompt chaining” is a prompt engineering technique that splits a larger task/prompt into multiple smaller tasks/prompts which are chained together using code. For example, instead of prompting a model to generate a poem in one prompt like this:

Write a poem based off this idea:

${idea}

… by following this process:

First think through the form, stanza count, lines per stanza, and rhyme scheme

Then choose a poetic style (tone, voice, and literary devices) based on the poem’s form

Then write a complete poem based on that plan

… you can split it into smaller prompts, like this:

structure prompt:

Plan the structure of a new poem based on this idea

${idea}

Describe its form, stanza count, lines per stanza, and rhyme scheme

style prompt:

Given this poem structure:

Form: ${structure.form}

Stanzas: ${structure.stanzaCount}

Lines per stanza: ${structure.linesPerStanza}

Rhyme scheme: ${structure.rhymeScheme}

Choose a poetic style: tone, voice, and literary devices to emphasize

poem prompt:

Write a complete poem based on this idea:

${idea}

Structure:

Form: ${structure.form}

Stanzas: ${structure.stanzaCount}

Lines per stanza: ${structure.linesPerStanza}

Rhyme scheme: ${structure.rhymeScheme}

Style:

Tone: ${style.stone}

Voice: ${style.voice}

Literary Devices: ${style.literaryDevices}

Why might you want to do this?

to improve the quality of the results

Models perform better when working on more constrained subproblems. Splitting a larger prompt into smaller prompts helps the model stay focused at each step.

to introspect intermediate results

This comes in handy when you want to log, validate, or correct intermediate results.

to perform actions in between prompts

You might want to take the output of one prompt, use that to call some tool, then use the output of that tool to decide what the next prompt should be, which you can’t do with a single prompt.

In other words, prompt chaining unlocks greater accuracy, control, and flexibility for prompt engineering.

The problem

The main issue with prompt chaining is that it is a huge pain in the ass; if you start do anything a little bit complicated you need to start using structured outputs (i.e. JSON), which adds a whole lot of boilerplate to the process:

you have to define the schema for each intermediate step of the process

You typically do this by defining your data model in your host programming language (e.g. a Pydantic model in Python) or directly defining your JSON schema

You have to instruct the model to produce JSON and explain the shape of the expected output

(Depending on the framework) you have to decode the JSON into your data model

For small prompt chaining pipelines this isn’t too hard, but it starts to get annoying to define all these schemas when you scale this up to more sophisticated prompt chaining pipelines.

So as a thought experiment I wanted to create a research prototype that handled all of that for you so that you didn’t need to specify any schemas at all. In other words I wanted to build a programming language that harnessed bidirectional type inference to perform schema inference for prompts with structured JSON outputs.

Example

I’ll cut to the case by showing the above prompt chain written as a program in this language:
let concatSep =
      https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let lines = concatSep "\n"

let generatePoem idea =
        let structure = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Plan the structure of a new poem based on this idea:"
                    , ""
                    , idea
                    , ""
                    , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
                    ]
                }

        let renderedStructure = lines
                [ "- Form: " + structure.form
                , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
                , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
                , "- Rhyme scheme: " + structure.rhymeScheme
                ]

        let style = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Given this poem structure:"
                    , renderedStructure
                    , ""
                    , "Choose a poetic style: tone, voice, and literary devices to emphasize."
                    ]
                }

        let renderedStyle = lines
                [ "- Tone: " + style.tone
                , "- Voice: " + style.voice
                , "- Literary Devices: " + concatSep ", " style.literaryDevices
                ]

        let poem : Text = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Write a complete poem based on this idea:"
                    , ""
                    , idea
                    , ""
                    , "Structure:"
                    , renderedStructure
                    , ""
                    , "Style:"
                    , renderedStyle
                    ]
                }

        in { structure: structure, style: style, poem: poem }

in  generatePoem "The stillness of early morning"
The above prompt chain is a real program in a research programming language named Fall-from-Grace (or “Grace” for short). Specifically, this is implemented in a branch of the language that adds a magic prompt keyword that works like in the above example.

If you run the above example, you might get an output like this:
{
  "structure": {
    "form": "Free verse with a sprinkle of rhyme",
    "linesPerStanza": 4,
    "rhymeScheme": "ABCB",
    "stanzaCount": 5
  },
  "style": {
    "literaryDevices": [
      "imagery",
      "alliteration",
      "personification",
      "enjambment"
    ],
    "tone": "nostalgic",
    "voice": "reflective"
  },
  "poem": "### The Stillness of Early Morning\n\n**I**  \nIn the gentle rise of dawn's embrace,  \nA whisper of winds through sleepy skies,  \nThe world lies suspended in an echo  \nOf yesterday's dreams, reluctantly saying goodbye.\n\n**II**  \nSoft orange light graces fields of dew,  \nCradling tenacious tendrils of the past.  \nMorning breathes upon the waiting world  \nWith promises that this serenity will last.\n\n**III**  \nWaves of silence surge across the meadow,  \nNature's symphony gently cradles rest.  \nA chorus of leaves sings its soft refrain,  \nComforted by morning's tender caress.\n\n**IV**  \nThe silence dips to caress weary trees,  \nWhose branches waltz to whispers soft and slow.  \nEach tendril holds close the morning's intention,  \nBefore the day's demands, the chaos, start to show.\n\n**V**  \nEarly rays dance upon a tranquil sea,  \nA canvas painted through a dreamy haze.  \nTime sits by, savoring the gentle hush,  \nAs the world awakens, stirs, and finally plays."
}
… and if you’re curious what the formatted poem looks like:

The Stillness of Early Morning

I
In the gentle rise of dawn’s embrace,
A whisper of winds through sleepy skies,
The world lies suspended in an echo
Of yesterday’s dreams, reluctantly saying goodbye.

II
Soft orange light graces fields of dew,
Cradling tenacious tendrils of the past.
Morning breathes upon the waiting world
With promises that this serenity will last.

III
Waves of silence surge across the meadow,
Nature’s symphony gently cradles rest.
A chorus of leaves sings its soft refrain,
Comforted by morning’s tender caress.

IV
The silence dips to caress weary trees,
Whose branches waltz to whispers soft and slow.
Each tendril holds close the morning’s intention,
Before the day’s demands, the chaos, start to show.

V
Early rays dance upon a tranquil sea,
A canvas painted through a dreamy haze.
Time sits by, savoring the gentle hush,
As the world awakens, stirs, and finally plays.

Type inference

The sample Grace program hardly specifies any types (mainly the final expected type for the poem: Text). The reason this works is because Grace supports bidirectional type inference, which means that Grace can work backwards from how intermediate results are used to infer their schemas.

I’ll illustrate this with a contrived Grace example:
let numbers = prompt{ text: "Give me two numbers" }

in  { x: numbers.x
    , y: numbers.y
    , sum: numbers.x + numbers.y : Integer
    }
… which might produce an output like this:
$ grace interpret ./numbers.ffg
{ "x": 7, "y": 14, "sum": 21 }
When Grace analyzes this program the type checker works backwards from this expression:
numbers.x + numbers.y : Integer
… and reasons about it like this:

the addition produces an Integer, therefore numbers.x and numbers.y must also be Integers

therefore numbers is a record with two fields, x and y, both of which are Integers

… or using Grace syntax, the inferred type of numbers is: { x: Integer, y: Integer }

therefore the output of the prompt command must have the same type

… and then Grace generates a JSON schema for the prompt which looks like this:
{ "type": "object",
  "properties": {
    "x": { "type": "integer" },
    "y": { "type": "integer" }
  },
  "required": [ "x", "y" ],
  "additionalProperties": false
}
Of course, you can specify types if you want (and they’re more lightweight than schemas in traditional prompt chaining frameworks). For example:
$ grace repl
>>> prompt{ text: "Give me a first and last name" } : { first: Text, last: Text }
{ "first": "Emily", "last": "Johnson" }
>>> prompt{ text: "Give me a list of names" } : List Text
[ "Alice"
, "Bob"
, "Charlie"
, "Diana"
, "Ethan"
, "Fiona"
, "George"
, "Hannah"
, "Isaac"
, "Jack"
]
However in our original example we don’t need to specify intermediate types because when the type-checker sees this code:
let structure = prompt
        { model: "gpt-4o"
        , text: lines
            [ "Plan the structure of a new poem based on this idea:"
            , ""
            , idea
            , ""
            , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
            ]
        }

let renderedStructure = lines
        [ "- Form: " + structure.form
        , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
        , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
        , "- Rhyme scheme: " + structure.rhymeScheme
        ]
… the compiler can reason backwards from how the structure value is used to infer that the JSON schema for the prompt needs to be:
{ "type": "object",
  "properties": {
    "form": { "type": "string" },
    "stanzaCount": { "type": "integer" },
    "linesPerStanza": { "type": "integer" },
    "rhymeScheme": { "type": "string" }
  },
  "required": [
    "form",
    "stanzaCount",
    "linesPerStanza",
    "rhymeScheme"
    ],
  "additionalProperties": false
}
Tool use

Grace also supports generating sum types (a.k.a. tagged unions), and you can imagine using this to subsume traditional tool use frameworks.

For example, consider this Grace program:
let concatSep = https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let call = merge
      { HttpRequest: \x -> "curl " + x.url
      , ShellCommand: \x -> concatSep " " ([ x.executable ] + x.arguments)
      }

in  List/map call (prompt{ text: "Call some tools" })
This doesn’t actually run any tools (I haven’t added any callable tools to my work-in-progress branch yet), but just renders the tool use as a string for now:
$ grace interpret ./tools.ffg
[ "curl https://api.example.com/data", "ls -l -a" ]
However, the idea is that you can model a tool as a sum type with one constructor per callable tool, and in the above example the type checker infers that the sum type representing one tool call is:
< HttpRequest: { url: Text }
| ShellCommand: { executable: Text, arguments: List Text }
>
… so the inferred type of call is:
call : < HttpRequest: …, ShellCommand: … > -> Text
… but since we List/map the call function over the output of the prompt the type checker infers that the prompt needs to generate a List of tool calls:
prompt{ text: "Call some tools" } : List < HttpRequest: …, ShellCommand: … >
… and then Grace does some magic under the hood to convert that type to the equivalent JSON schema.

What’s particularly neat about this example is that the prompt is so incredibly bare (“Call some tools”) because all the information the model needs is present in the schema.

Schema-driven prompting

We can explore this idea of using the schema to drive the prompt instead of prose using an example like this:
prompt{ text: "Generate some characters for a story", model: "gpt-4o" }
  : List
    { "The character's name": Text
    , "The most memorable thing about the character": Text
    , "The character's personal arc": Text
    }
[ { "The character's name": "Aveline Thatcher"
  , "The character's personal arc":
      "Aveline starts as a skeptical journalist who doubts the stories of mythical creatures. Over time, she becomes a firm believer, risking her career to uncover the truth and protect these creatures."
  , "The most memorable thing about the character":
      "The intricate tattoo of a phoenix on her forearm that seems to glow when she discovers hidden truths."
  }
, { "The character's name": "Kelan Frost"
  , "The character's personal arc":
      "A former rogue alchemist who turns hero after he inadvertently creates a dangerous substance. Driven by guilt, Kelan seeks redemption by finding an antidote and saving his village."
  , "The most memorable thing about the character":
      "His iridescent blue eyes that seem to see into one's soul, a side effect of his alchemical experiments."
  }
, { "The character's name": "Luciana Blair"
  , "The character's personal arc":
      "Luciana is a reclusive artist who initially fears the world outside her home. After a mysterious vision rejuvenates her, she sets out on a journey of self-discovery, ultimately finding both her voice and courage."
  , "The most memorable thing about the character":
      "Her ability to paint scenes before they happen, which she attributes to the visions she sees in her dreams."
  }
, { "The character's name": "Ezra Hartman"
  , "The character's personal arc":
      "Once a charismatic but self-centered lawyer, Ezra is confronted with a moral crisis that forces him to reevaluate his values. He chooses a path of integrity, becoming an advocate for justice."
  , "The most memorable thing about the character":
      "His perfectly tailored suits that slowly become more casual, symbolizing his transformation and shifting priorities."
  }
, { "The character's name": "Seraphine Mora"
  , "The character's personal arc":
      "Seraphine is a young music prodigy who loses her hearing after an accident. Battling despair, she learns to embrace a new way of 'hearing' music through vibrations and her other senses."
  , "The most memorable thing about the character":
      "The ethereal way she 'dances' with the music, using her entire body to express each note's emotion."
  }
]
Grace is a superset of JSON and since JSON supports arbitrary field names so does Grace! Field names in Grace support arbitrary capitalization, punctuation, and whitespace as long as you quote them, and we can use the field names to “smuggle” the description of each field into the schema.

Conclusion

Hopefully this gives you some idea of why I’ve begun to think of prompt chaining as a programming languages problem. Type inference is just the beginning and I think it is possible to use a domain-specific programming language not just to simplify the code but to ultimately unlock greater reasoning power.

I’m going to continue to use Grace as a research vehicle for prompt chaining but my LLM-enabled branch of Grace (like Grace itself) is not really intended to be used in production and I created it mainly as a proof-of-concept for where I’d like prompt chaining frameworks to go. If I do end up eventually productionizing this research I will create a proper fork with its own name and the whole works.
by Gabriella Gonzalez ([email protected]) at May 02, 2025 05:56 PM

April 29, 2025

Dan Piponi (sigfpe)

Expectation-Maximization with Less Arbitrariness

Introduction

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

There are many introductions to the Expectation-Maximisation algorithm. Unfortunately every one I could find uses arbitrary seeming tricks that seem to be plucked out of a hat by magic. They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems. Examples of tricks I've seen used are:
Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?

Substituting $1=A/A$ in the middle of an expression. Again, you can use $1=A/A$ just about anywhere. Why choose this $A$ at this time? Similarly I found derivations that insert a $B-B$ into an expression.

Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.

My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary. There's nothing original here, I'm merely expanding on a stackexchange answer.

Generalities about EM
The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.

First assume no hidden variables. We assume there is a vector of parameters $\theta=(\theta_i)$ that defines some model. We make some observations $x=(x_j)$ . We have a probability density $P(x|\theta)$ that depends on $\theta$ . The likelihood of $\theta$ given the observations $x$ is $l(\theta|x)=P(x|\theta)$ . The maximum likelhood estimator for $\theta$ is the choice of $\theta$ that maximises $l(\theta|x)$ for the $x$ we have observed.

Now suppose there are also some variables $z=(z_k)$ that we didn't get to observe. We assume a density $P(x,z|\theta)$ . We now have
$P(x|\theta)=\sum_z P(x,z|\theta)$
where we sum over all possible values of $z$ . The MLE approach says we now need to maximise
$l(\theta|x)=\sum_z P(x,z|\theta).$
One of the things that is a challenge here is that the components of $\theta$ might be mixed up among the terms in the sum. If, instead, each term only referred to its own unique block of $\theta_i$ , then the maximisation would be easier as we could maximise each term independently of the others. Here's how we might move in that direction. Consider instead the log-likelihood
$\log l(\theta|x)=\log\sum_z P(x,z|\theta).$
Now imagine that by magic we could commute the logarithm with the sum. We'd need to maximise
$\sum_z \log P(x,z|\theta).$
One reason this would be to our advantage is that $P(x,z|\theta)$ often takes the form $\exp(f(x,z,\theta))$ where $f$ is a simple function to optimise. In addition, $f$ may break up as a sum of terms, each with its own block of $\theta_i$ 's. Moving the logarithm inside the sum would give us something we could easily maximise term by term. What's more, the $P(x,z|\theta)$ for each $z$ is often a standard probability distribution whose likelihood we already know how to maximise. But, of course, we can't just move that logarithm in.

Maximisation by proxy
Sometimes a function is too hard to optimise directly. But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead. That will give us a new guess and we can continue from there. This is the basis of gradient descent. Suppose $f$ is a differentiable function in a neighbourhood of $x_0$ . Then around $x_0$ we have
$f(x) \approx f(x_0)+f'(x_0)\cdot (x-x_0).$
We can try optimising $f(x_0)+f'(x_0)\cdot (x-x_0)$ with respect to $x$ within a neighbourhood of $x_0$ . If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent. (Note that picking a circular neighbourhood is itself a somewhat arbitrary step, but that's another story.) For gradient descent we're choosing $f(x_0)+f'(x_0)\cdot (x-x_0)$ because it matches both the value and derivatives of $f$ at $x_0$ . We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.

We want our logarithm of a sum to be a sum of logarithms. But instead we'll settle for a proxy function that is a sum of logarithms. We'll make the derivatives of the proxy match those of the original function precisely so we're not making an arbitrary choice.

Write
$\log l(\theta|x) = \log\sum_z P(x,z|\theta) \approx \sum_z\beta_z\log P(x,z|\theta)+\mbox{constant}.$
The $\beta_z$ are constants we'll determine. We want to match the derivatives on either side of the $\approx$ at $\theta=\theta_0$ :
$\frac{\partial \log l(\theta_0|x)}{\partial\theta_0}$ $=\frac{1}{l(\theta_0|x)} \frac{\partial l(\theta_0|x)}{\partial\theta_0} =\sum_z\frac{1}{l(\theta_0|x)} \frac{\partial P(x,z|\theta_0)}{\partial\theta_0}.$
On the other hand we have
$\frac{\partial}{\partial\theta_0}\sum_z\beta_z\log P(x,z|\theta_0) =\sum_z\beta_z\frac{1}{P(x,z|\theta_0)}\frac{\partial P(x,z|\theta_0)}{\partial\theta_0}$

To achieve equality we want to make these expressions match. We choose
$\beta_z = \frac{P(x,z|\theta_0)}{l(\theta_0|x)} = \frac{P(x,z|\theta_0)}{P(x|\theta_0)} = P(z|x,\theta_0).$
Our desired proxy function is:
$\sum_z P(z|x,\theta_0)\log P(x,z|\theta)+\mbox{const.} = E_{Z|x,\theta_0}(\log P(x,Z|\theta))+\mbox{const.}$

So the procedure is to take an estimated $\theta_0$ and obtain a new estimate by optimising this proxy function with respect to $\theta$ . This is the standard EM algorithm.

It turns out that this proxy has some other useful properties. For example, because of the concavity of the logarithm, the proxy is always smaller than the original likelihood. This means that when we optimise it we never optimise ``too far'' and that progress optimising the proxy is always progress optimising the original likelihood. But I don't need to say anything about this as it's all part of the standard literature.

Afterword
As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute
$\mbox{argmax}_x\sum_i\exp(f_i(x))$
you can iterate, at each step computing
$\mbox{argmax}_x\sum_i\exp(f_i(x_0))f_i(x)$
where $x_0$ is the previous iteration. If the $f_i$ take a convenient form then this may turn out to be much easier.

Note
This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.

by sigfpe ([email protected]) at April 29, 2025 09:12 PM

Running from the past
Important Note

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

Preface
Functional programming encourages us to program without mutable state. Instead we compose functions that can be viewed as state transformers. It's a change of perspective that can have a big impact on how we reason about our code. But it's also a change of perspective that can be useful in mathematics and I'd like to give an example: a really beautiful technique that alows you to sample from the infinite limit of a probability distribution without needing an infinite number of operations. (Unless you're infinitely unlucky!)

Markov Chains
A Markov chain is a sequence of random states where each state is drawn from a random distribution that possibly depends on the previous state, but not on any earlier state. So it is a sequence $X_0, X_1, X_2, \ldots$ such that $P(X_{i+1}=x|X_0,X_1,\ldots,X_i) = P(X_{i+1}=x|X_i)$ for all $i\ge0$ . A basic example might be a model of the weather in which each day is either sunny or rainy but where it's more likely to be rainy (or sunny) if the previous day was rainy (or sunny). (And to be technically correct: having information about two days or earlier doesn't help us if we know yesterday's weather.)

Like imperative code, this description is stateful. The state at step $i+1$ depends on the state at step $i$ . Probability is often easier to reason about when we work with independent identically drawn random variables and our $X_i$ aren't of this type. But we can eliminate the state from our description using the same method used by functional programmers.

Let's choose a Markov chain to play with. I'll pick one with 3 states called $A$ , $B$ and $C$ and with transition probabilities given by $P(X_{i+1}=y|X_i=x)=T_{xy}$ where $T=\begin{pmatrix} \frac{1}{2}& \frac{1}{2}& 0\\ \frac{1}{3}& \frac{1}{3}& \frac{1}{3}\\ 0& \frac{1}{2}& \frac{1}{2}\\ \end{pmatrix}$

Here's a diagram illustrating our states:
Implementation
First some imports:
> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE TypeApplications #-}


> import Data.Sequence(replicateA)
> import System.Random
> import Control.Monad.State
> import Control.Monad
> import Data.List
> import Data.Array
And now the type of our random variable:
> data ABC = A | B | C deriving (Eq, Show, Ord, Enum, Bounded)
We are now in a position to simulate our Markov chain. First we need some random numbers drawn uniformly from [0, 1]:
> uniform :: (RandomGen gen, MonadState gen m) => m Double
> uniform = state random
And now the code to take a single step in the Markov chain:
> step :: (RandomGen gen, MonadState gen m) => ABC -> m ABC
> step A = do
>     a <- uniform
>     if a < 0.5
>         then return A
>         else return B
> step B = do
>     a <- uniform
>     if a < 1/3.0
>         then return A
>         else if a < 2/3.0
>             then return B
>             else return C
> step C = do
>     a <- uniform
>     if a < 0.5
>         then return B
>         else return C
Notice how the step function generates a new state at random in a way that depends on the previous state. The m ABC in the type signature makes it clear that we are generating random states at each step.

We can simulate the effect of taking $n$ steps with a function like this:
> steps :: (RandomGen gen, MonadState gen m) => Int -> ABC -> m ABC
> steps 0 i = return i
> steps n i = do
>     i <- steps (n-1) i
>     step i
We can run for 100 steps, starting with , with a line like so:
*Main> evalState (steps 3 A) gen
B
The starting state of our random number generator is given by gen.

Consider the distribution of states after taking $n$ steps. For Markov chains of this type, we know that as $n$ goes to infinity the distribution of the $n$ th state approaches a limiting "stationary" distribution. There are frequently times when we want to sample from this final distribution. For a Markov chain as simple as this example, you can solve exactly to find the limiting distribution. But for real world problems this can be intractable. Instead, a popular solution is to pick a large $n$ and hope it's large enough. As $n$ gets larger the distribution gets closer to the limiting distribution. And that's the problem I want to solve here - sampling from the limit. It turns out that by thinking about random functions instead of random states we can actually sample from the limiting distribution exactly.

Some random functions

Here is a new version of our random step function:
> step' :: (RandomGen gen, MonadState gen m) => m (ABC -> ABC)
> step' = do
>     a <- uniform
>     return $ \case
>         A -> if a < 0.5 then A else B
>         B -> if a < 1/3.0
>                 then A
>                 else if a < 2/3.0 then B else C
>         C -> if a < 0.5 then B else C
In many ways it's similar to the previous one. But there's one very big difference: the type signature m (ABC -> ABC) tells us that it's returning a random function, not a random state. We can simulate the result of taking 10 steps, say, by drawing 10 random functions, composing them, and applying the result to our initial state:
> steps' :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps' n = do
>   fs <- replicateA n step'
>   return $ foldr (flip (.)) id fs
Notice the use of flip. We want to compose functions , each time composing on the left by the new . This means that for a fixed seed gen, each time you increase by 1 you get the next step in a single simulation: (BTW I used replicateA instead of replicateM to indicate that these are independent random draws. It may be well known that you can use Applicative instead of Monad to indicate independence but I haven't seen it written down.)
*Main> [f A | n <- [0..10], let f = evalState (steps' n) gen]
[A,A,A,B,C,B,A,B,A,B,C]
When I first implemented this I accidentally forgot the flip. So maybe you're wondering what effect removing the flip has? The effect is about as close to a miracle as I've seen in mathematics. It allows us to sample from the limiting distribution in a finite number of steps!

Here's the code:
> steps_from_past :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps_from_past n = do
>   fs <- replicateA n step'
>   return $ foldr (.) id fs
We end up building . This is still a composition of independent identically distributed functions and so it's still drawing from exactly the same distribution as steps'. Nonetheless, there is a difference: for a particular choice of seed, steps_from_past n no longer gives us a sequence of states from a Markov chain. Running with argument draws a random composition of functions. But if you increase by 1 you don't add a new step at the end. Instead you effectively restart the Markov chain with a new first step generated by a new random seed.

Try it and see:
*Main> [f A | n <- [0..10], let f = evalState (steps_from_past n) gen]
[A, A, A, A, A, A, A, A, A, A]
Maybe that's surprising. It seems to get stuck in one state. In fact, we can try applying the resulting function to all three states.
*Main> [fmap f [A, B, C] | n <- [0..10], let f = evalState (steps_from_past n) gen]
[[A,B,C],[A,A,B],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A]]
In other words, for large enough we get the constant function.

Think of it this way: If f isn't injective then it's possible that two states get collapsed to the same state. If you keep picking random f's it's inevitable that you will eventually collapse down to the point where all arguments get mapped to the same state. Once this happens, we'll get the same result no matter how large we take $n$ . If we can detect this then we've found the limit of $f_{0}$ $\circ$ $f_{1}$ $\ldots$ $\circ$ $f_{n-1}$ as $n$ goes to infinity. But because we know composing forwards and composing backwards lead to draws from the same distribution, the limiting backward composition must actually be a draw from the same distribution as the limiting forward composition. That flip can't change what probability distribution we're drawing from - just the dependence on the seed. So the value the constant function takes is actually a draw from the limiting stationary distribution.

We can code this up:
> all_equal :: (Eq a) => [a] -> Bool
> all_equal [] = True
> all_equal [_] = True
> all_equal (a : as) = all (== a) as


> test_constant :: (Bounded a, Enum a, Eq a) => (a -> a) -> Bool
> test_constant f =
>     all_equal $ map f $ enumFromTo minBound maxBound
This technique is called coupling from the past. It's "coupling" because we've arranged that different starting points coalesce. And it's "from the past" because we're essentially asking answering the question of what the outcome of a simulation would be if we started infinitely far in the past.
> couple_from_past :: (RandomGen gen, MonadState gen m, Enum a, Bounded a, Eq a) =>
>                   m (a -> a) -> (a -> a) -> m (a -> a)
> couple_from_past step f = do
>     if test_constant f
>         then return f
>         else do
>             f' <- step
>             couple_from_past step (f . f')
We can now sample from the limiting distribution a million times, say:
*Main> let samples = map ($ A) $ evalState (replicateA 1000000 (couple_from_past step' id)) gen
We can now count how often A appears:
*Main> fromIntegral (length $ filter (== A) samples)/1000000
0.285748
That's a pretty good approximation to , the exact answer that can be found by finding the eigenvector of the transition matrix corresponding to an eigenvalue of 1.
> gen = mkStdGen 669
Notes
The technique of coupling from the past first appeared in a paper by Propp and Wilson. The paper Iterated Random Functions by Persi Diaconis gave me a lot of insight into it. Note that the code above is absolutely not how you'd implement this for real. I wrote the code that way so that I could switch algorithm with the simple removal of a flip. In fact, with some clever tricks you can make this method work with state spaces so large that you couldn't possibly hope to enumerate all starting states to detect if convergence has occurred. Or even with uncountably large state spaces. But I'll let you read the Propp-Wilson paper to find out how.
by sigfpe ([email protected]) at April 29, 2025 09:10 PM

Constructing Clifford Algebras using the Super Tensor Product
Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

Some literate Haskell but little about this code is specific to Haskell...
> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE UndecidableInstances #-}
> 
> import GHC.TypeLits
Introduction
This is a followup to Geometric Algebra for Free and More Low Cost Geometric Algebra.

In those articles I showed how you could build up the Clifford algebras like so:
type Cliff1  = Complex R
type Cliff1' = Split R
type Cliff2  = Quaternion R
type Cliff2' = Matrix R
type Cliff3  = Quaternion Cliff1'
type Cliff3' = Matrix Cliff1
type Cliff4  = Quaternion Cliff2'
type Cliff4' = Matrix Cliff2
type Cliff5  = Quaternion Cliff3'
...
I used CliffN as the Clifford algebra for a negative definite inner product and CliffN' for the positive definite case. It's not a completely uniform sequence in the sense that CliffN is built from CliffN' for dimension two lower and you use a mix of Matrix and Quaternion.

The core principle making this work is that for type constructors $T$ implemented like Matrix, Quaternion etc. we have the property that

$TU\mathbb{R} = T\mathbb{R}\otimes U\mathbb{R}$

eg. Matrix (Quaternion Float) is effectively the same thing as Matrix Float $\otimes$ Quaternion Float.

But John Baez pointed out to me that you can build up the CliffN algebras much more simply enabling us to use these definitions:
> type Cliff1 = Complex Float
> type Cliff2 = Complex Cliff1
> type Cliff3 = Complex Cliff2
> type Cliff4 = Complex Cliff3
> type Cliff5 = Complex Cliff4
...
Or even better:
> type family Cliff (n :: Nat) :: * where
>   Cliff 0 = Float
>   Cliff n = Complex (Cliff (n - 1))
But there's one little catch. We have to work, not with the tensor product, but the super tensor product.

We define Complex the same way as before:
> data Complex a = C a a deriving (Eq, Show)
Previously we used a definition of multiplication like this:
instance Num a => Num (Complex a) where
  C a b * C c d = C (a * c - b * d) (a * d + b * c) 
We can think of C a b in Complex R as representing the element . The definition of multiplication in a tensor product of algebras is defined by . So we have .

This means that line of code we wrote above defining * for Complex isn't simply a definition of multiplication of complex numbers, it says how to multiply in an algebra tensored with the complex numbers.

Let's go Super!
A superalgebra is an algebra graded by $\mathbb{Z}_2$ where $\mathbb{Z}_2$ is the ring of integers modulo 2. What that means is that we have some algebra $A$ that can be broken down as a direct sum $A_0\oplus A_1$ (the subscripts live in $\mathbb{Z}_2$ ) with the property that multiplication respects the grading, ie. if $x$ is in $A_i$ and $y$ is in $A_j$ then $xy$ is in $A_{i+j}$ .

The elements of $A_0$ are called "even" (or bosonic) and those in $A_1$ "odd" (or fermionic). Often even elements commute with everything and odd elements anticommute with each other but this isn't always the case. (The superalgebra is said to be supercommutative when this happens. This is a common pattern: a thing X becomes a superX if it has odd and even parts and swapping two odd things introduces a sign flip.)

The super tensor product is much like the tensor product but it respects the grading. This means that if $x$ is in $A_i$ and $y$ is in $B_j$ then $x\otimes y$ is in $(A\otimes B)_{i+j}$ . From now on I'm using $\otimes$ to mean super tensor product.

Multiplication in the super tensor product of two superalgebras $A$ and $B$ is now defined by the following modified rule: if $b$ is in $B_i$ and $c$ is in $A_j$ then $(a\otimes b)(c\otimes d) = (-1)^{ij}(ac)\otimes(bd)$ . Note that the sign flip arises when we shuffle an odd $c$ left past an odd $b$ .

The neat fact that John pointed out to me is that $\mathrm{Cliff_n}=\mathbb{C}\otimes\mathbb{C}\otimes\ldots\mbox{ \mathrm{n times} }\ldots\otimes\mathbb{C}$ .

We have to modify our definition of * to take into account that sign flip.

I initially wrote a whole lot of code to define a superalgebra as a pair of algebras with four multiplication operations and it got a bit messy. But I noticed that the only specifically superalgebraic operation I ever performed on an element of a superalgebra was negating the odd part of an element.

So I could define SuperAlgebra like so:
class SuperAlgebra a where
  conjugation :: a -> a
where conjugation is the negation of the odd part.

(I'm not sure if this operation corresponds to what is usually called conjugation in this branch of mathematics.)

But there's a little efficiency optimization I want to write. If I used the above definition, then later I'd often find myself computing a whole lot of negates in a row. This means applying negate to many elements of large algebraic objects even though any pair of them cancel each other's effect. So I add a little flag to my conjugation function that is used to say we want an extra negate and we can accumulate flips of a flag rather than flips of lots of elements.
> class SuperAlgebra a where
>   conjugation :: Bool -> a -> a
Here's our first instance:
> instance SuperAlgebra Float where
>   conjugation False x = x
>   conjugation True x = negate x
This is saying that the conjugation is the identity on Float but if we want to perform an extra flip we can set the flag to True. Maybe I should call it conjugationWithOptionalExtraNegation.

And now comes the first bit of non-trivial superalgebra:
> instance (Num a, SuperAlgebra a) => SuperAlgebra (Complex a) where
>   conjugation e (C a b) = C (conjugation e a) (conjugation (not e) b)
We consider to be even and to be odd. When we apply the conjugation to then we can just apply it directly to . But that flips the "parity" of (because tensor product respects the grading) so we need to swap when we use the conjugation. And that should explain why conjugation is defined the way it is.

Now we can use the modified rule for $\mathbb{C}\otimes$ defined above:
> instance (Num a, SuperAlgebra a) => Num (Complex a) where
>   fromInteger n = C (fromInteger n) 0
>   C a b + C a' b' = C (a + a') (b + b')
>   C a b * C c d = C (a * c - conjugation False b * d)
>                     (conjugation False a * d + b * c) 
>   negate (C a b) = C (negate a) (negate b)
>   abs = undefined
>   signum = undefined
For example, conjugation False is applied to the first on the RHS because implicitly represents an term and when expanding out the product we shuffle the (odd) in left of . It doesn't get applied to the second because and remain in the same order.

That's it!

Tests
I'll test it with some examples from Cliff3:
> class HasBasis a where
>   e :: Integer -> a


> instance HasBasis Float where
>   e = undefined


> instance (Num a, HasBasis a) => HasBasis (Complex a) where
>   e 0 = C 0 1
>   e n = C (e (n - 1)) 0


> make a b c d e f g h =
>   C (C (C a b) (C c d))
>     (C (C e f) (C g h))


> e1, e2, e3, e21, e31, e32, e321 :: Cliff 3
> e1 = e 0
> e2 = e 1
> e21 = e2 * e1
> e3 = e 2
> e31 = e3 * e1
> e32 = e3 * e2
> e321 = e3 * e2 * e1


> main = do
>     print (e1 * e1 + 1 == 0)
>     print (e31 * e31 + 1 == 0)
>     print (e3 * e3 + 1 == 0)
>     print (e21 * e21 + 1 == 0)
>     print (e2 * e2 + 1 == 0)
>     print (e32 * e32 + 1 == 0)
>     print (e321 * e321 - 1 == 0)
>     print (e3 * e2 * e1 - e321 == 0)
>     print (e2 * e1 - e21 == 0)
>     print (e3 * e1 - e31 == 0)
>     print (e3 * e2 - e32 == 0)
>     print (e21 * e32 - e31 == 0)
Observation
The implementation of multiplication looks remarkably like it's the Cayley-Dickson construction. It can't be (because iterating it three times gives you a non-associative algebra but the Clifford algebras are associative). Nonetheless, I think comparison with Cayley-Dickson may be useful.

Efficiency
As mentioned above, before I realised I just needed the conjugation operation I wrote the above code with an explicit split of a superalgebra into two pieces intertwined by four multiplications. I think the previous approach may have a big advantage - it may be possible to use variations on the well known "speed-up" of complex multiplication that uses three real multiplications instead of four. This should lead to a fast implementation of Clifford algebras.

Also be warned: you can kill GHC if you turn on optimization and try to multiply elements of high-dimensional Clifford algebras. I think it tries to inline absolutely everything and you end up with a block of code that grows exponentially with $n$ .

Note also that this code translates directly into many languages.
by sigfpe ([email protected]) at April 29, 2025 09:08 PM

April 26, 2025

Gabriella Gonzalez

Unification-free ("keyword") type checking
Unification-free ("keyword") type checking
From my perspective, one of the biggest open problems in implementing programming languages is how to add a type system to the language without significantly complicating the implementation.

For example, in my tutorial Fall-from-Grace implementation the type checker logic accounts for over half of the code. In the following lines of code report I’ve highlighted the modules responsible for type-checking with a ‡:
$ cloc --by-file src/Grace/*.hs       
…
--------------------------------------------------------------------------------
File                                    blank        comment           code
--------------------------------------------------------------------------------
src/Grace/Infer.hs        ‡               499            334           1696
src/Grace/Type.hs         ‡                96             91            633
src/Grace/Syntax.hs                        61            163            543
src/Grace/Parser.hs                       166             15            477
src/Grace/Lexer.hs                         69             25            412
src/Grace/Normalize.hs                     47             48            409
src/Grace/Context.hs      ‡                72            165            249
src/Grace/Import.hs                        38              5            161
src/Grace/REPL.hs                          56              4            148
src/Grace/Interpret.hs                     30             28            114
src/Grace/Pretty.hs                        25             25            108
src/Grace/Monotype.hs     ‡                11             48             61
src/Grace/Location.hs                      16             15             60
src/Grace/TH.hs                            23             32             53
src/Grace/Value.hs                         12             53             53
src/Grace/Input.hs                         10              8             43
src/Grace/Compat.hs                         9              2             32
src/Grace/Existential.hs  ‡                12             23             25
src/Grace/Domain.hs       ‡                 4              7             20
--------------------------------------------------------------------------------
SUM:                                     1256           1091           5297
--------------------------------------------------------------------------------
That’s 2684 lines of code (≈51%) just for type-checking (and believe me: I tried very hard to simplify the type-checking code).

This is the reason why programming language implementers will be pretty keen to just not implement a type-checker for their language, and that’s how we end up with a proliferation of untyped programming languages (e.g. Godot or Nix), or ones that end up with a type system bolted on long after the fact (e.g. TypeScript or Python). You can see why someone would be pretty tempted to skip implementing a type system for their language (especially given that it’s an optional language feature) if it’s going to balloon the size of their codebase.

So I’m extremely keen on implementing a “lean” type checker that has a high power-to-weight ratio. I also believe that a compact type checker is an important foundational step for functional programming to “go viral” and displace imperative programming. This post outlines one approach to this problem that I’ve been experimenting with¹.

Unification

The thing that bloats the size of most type-checking implementations is the need to track unification variables. These variables are placeholders for storing as-yet-unknown information about something’s type.

For example, when a functional programming language infers the type of something like this Grace expression:
(λx → x) true
… the way it typically works is that it will infer the type of the function (λx → x) which will be:
λx → x : α → α
… where α is a unification variable (an unsolved type). So you can read the above type annotation as saying “the type of λx → x is a function from some unknown input type (α) to the same output type (α).

Then the type checker will infer the type of the function’s input argument (true) which will be:
true : Bool
… and finally the type checker will combine those two pieces of information and reason about the final type like this:

the input to the function (true) is a Bool

therefore the function’s input type (α) must also be Bool

therefore the function’s output type (α) must also be Bool

therefore the entire expression’s type is Bool

… which gives the following conclusion of type inference:
(λx → x) true : Bool
However, managing unification variables like α is a lot trickier than it sounds. There are multiple unification algorithms/frameworks in the wild but the problem with all of them is that you have to essentially implement a bespoke logic programming language (with all of the complexity that entails). Like, geez, I’m already implementing a programming language and I don’t want to have to implement a logic programming language on top of that just to power my type-checker.

So there are a couple of ways I’ve been brainstorming how to address this problem and one idea I had was: what if we could get rid of unification variables altogether?

Deleting unification

Alright, so this is the part of the post that requires some familiarity/experience with implementing a type-checker. If you’re somebody new to programming language theory then you can still keep reading but this is where I have to assume some prior knowledge otherwise this post will get way too long.

The basic idea is that you start from the “Complete and Easy” bidirectional type checking algorithm which is a type checking algorithm that does use unification variables² but is simpler than most type checking algorithms. The type checking rules look like this (you can just gloss over them):

Now, delete all the rules involving unification variables. Yes, all of them. That means that all of the type-checking judgments from Figures 9 and 10 are gone and also quite a few rules from Figure 11 disappear, too.

Surprisingly, you can still type check a lot of code with what’s left, but you lose two important type inference features if you do this:

you can no longer infer the types of lambda arguments

you can no longer automatically instantiate polymorphic code

… and I’ll dig into those two issues in more detail.

Inferring lambda argument types

You lose the ability to infer the type of a function like this one when you drop support for unification variables:
λx → x == False
Normally, a type checker that supports unification can infer that the above function has type Bool → Bool, but (in general) a type checker can no longer infer that when you drop unification variables from the implementation.

This loss is not too bad (in fact, it’s a pretty common trade-off proposed in the bidirectional type checking literature) because you can make up for it in a few ways (all of which are easy and efficient to implement in a type checker):
You can allow the input type to be inferred if the lambda is given an explicit type annotation, like this:
λx → x == False : Bool → Bool
More generally, you can allow the input type to be inferred if the lambda is checked against an expected type (and a type annotation is one case, but not the only case, where a lambda is checked against an expected type).

We’re going to lean on this pretty heavily because it’s pretty reasonable to ask users to provide type annotations for function definitions and also because there are many situations where we can infer the expected type of a lambda expression from its immediate context.
You can allow the user to explicitly supply the type of the argument

… like this:
λ(x : Bool) → x == False
This is how Dhall works, although it’s not as ergonomic.
You can allow the input type to be inferred if the lambda is applied to an argument

This is not that interesting, but I’m mentioning it for completeness. The reason it’s not interesting is because you won’t often see expressions of the form (λx → e) y in the wild, because they can more idiomatically be rewritten as let x = y in e.
Instantiating polymorphic code

The bigger issue with dropping support for unification variables is: all user-defined polymorphic functions now require explicit type abstraction and explicit type application, which is a major regression in the type system’s user experience.

For example, in a language with unification variables you can write the polymorphic identity function as:
λx → x
… and use it like this³:
let id = λx → x
in  (id true, id 1)
… but when you drop support for unification variables then you have to do something like this:
let id = λ(a : Type) → λ(x : a) → x
in  (id Bool true, id Natural 1)
Most programmers do NOT want to program in a language where they have to explicitly manipulate type variables in this way. In particular, they really hate explicit type application. For example, nobody wants to write:
map { x : Bool, … large record … } Bool (λr → r.x) rs
So we need to figure out some way to work around this limitation.

The trick

However, there is a solution that I believe gives a high power-to-weight ratio, which I will refer to as “keyword” type checking:

add a bunch of built-in functions

Specifically, add enough built-in functions to cover most use cases where users would need a polymorphic function.

add special type-checking rules for those built-in functions when they’re fully saturated with all of their arguments

These special-cased type-checking rules would not require unification variables.

still require explicit type abstraction when these built-in functions are not fully saturated

Alternatively, you can require that built-in polymorphic functions are fully saturated with their arguments and make it a parsing error if they’re not.

still require explicit type abstraction and explicit type application for all user-defined (i.e. non-builtin) polymorphic functions

optionally, turn these built-in functions into keywords or language constructs

I’ll give a concrete example: the map function for lists. In many functional programming languages this map function is not a built-in function; rather it’s defined within the host language as a function of the following type:
map : ∀(a b : Type) → (a → b) → List a → List b
What I’m proposing is that the map function would now become a built-in function within the language and you would now apply a special type-checking rule when the map function is fully saturated:
Γ ⊢ xs ⇒ List a   Γ ⊢ f ⇐ a → b
───────────────────────────────
Γ ⊢ map f xs ⇐ List b
In other words, we’re essentially treating the map built-in function like a “keyword” in our language (when it’s fully saturated). Just like a keyword, it’s a built-in language feature that has special type-checking rules. Hell, you could even make it an actual keyword or language construct (e.g. a list comprehension) instead of a function call.

I would even argue that you should make each of these special-cased builtin-functions a keyword or a language construct instead of a function call (which is why I call this “keyword type checking” in the first place). When viewed through this lens the restrictions that these polymorphic built-in functions (A) are saturated with their arguments and (B) have a special type checking judgment are no different than the restrictions for ordinary keywords or language constructs (which also must be saturated with their arguments and also require special type checking judgments).

To make an analogy, in many functional programming languages the if/then/else construct has this same “keyword” status. You typically don’t implement it as a user-space function of this type:
ifThenElse : ∀(a : Type) → Bool → a → a → a
Rather, you define if as a language construct and you also add a special type-checking rule for if:
Γ ⊢ b ⇐ Bool   Γ ⊢ x ⇒ a   Γ ⊢ y ⇐ a
────────────────────────────────────
Γ ⊢ if b then x else y ⇒ a
… and what I’m proposing is essentially greatly exploding the number of “keywords” in the implementation of the language by turning a whole bunch of commonly-used polymorphic functions into built-in functions (or keywords, or language constructs) that are given special type-checking treatment.

For example, suppose the user were to create a polymorphic function like this one:
let twice = λ(a : Type) → λ(x : a) → [ x, x ]

in  twice (List Bool) (twice Bool true)
That’s not very ergonomic to define and use, but we also can’t reasonably expect our programming language to provide a twice built-in function. However, our language could provide a generally useful replicate builtin function (like Haskell’s replicate function):
replicate : ∀(a : Type) → Natural → a → List a
… with the following type-checking judgment:
Γ ⊢ n ⇐ Natural   Γ ⊢ x ⇒ a
───────────────────────────
Γ ⊢ replicate n x ⇒ List a
… and then you would tell the user to use replicate directly instead of defining their own twice function:
replicate 2 (replicate 2 true)
… and if the user were to ask you “How do I define a twice synonym for replicate 2” you would just tell them “Don’t do that. Use replicate 2 directly.”

Conclusion

This approach has the major upside that it’s much easier to implement a large number of keywords than it is to implement a unification algorithm, but there are other benefits to doing this, too!

It discourages complexity and fragmentation in user-space code

Built-in polymorphic functions have an ergonomic advantage over user-defined polymorphic functions because under this framework type inference works better for built-in functions. This creates an ergonomic incentive to stick to the “standard library” of built-in polymorphic functions, which in turn promotes an opinionated coding style across all code written in that language.

You might notice that this approach is somewhat similar in spirit to how Go handles polymorphism which is to say: it doesn’t handle user-defined polymorphic code well. For example, Go provides a few built-in language features that support polymorphism (e.g. the map data structure and for loops) but if users ask for any sort of user-defined polymorphism then the maintainers tell them they’re wrong for wanting that. The main difference here is that (unlike Go) we do actually support user-defined polymorphism; it’s not forbidden, but it is less ergonomic than sticking to the built-in utilities that support polymorphism..

It improves error messages

When you special-case the type-checking logic you can also special-case the error messages, too! With general-purpose unification the error message can often be a bit divorced from the user’s intent, but with “keyword type checking” the error message is not only more local to the problem but it can also suggest highly-specific tips or fixes appropriate for that built-in function (or keyword or language construct).

It can in some cases more closely match the expectations of imperative programmers

What I mean is: most programmers coming from an imperative and typed background are used to languages where (most of the time) polymorphism is “supported” via built-in language constructs and keywords and user-defined polymorphism might be supported but considered “fancy”. Leaning on polymorphism via keywords and language constructs would actually make them more comfortable using polymorphism instead of trying to teach them how to produce and consume user-defined polymorphic functions.

For example, in a lot of imperative languages the idiomatic solution for how to do anything with a list is “use a for loop” where you can think of a for loop as a built-in keyword that supports polymorphic code. The functional programming equivalent of “just use a for loop” would be something like “just use a list comprehension” (where a list comprehension is a “keyword” that supports polymorphic code that we can give special type checking treatment).

That said, this approach is still more brittle than unification and will require more type annotations in general. The goal here isn’t to completely recover the full power of unification but rather to get something that’s not too bad but significantly easier to implement.

I think this “keyword type checking” can potentially occupy a “low tech” point in the type checking design space for functional programming languages that need to have efficient and compact implementations (e.g. for ease of embedding). Also, this can potentially provide a stop-gap solution for novice language implementers that want some sort of a type system but they’re not willing to commit to implementing a unification-based type system.

There’s also variation on this idea which Verity Scheel has been exploring, which is to provide userland support for defining new functions with special type-checking rules and there’s a post from her outlining how to do that:

User Operators with Implicits & Overloads

The other approach is to create essentially an “ABNF for type checkers” that would let you write type-checking judgments in a standard format that could generate the corresponding type-checking code in multiple languages. That’s still a work-in-progress, though.↩︎

I believe some people might take issue with calling these unification variables because they consider bidirectional type checking as a distinct framework from unification. Moreover, in the original bidirectional type checking paper they’re called “unsolved” variables rather than unification variables. However, I feel that for the purpose of this post it’s still morally correct to refer to these unsolved variables as unification variables since their usage and complexity tradeoffs are essentially identical to unification variables in traditional unification algorithms.↩︎

… assuming let expressions are generalized.↩︎
by Gabriella Gonzalez ([email protected]) at April 26, 2025 07:49 PM

Matthew Sackman

Rallentando, and the awfulness of anti-virus software

Since I was a child, I’ve been playing the French Horn. I still play, and I take it quite seriously. I’m lucky enough to play with some good ensembles, and I perform many concerts each year.

When learning difficult music, I often practise with a metronome. A metronome is a device that clicks or beeps regularly. You can set how often it clicks; for example you might set it to click 80 times a minute. The tempo (or speed) of a piece of music is often specified by the composer telling you how many beats per minute they want. This is guidance and not sacrosanct: you don’t normally have to play at exactly this tempo, and frequently music requires some implicit variation of tempo for it to be successful. But it certainly is an important piece of information from the composer, telling you at what basic speed the piece should be going.

The problem with metronomes is that they can’t change their speed by themselves. For complex music, the speed could be changing a lot, or, maybe the number of beats per bar changes. This is annoying because it means for some pieces of music you have to keep stopping playing, readjust the metronome, and then continue on. There are also gradual changes in tempo in music: a part of a piece might gently get faster or slower. No metronome can cope with this: essentially, metronomes know nothing about the piece of music you’re playing.

So in some spare time this year, I built one that does understand music.

It does nearly all the musical things you would want it to. As this is mainly a technical blog though, here I’ll focus on those aspects.

It could have been an app; I’ve built apps before. But the faff of getting onto the play store, or the app store is just not worth it. The development tools are heavyweight and annoying. Sending out new versions requires approval processes, and you have to convince people to install something before they can use it. So I wanted this to be browser based. Also, modern web browsers are pretty amazing – great features and well designed APIs. Yep, all the old APIs are terrible and awful to work with, but everything that’s at all recent is pretty great.

Drawing music in a browser is quite a challenge though. The way I’m doing it is I’m building an SVG, client-side. This was the first thing I started on: trying to figure out how to draw music in a browser, how to be able to detect clicks, and make it all interactive. The client side code is all generated from TypeScript using the plain tsc to do the translation to JavaScript. I can’t stand complex tool-chains, and modern browsers are absolutely fine with loading modules (and you can do some really nice things with import maps as we’ll see). I’m not even minimising the JavaScript: I’ve written the server myself; the modules are sent over the wire gzipped and I have correct cache-control settings using immutable and “cache-busting”, so minimising the source just makes debugging life harder for no real gain.

A score is essentially a list of blocks. I’m using a CRDT (the fugue list CRDT) to allow local-first editing (and even offline editing). Dirty blocks get sent over a websocket and stored on the server, using LMDB which is all very normal for me.

The server has a neat part of its design: when you compile the server, all the static assets are embedded into the binary, thus making it a single self-contained executable. Now those assets (HTML, CSS, images, JavaScript etc) are just normally named files, but they can also be Go templates. When the server starts up, it works through these static assets, building HTTP routes for them. The HTTP routes contain in their paths the hashcode of the file – this is necessary for the cache busting. If the asset is a template, the server knows how to run the template, and critically, I provide a url function in the template engine so that templates can get the URL of some other asset including its hashcode. So this means that if some HTML file needs to link to some CSS file, the HTML file as built into the server can be a template. At start up, this template gets run, it can invoke this url function, and it can find out the final URL of the CSS file. And of course this URL now influences the hashcode of the HTML file itself. This also plays very nicely with the integrity attribute you can put on all sorts of things these days.

So it all works out rather nicely: if you consider the tree of file A importing files B and C, and file B imports file D, then if I make some change to file D, then it means its hashcode changes, and so its URL changes. This propagates up to file B, and from there to file A (but no change to file C). So it’s safe to serve all these static assets with immutable cache-control headers and rely on this lovely hashcode chaining. All of this work is done once, each time the server starts-up. And it’ll all explode and stop working if there’s ever a cycle in the graph of file imports.

Now in practice, it seems that references between HTML, CSS, images, or JavaScript don’t seem to create cycles – at least I’ve not had a problem so far. But between JavaScript modules, it’s much more common, as you’d likely expect. But here, import maps come to the rescue: in my TypeScript/JavaScript, I just import modules normally. I have a function in the template engine which knows how to generate an import-map of all my JavaScript modules, which gets injected into the top HTML page. This import-map provides both the rewriting of paths (to add hashcodes onto the paths), and also provides the integrity section. This solves the problem of circular imports because it means the JavaScript itself never needs to contain the hashcode of any module it imports. Yet, if I change some JavaScript module, then its hashcode changes, which means the import-map changes, and so again, the browser is forced into correctly fetching the updated resource.

A couple of weekends ago, I was up visiting my parents and I wanted to demonstrate this thing to them (they’re also musicians). They have computers running Windows. I tried loading up a demo score, and it just didn’t work. Their browsers were up to date. Nothing of note in the server logs, so I opened up the browser console and found errors from deserialisation of data coming over the websocket: it was claiming the data was corrupted. I’d never seen this in my own development and use.

Checking a few other things, and I spotted that the source HTML for the page had had some additional <script> elements added to it: something was injecting some JavaScript. And then the penny dropped: this is MITM behaviour by some shitty anti-virus software – in this case, AVG. Some quick web searching, and yep, those products are also known for dicking around with websocket traffic: if you’re sending binary messages and you’re compressing the stream, it’s apparently quite common that the anti-virus software intercepts the traffic, and then screws up the framing leading your own code to face corrupted data. Completely ridiculous.

In my case, disabling compression on the websocket was enough to prevent the corruption, and I then established that even for big scores, the initial load would be maybe 25kB of data over the websocket, so not compressing it isn’t terrible.

What made me laugh though was this: the browser console was telling me both about the corrupted data, and also about the fact the browser was refusing to run some script, due to it violating CSP settings. It took me a moment to realise that the script that wasn’t being run, was the script that the anti-virus software was injecting into my pages! Now, import-maps can’t be external files, they have to be inline in the HTML. But there’s no way I’m serving HTML pages with a CSP header with script-src 'unsafe-inline'. Instead, I’m issuing a script-src CSP header with 'self' and also the hashcode of import-map itself.

What this says to the browser is that it can trust the import-map (because hashing it will give a hashcode that matches the CSP header), the import-map itself has its integrity for every JavaScript module it needs to load, and the CSP headers also tell the browser that it’s OK to load JavaScript modules from the same domain (this is the 'self'). But, this does not give permission for the browser to run arbitrary bits of crap JavaScript that some awful anti-virus thing has injected! So, by making use of CSP and import-maps, you can defeat attackers from tampering with your website and code!

April 26, 2025 05:30 PM

April 24, 2025

Tweag I/O

The minimal megaparsec tutorial
In my functional programming course to Master Students of Telecom Nancy, I like to use parsing as an example of monadic programming, relying on the megaparsec library. My only concern with megaparsec is that its official tutorial is long: at the time Iâ€™m writing, itâ€™s 15000 words long.

Unlike the official megaparsec tutorial, this blog post is intended to be smaller, and is aimed at an audience with only a basic understanding of Haskell and monadic programming.

All the Haskell material from this blogpost is available on our GitHub: https://github.com/tweag/minimal-megaparsec-tutorial. You can fork this repository to get a full-fledged setup (including CI and Haskell Language Server support) for experimenting with a megaparsec parser ðŸš€

Running example

My running example is a parser for a domain-specific language that I designed for the class. This language uses primitive drawing commands to represent ASCII art roguelike maps. It looks like this:
HLine 0 0 8; HLine 0 4 8; VLine 0 0 5; VLine 7 0 2; VLine 7 3 2
HLine 8 1 2; HLine 8 3 2
HLine 10 0 6; HLine 10 4 6; VLine 10 0 2; VLine 10 3 2; VLine 15 0 5
Start 2 1
Cell 13 3 ~; Cell 14 3 ~; Cell 14 2 ~
Here, HLine x y len and VLine x y len draw horizontal and vertical walls respectively. The Start x y command marks the playerâ€™s starting point and Cell x y ~ places special terrain.

Roguelike maps typically consist of rectangular rooms and connecting corridors, where walls are shown as #, water as ~, and walkable spaces as dots (.) For example, the snippet above draws a map with two connected rooms. The room on the left contains the playerâ€™s start location (>), while some water appears in the lower right corner of the room on the right:
########  ######
#.>....####....#
#.............~#
#......####..~~#
########  ######
Walkable floor cells are omitted from the domain-specific language, as they can be inferred by computing the set of cells reachable from the starting point. In implementations of roguelikes, maps like this one are translated into an array of arrays of symbols, with some symbols being walkable (e.g. dot cells and water cells) and some symbols being blockers (walls). The top-level array is then used to compute possible moves and collisions.

The Parsec monad

To use megaparsec, we define our main monad type using the Parsec e s a type. It has three arguments:

The type of errors returned by the parser,

the type of stream accepted as input by the parser, and

the type of data returned upon successful parsing of an input stream.

For a simple parser, we define:

The error type to be Text, for simplicity. In a production parser, you would use a structured error type, that distinguishes the different error cases; so that you can handle them differently.

The input stream to be Text, because this is the most idiomatic choice in the Haskell ecosystem:
import Data.Text (Text)
import Text.Megaparsec

type Error = Text
type Input = Text

-- | @Parser a@ is a parser that accepts @Text@ as input and returns an @a@ upon
-- successful parsing.
type Parser a = Parsec Error Input a
Our first parser

Parsers are built from primitive combinators (e.g. lookAhead, notFollowedBy, end of file eof) and combinators derived from them (e.g. oneOf, anySingle, satisfy). These combinators are designed to consume a few symbols, not complex structures (more on this later).

Combinators return parsers in any MonadParsec monad, which means that they have a signature where the head is MonadParsec e s m => ... and the return type is of the form m a ¹. In our context, it suffices to know that m a is instantiated to Parser a, so we can use these combinators for our parsers.

Letâ€™s parse the different kinds of symbols we usually find in ASCII art roguelike maps, using the anySingle function, which parses a single token. In our case, since the input type is Text, the type of tokens is Char (see the ShareInput case of Streamâ€™s documentation, as well as the instances of Stream):
-- | A symbol in the map of an ASCII roguelike
data Symbol
  = -- | A wall, depicted by a # character
    Wall
  | -- | A water cell, depicted by a ~ character
    Water
  deriving (Eq, Show)

-- | A parser for the symbol of a single cell. Used in 'parseElement' below.
parseSymbol :: Parser Symbol
parseSymbol = do
  c <- anySingle
  case c of
    '#' -> return Wall
    '~' -> return Water
    _   -> fail $ "Unknown symbol: " <> [c] -- See below for how to avoid this case altogether (in parseLineElement)
Parser combinators

By virtue of MonadParsecs being monads, parsers can be built using functions that are common in monadic Haskell code (including functions from Functor, Applicative, etc.). Letâ€™s demonstrate this to build a parser for more advanced roguelike map constructs:
data Element
  = -- | Horizontal wall, starting at @(x,y)@ with @length@ cells (ending at @(x+length-1,y)@)
    HorizontalLine Int Int Int
  | -- | Vertical wall, starting at @(x,y)@ with @length@ cells (ending at @(x,y+length-1)@)
    VerticalLine Int Int Int
  | -- | A cell at @(x,y)@ with a symbol
    Cell Int Int Symbol
  | -- | The starting point of the player
    Start Int Int
  deriving (Eq, Show)
The parser for the HorizontalLine and VerticalLine cases can be written as follows:
import Control.Monad (void)
import Control.Monad.Extra (when)
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer

parseLineElement :: Parser Element
parseLineElement = do
  constructor <- choice [string "HLine" >> return HorizontalLine, string "VLine" >> return VerticalLine]
  space1 -- One or more space
  x <- decimal
  space1
  y <- decimal
  space1
  len <- decimal
  when (len < 1) $ fail $ "Length must be greater than 0, but got " <> show len
  return $ constructor x y len
The first two lines either parse the string HLine or the string VLine and use the choice function to encode the two possibilities. Also, because each line in a do block encodes a step in the computation, writing monadic parsers is natural: each line consumes some of the input, until enough is consumed to return the desired value. Another example of using a regular monadic function is to use when to stop parsing when an incorrect value is consumed.

Running parsers

Since our parser takes Text as input, it can be tested in a pure context. Megaparsec provides the runParser function for this. To be able to print errors of our parser, our error type must be an instance of ShowErrorComponent; and then we can define a convenient runMyParser function that returns either an error or the parsed value:
import Data.Text (pack, unpack)

-- | Instance required for 'runMyParser'
instance ShowErrorComponent Error where
  showErrorComponent = unpack

-- | A variant of megaparsec's 'runParser', instantiated to our context.
-- Successfully parses an @a@ or returns an error message.
runMyParser :: Parser a -> Input -> Either Text a
runMyParser parser input =
  case runParser parser "" input of
    Left err -> Left $ pack $ errorBundlePretty err
    Right x  -> Right x
Parsing expressions, lists, etc.

Megaparsec not only provides building blocks for parsing tokens and combining parsers. It also provides parsers for common constructs found in programming languages and domain-specific languages, such as expressions and lists. Megaparsec does this by relying on the parser-combinators package.

I donâ€™t want to go into the details of parsing expressions here (e.g. parsing 1 + 2 - 3â€¦), but let me emphasize that it is a bad idea to write your own expression parser. Instead, think about what kind of operators you need and encode them, using the Operator type.

List parsing, on the other hand, is done with various sepâ€¦ functions. In our case of roguelike maps, we allow different elements to be separated by a semicolon, or by one or more newlines. This is encoded as follows:
parseElements :: Parser [Element]
parseElements = parseElement `sepBy1` separator
  where
    separator = do
      hspace -- Optional horizontal (non-newline) space
      choice [void $ char ';', void $ some eol] -- Either a single ';' or many newlines
      hspace
    parseElement :: Parse Element
    parseElement = choice [parseLineElement, parseStart, parseCell]
      where
        parseStart = do
          void $ string "Start"
          space1
          (x, y) <- parseCoord
          return $ Start x y
        parseCell = do
          void $ string "Cell"
          space1
          (x, y) <- parseCoord
          space1
          symbol <- parseSymbol
          return $ Cell x y symbol
        parseCoord = do
          x <- decimal
          space1
          y <- decimal
          return (x, y)
Conclusion

Weâ€™ve presented how to parse simple constructs using megaparsec and how to run our parsers. This blog post is less than 1500 words long: mission accomplished presenting megaparsec in a shorter way than the official tutorial ðŸ¥³

If you want to use the code from this blog post as a starting point, feel free to clone https://github.com/tweag/minimal-megaparsec-tutorial. And once your project is moving away from a minimal viable product, head over to megaparsecâ€™s official tutorial to learn about more advanced ways to use megaparsec!

This is an instance of the monad transformer pattern.â†©
April 24, 2025 12:00 AM

Chris Penner

3 things other languages should steal from Unison

New languages are coming out all the time, some experimental, some industrial, others are purpose built for a specific domain. No single language has the people-power or scope to try every cool new feature, so a critical step in designing a new language is to observe how experimental features have borne themselves out in practice.

As the saying goes, good [language designers] copy, great [language designers] steal.

If you've heard anything about the Unison Language it's not a surprise to you that it innovates in many areas. Unison very much tries to reinvent Human-Compiler interactions for the 21st century, and in that pursuit has spawned fully integrated ecosystem between the compiler, codebase-manager, language server, version control and package manager.

While some of these features are still too new to have proven their worth (but we have our fingers crossed); there are aspects that I think new languages should certainly consider as part of their designs.

A Fully Interactive and Incremental Compiler

With the modern era of language servers and programming assistants, developers greatly benefit from instant feedback on their work. With traditional batch compilers it's all too tempting to go for a coffee, or a walk, or a YouTube binge every time you kick off a big build. The context-switching induced by switching tasks while compiling wastes developer time by paging things in and out of their working memory, not to mention: it just feels bad. After the build finishes, the developer is left with a giant wall of text, sentenced to dig through a large list of compiler errors trying to find some root-cause error in the file they're working on.

Unison has a fully interactive compilation experience. The language-server is typechecking your scratch-file on every keystroke providing error feedback right in your editor, and offering helpful information via hover-hints which use your codebase and typechecking info to help you orient yourself. It can even partially typecheck the file to suggest which types or operators you may want to fill into a given slot.

Once you're happy with a chunk of code, you can check it in to the codebase and it won't be compiled again unless you want to change it, or an update is automatically propagated into it from a downstream change.

While most languages won't adopt Unison's scratch-file and codebase model; having an interactive compiler with good support for caching of already-compiled-assets is a huge boon to productivity in any language.

On the topic of the language server, Unison's language server is built directly into the compiler. This ensures we avoid the awkward disagreements between the LSP and compiler that sometimes happen in other languages. It can also help to avoid duplicate work, many languages are running the compiler independently and in their LSP at the same time without sharing any of the work between them, causing redundant work and a waste of precious resources.

Codebase API

It's the compiler's job to understand your code intimately. It knows exactly how every definition is linked together, even if you don't! In many languages it can be frustrating to know that this information exists deep within the compiler, but not having any access to it yourself!

Unison stores all your code as structured data within your codebase and exposes the ability for you to ask it useful questions about your code, exposing that precious understanding to you as a developer.

Unison allows searching by type, finding the dependencies of a definition, or inverting that relationship to finding all definitions which depend on a definition.

Via the UCM CLI you can use utilities like text.find to search only string constants, or find to search only definition names.

Some codebase data is provided via an API which is exposed from the interactive UCM compiler, allowing developers to write tooling to customize their workflow. For example, check out this VS Code plugin someone wrote to view codebase definitions in the sidebar. In other languages you'd typically need to write a scrappy Regex or re-compile the code in a subprocess in order to achieve something similar.

It doesn't have to be an API, it could be a parquet file or a SQLite database or any number of things, the important part is that a language exposes its one-true-source of information about the codebase in some structured format for third-party tools to build upon.

Smart docs

It doesn't matter how great your language's package ecosystem is if nobody can figure out how to use it! Documentation is critical for helping end users understand and use functionality in your language, but it has a fatal flaw: documentation isn't compiled and falls out of date with the code.

In Unison, docs are a data-type within the language itself. This means that docs can be generated dynamically by running Unison code! We've leveraged this ability to enable embedding typechecked runnable code examples into your docs. These examples are compiled alongside the rest of your program, so they're guaranteed to be kept up to date, and the outputs from your example code is run and updated whenever the source definitions change.

You can also write code which generates documentation based on your real application code. For example, you could write code which crawls your web-server's implementation and collects all the routes and parameters the server defines and displays them nicely as documentation.

Unison goes one step further here by providing special support for the documentation format on Unison Share, ensuring any definitions mentioned in docs and code examples are hyper-linked to make for a seamless package-browsing experience.

As an example of how far this can go, check out this awesome project by community contributor Alvaro which generates mermaid graphs in the docs representing the behaviour of simulations. The graphs are generated from the same underlying library code so they won't go out of date.

Get stealing

This subset of topics doesn't touch on Unison's ability system, continuation capturing, or code serialization so I'll probably need at least a part 2!

Hopefully you learned something ðŸ¤ž! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one by pledging on my Patreon page! It takes quite a bit of work to put these things together, if I managed to teach your something or even just entertain you for a minute or two maybe send a few bucks my way for a coffee? Cheers! ðŸ�»

April 24, 2025 12:00 AM

April 23, 2025

Haskell Interlude

64: Sandy Maguire

In this episode Mike Sperber and Niki Vazou talk with Sandy Maguire, lead compiler engineer at Manifold Valley. They talk about the benefits of using Haskell of course, about all the books Sandy has written, on effects and the problem with monads, on combinator libraries and programming with laws.

by Haskell Podcast at April 23, 2025 10:00 PM

April 18, 2025

GHC Developer Blog

GHC 9.10.2-rc1 is now available

GHC 9.10.2-rc1 is now available

bgamari - 2025-04-18

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.10.2. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of severities and scopes. A full accounting of these fixes can be found in the release notes. As always, GHCâ€™s release status, including planned future releases, can be found on the GHC Wiki status.

This release candidate will have a two-week testing period. If all goes well the final release will be available the week of 1 May 2025.

We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at April 18, 2025 12:00 AM

April 12, 2025

Derek Elkins

What difference lists actually are
Introduction

Purely functional list concatenation, xs ++ ys in Haskell syntax, is well known to be linear time in the length of the first input and constant time in the length of the second, i.e. xs ++ ys is O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of concatenation.

The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions a la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. “Concatenation” of “lists” represented this way is just function composition which is a constant time operation. We can lift a list xs to this representation via the section (xs ++). This will still lead to O(length xs) amount of work to apply this function, but a composition of such functions applied to a list will always result in a fully right associated expression even if the function compositions aren’t right associated.

In the last several years, it has become popular to refer to this technique as “difference lists”. Often no justification is given for this name. When it is given, it is usually a reference to the idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise to efficient concatenation, they have almost no similarities.

Functional Lists

To start, I want to do a deeper analysis of the “functional lists” approach, because I think what it is doing is a bit misunderstood and, consequently, oversold¹. Let’s see how we would model this approach in an OO language without higher-order functions, such as early Java. I’ll use strings for simplicity, but it would be exactly the same for generic lists.
interface PrependTo {
  String prependTo(String end);
}

class Compose implements PrependTo {
  private PrependTo left;
  private PrependTo right;
  public Compose(PrependTo left, PrependTo right) {
    this.left = left; this.right = right;
  }
  String prependTo(String end) {
    this.left.prependTo(this.right.prependTo(end));
  }
}

class Prepend implements PrependTo {
  private String s;
  public Prepend(String s) { this.s = s; }
  String prependTo(String end) {
    return this.s + end;
  }
}
This is just a straight, manual implementation of closures for (.) and (++) (specialized to strings). Other lambdas not of the above two forms would lead to other implementations of PrependTo. Let’s say, however, these are the only two forms that actually occur, which is mostly true in Haskell practice, then another view on this OO code (to escape back to FP) is that it is an OOP encoding of the algebraic data type:
data PrependTo = Compose PrependTo PrependTo | Prepend String

prependTo :: PrependTo -> String -> String
prependTo (Compose left right) end = prependTo left (prependTo right end)
prependTo (Prepend s) end = s ++ end
We could have also arrived at this by defunctionalizing a typical example of the technique. Modulo some very minor details (that could be resolved by using the Church-encoded version of this), this does accurately reflect what’s going on in the technique. Compose is clearly constant time. Less obviously, applying these functional lists requires traversing this tree of closures – made into an explicit tree here. In fact, this reveals that this representation could require arbitrarily large amounts of work for a given size of output. This is due to the fact that prepending an empty string doesn’t increase the output size but still increases the size of the tree. In practice, it’s a safe assumption that, on average, at least one character will be prepended per leaf of the tree which makes the overhead proportional to the size of the output.

This tree representation is arguably better than the “functional list” representation. It’s less flexible for producers, but that’s arguably a good thing because we didn’t really want arbitrary String -> String functions. It’s more flexible for consumers. For example, getting the head of the list is a relatively efficient operation compared to applying a “functional list” and taking the head of the result even in an eager language. (Laziness makes both approaches comparably efficient.) Getting the last element is just the same for the tree version, but, even with laziness, is much worse for the functional version. More to the point, this concrete representation allows the concatenation function to avoid adding empty nodes to the tree whereas (.) can’t pattern match on whether a function is the identity function or not.

This view makes it very clear what the functional version is doing.

Difference Lists in Prolog

List append is the archetypal example of a Prolog program due to the novelty of its “invertibility”.
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).
For our purposes, viewing this as a function of the first two arguments, this is exactly the usual functional implementation of list concatenation with exactly the same problems. We could, of course, encode the defunctionalized version of the functional approach into (pure) Prolog. This would produce:
prepend_to(compose(Xs, Ys), End, Zs) :- prepend_to(Ys, End, End2), prepend_to(Xs, End2, Zs).
prepend_to(prepend(Xs), End, Zs) :- append(Xs, End, Zs).
(I’ll be ignoring the issues that arise due to Prolog’s untyped nature.)

However, this being a logic programming language means we have additional tools available to use that functional languages lack. Namely, unification variables. For an imperative (destructive) implementation of list concatenation, the way we’d support efficient append of linked lists is we’d keep pointers to the start and end of the list. To append two lists, we’d simply use the end pointer of the first to update the end of the first list to point at the start of the second. We’d then return a pair consisting of the start pointer of the first and the end pointer of the second.

This is exactly how Prolog difference lists work, except instead of pointers, we use unification variables which are more principled. Concretely, we represent a list as a pair of lists, but the second list will be represented by an unbound unification variable and the first list contains that same unification variable as a suffix. This pair is often represented using the infix operator (“functor” in Prolog terminology), -, e.g. Xs - Ys. We could use diff(Xs, Ys) or some other name. - isn’t a built-in operator, it’s just a binary constructor essentially.

At the level of logic, there are no unification variables. The constraints above mean that Xs - Ys is a list Xs which contains Ys as a suffix.

The name “difference list” is arguably motivated by the definition of concatenation in this representation.
concat(Xs - Ys, Ys - Zs, Xs - Zs).
This looks a lot like |Xs - Ys + Ys - Zs = Xs - Zs|. If the suffix component of the first argument is unbound, like it’s supposed to be, then this is a constant-time operation of binding that component to Ys. If it is bound, then we need to unify which, in the worst-case, is O(length Ys) where the length is up to either nil or an unbound variable tail².

We also have the unit of concat, i.e. the empty list via³:
empty(Xs - Xs).
See the footnote, but this does in some way identify Xs - Ys with the “difference” of Xs and Ys.

We get back to a “normal” list via:
to_list(Xs - [], Xs).

% or more generally,
prepend_to(Xs - Ys, Ys, Xs).
to_list is a constant-time operation, no matter what. Note, to_list binds the suffix component of the difference list. This means that the first input no longer meets our condition to be a difference list. In other words, to_list (and prepend_to) consumes the difference list. More precisely, it constrains the possible suffixes the list could be. Indeed, any operation that binds the suffix component of a difference list consumes it. For example, concat consumes its first argument.

Of course, it still makes logical sense to work with the difference list when its suffix component is bound, it’s just that its operational interpretation is different. More to the point, given a difference list, you cannot prepend it (via prepend_to or concat) to two different lists to get two different results.

Converting from a list does require traversing the list since we need to replace the nil node, i.e. [], with a fresh unbound variable. Luckily, this is exactly what append does.
from_list(Xs, Ys - Zs) :- append(Xs, Zs, Ys).
from_list also suggests this “difference list” idea. If all of Xs, Ys, and Zs are ground terms, then from_list(Xs, Ys - Zs) holds when append(Xs, Zs, Ys) holds. Exactly when if our invariants are maintained, i.e. that Zs is a suffix of Ys. Writing these relations more functionally and writing append as addition, we’d have:

\[\mathtt{from\_list}(Xs) = Ys - Zs \iff Xs + Zs = Ys\]

If we did want to “duplicate” a difference list, we’d essentially need to convert it to a (normal) list with to_list, and then we could use from_list multiple times on that result. This would, of course, still consume the original difference list. We’d also be paying O(length Xs) for every duplicate, including to replace the one we just consumed⁴.

That said, we can prepend to a list to a difference list without consuming it. We can perform other actions with the risk of (partially) consuming the list, e.g. indexing into the list. Indexing into the list would force the list to be at least a certain length, but still allow prepending to any list that will result in a final list at least that long.

Comparison

I’ll start the comparison with a massive discrepancy that we will ignore going forward. Nothing enforces that a value of type ShowS actually just appends something to its input. We could use abstract data type techniques or the defunctionalized version to avoid this. To be fair, difference lists also need an abstraction barrier to ensure their invariants, though their failure modes are different. A difference list can’t change what it is based on what it is prepended to.

Functional Representation Difference Lists

constant-time concatenation constant-time concatenation

constant-time conversion from a list (though you pay for it later) O(n) conversion from a list

persistent non-persistent, requires linear use

represented by a tree of closures represented by a pair of a list and a unification variable

O(n) (or worse!) conversion to a list constant-time conversion to a list

defunctionalized version can be implemented in pretty much any language requires at least single-assignment variables

unclear connection to being the difference of two lists (which two lists?) mathematical, if non-obvious, connection to being the difference of two (given) lists

As an illustration of the difference between persistent and non-persistent uses, the function:
double f = f . f
is a perfectly sensible function on ShowS values that behaves exactly as you’d expect. On the other hand:
double(In, Out) :- concat(In, In, Out).
is nonsense that will fail the occurs check (if it is enabled, otherwise it will create a cyclic list) except for when In is the empty difference list.

Conclusion

I hope I’ve illustrated that the functional representation is not just not difference lists, but is, in fact, wildly different from difference lists.

This functional representation is enshrined into Haskell via the ShowS type and related functions, but I’d argue the concrete tree representation is actually clearer and better. The functional representation is more of a cute trick that allows us to reuse existing functions. Really, ShowS should have been an abstract type.

Difference lists are an interesting example of how imperative ideas can be incorporated into a declarative language. That said, difference lists come with some of the downsides of an imperative approach, namely the lack of persistence.

As far as I’m aware, there isn’t an unambiguous and widely accepted name for this functional representation. Calling it “functional lists” or something like that is, in my opinion, very ambiguous and potentially misleading. I think the lack of a good name for this is why “difference lists” started becoming popular. As I’ve argued, using “difference list” in this context is even more misleading and confusing.

If people really want a name, one option might be “delta list”. I don’t think this term is used. It keeps the intuitive idea that the functional representation represents some “change” to a list, a collection of deltas that will all be applied at once, but it doesn’t make any false reference to difference lists. I’m not super into this name; I just want something that isn’t “difference list” or otherwise misleading.

To be clear, it’s still much, much, better than using plain concatenation.↩︎

Such a length relation couldn’t be written in pure Prolog but can in actual Prolog.↩︎

For those algebraically minded, this almost makes concat and empty into another monoid except concat is partial, but such a partial monoid is just a category! In other words, we have a category whose objects are lists and whose homsets are, at most, singletons containing Xs - Ys for Hom(Xs, Ys). If we maintain our invariant that we have Xs - Ys only when Ys is a suffix of Xs, this thin category is exactly the category corresponding to the reflexive, transitive “has suffix” relation. We could generalize this to any monoid via a “factors through” relation, i.e. |\mathrm{Hom}(m, n)| is inhabited if and only if |\exists p. m = pn| which you can easily prove is a reflexive, transitive relation given the monoid axioms. However, for a general monoid, we can have a (potentially) non-thin category by saying |p \in \mathrm{Hom}(m,n)| if and only if |m = pn|. The category will be thin if and only if the monoid is cancellative. This is exactly the slice category of the monoid viewed as a one-object category.↩︎

Again, in actual Prolog, we could make a duplicate without consuming the original, though it would still take O(length Xs) time using the notion of length mentioned before.↩︎
April 12, 2025 11:25 PM

Functional Representation	Difference Lists
constant-time concatenation	constant-time concatenation
constant-time conversion from a list (though you pay for it later)	O(n) conversion from a list
persistent	non-persistent, requires linear use
represented by a tree of closures	represented by a pair of a list and a unification variable
O(n) (or worse!) conversion to a list	constant-time conversion to a list
defunctionalized version can be implemented in pretty much any language	requires at least single-assignment variables
unclear connection to being the difference of two lists (which two lists?)	mathematical, if non-obvious, connection to being the difference of two (given) lists

April 04, 2025

Edward Z. Yang

Why you should maintain a personal LLM coding benchmark

Do you use an LLM for coding? Do you maintain a personal benchmark based on problems you have posed the LLM? The purpose of this blog post is to convince you should do this: that you can do so with marginal effort on top of your day-to-day vibe coding and that you will get both short and long term benefits from making your own personal benchmark exist.

I started thinking about benchmarks for coding in part with my frustration with the discourse around LLMs in the public squares I frequent (Reddit and Twitter). People often want to know "what's the best model" or "what's the best coding IDE"? One might imagine that the way to answer this question would be to test the models on a variety of problems from real world uses of the LLM for coding, and then compare how well various systems do on this. Indeed, whenever a new SOTA model releases, the lab will usually tell you about the model's performance against a few well known coding benchmarks. Problem solved?

Of course not! In fact, for the most part, no one really talks about benchmarks when comparing models. Why? I argue the most popular benchmarks measure tasks that are largely different from what a user wants out of an LLM. For example, take the recent Gemini 2.5 Pro release. In their headline table, they test against LiveCodeBench, Aider Polyglot and SWE-bench Verified. Both LiveCodeBench and Aider Polyglot derive their problems from contest programming and pedagogical exercises (respectively), while SWE-bench assesses bug fixes to preexisting codebases. While useful, this is only a small slice things people want to do with LLMs.

Wouldn't it be great if you had your own, personal benchmark, based on problems you actually care about? If you are tweaking your .cursorrules, you could run your benchmark to see if a change you made helped or not. When a new model comes out, you could spend a few bucks to run your eval and make a decision if you should switch your daily driver. And then on social media, if you wanted to stan the new model, instead of asking the model to drop a ball inside a rotating hexagon or vagueposting about how the new model is incredible, you could just post your benchmark results.

Nicholas Carlini's Yet Another Applied LLM Benchmark is an existence proof that this playbook can work. As Nicholas describes it:

It's a collection of nearly 100 tests I've extracted from my actual conversation history with various LLMs.

There are two defining features of this benchmark that make it interesting. Most importantly, I've implemented a simple dataflow domain specific language to make it easy for me (or anyone else!) to add new tests that realistically evaluate model capabilities. This DSL allows for specifying both how the question should be asked and also how the answer should be evaluated. Most questions are evaluated by actually running the code the model writes but the framework supports a bunch of other evaluation methods as well. And then, directly as a result of this, I've written nearly 100 tests for different situations I've actually encountered when working with LLMs as assistants.

I have been working on my own benchmark based off of Carlini's benchmark, and I can confirm that this works well for the traditional style of coding eval, where you have a one-shot task that generates and executes the code against some test cases. My basic strategy is to vibe code as usual, but whenever I give an LLM a task that it isn't able to one shot, I consider adding it to the benchmark. In more detail:

I only add a task if a SOTA LLM failed it. This ensures the benchmark consists of all appropriate difficulty problems: easy enough that I thought an LLM should be able to do it, but hard enough that a SOTA model failed on it. I don't need problems that are too hard (this is already well covered by well known benchmarks like SWE-Bench or SWE-Lancer), and I don't mind if my problems saturate because, hey, that means the models are that much better for my use cases!

After I have added the task to the benchmark, I can use the benchmark runner to tell if changing the model, tweaking the prompt, or even just running the prompt again at nonzero temperature can make it pass. Indeed, it's helpful to find some configuration that makes the eval pass, as this is good for debugging issues in the evaluation function itself... also it means you have working code for whatever task you were working on. Conversely, you can make the task harder by leaving things out from the prompt.

Writing the test is the labor intensive part, but you can always vibe code a test. Importantly, you have a failing implementation (your initial generation) and some way you (manually?) determined that the implementation was wrong, so just turn this into your evaluation function! (And for all you yak shaving aficionados, if the model fails to vibe code your test, well, you have another task for your benchmark!)

For example, the other day I needed to take an asciinema recording and convert it into a sequence of frames rendered as plain text. However, the only project for doing these conversations was agg, which converts recordings into animated gifs. In agg_to_text, I ask an LLM to take agg's source code and create a new program which dumps the frames as plain text rather than gif images. The reason why this task is difficult, is because there is some discretion in deciding when to emit a frame, and with my original prompt the LLM didn't precisely replicate the original behavior in agg. While working on the benchmark, I realized that instructing the model specifically about how frame batching worked was enough to get it to preserve the original behavior. But I don't think I should need to do this: thus this task. (P.S. If this test saturates, well, I can always make it harder by removing the agg source code from the prompt.)

The ability to benchmark one shot tasks is here today, but I would like to speculate a bit about what lies beyond them. In particular, most of my LLM coding activity involves asking the LLM to make changes to a pre-existing project, which makes it less amenable to "single prompt creates self contained program". (Also, I usually only ask one-shot questions that the LLM can answer, so most of them would never go in my benchmark.)

In short, how can I extract tasks from my day-to-day work? There seems to be two big extra levers we have:

Codebase tasks. This is the heavy-weight approach: you record the Git commit of your codebase at the time you prompted for some new feature to be added, and then when you want to run an eval on a new model you just check out the codebase at that commit and let the end-to-end system go. You'll typically want to execute the modified code, which means you'll also need a way to reliably setup the runtime environment for the code; things like lockfiles can help a lot here.

Transcript tasks. You don't actually need the entire codebase to be available to ask an LLM for a completion; you only need the conversation transcript up to the point of the critical generation. If the transcript is mostly your agent system reading in files for context, you can end up with a relatively system generic prompt that can tell you something about other systems. Of course, if you want to actually run the change, you still need the full codebase, which is why this approach is much more amenable if you're going to do some static analysis on the output. For example, if a model keeps adding try: ... except: ... blocks that are suppressing errors, you can take some transcripts where you've caught the model red-handed doing this and make an eval that checks if the model is still doing this. I suspect testing on transcripts works best for testing if changing prompts or rules improves performance, since the transcript itself will put the model into some particular latent space and if it were a different model they might have made different choices leading to a different latent space. Transcripts from thinking models are especially susceptible to this!

I have started adapting Carlini's framework to work better for these cases, although I would love to be told someone has already solved this problem for me. In particular, I am very excited about using transcript tasks to evaluate whether or not things I add to my prompts / triggered rules are helping or not. Current SOTA model instruction following isn't great and I regularly catch models doing behaviors that I explicitly told them not to in the system prompt. I have started some initial analysis over all of my chat logs to find cases where the model misbehaved, although I haven't quite worked out how I want to build an eval out of it.

One word of warning: to make transcript tasks, you need an AI coding system that doesn't obscure how it assembles its underlying prompts (which rules out most of the popular closed source AI code editors.)

I started building evals for a selfish reason: I wanted to be able to tell if modifications to my prompts were doing anything. But I also think there is a broader opportunity that arises if we also publish these benchmarks to the world.

For one, building a real world benchmark on use cases we care about is a way to communicate to the people training AI models whether or not they are doing well or not. Historical evals have focused on LeetCoding, and consequently we have models that would ace any big tech interview and yet on real world tasks will drive you off a cliff at the first opportunity. And this is not just free labor for the top labs: if you believe in open source models, one of the biggest barriers to good small models is having really high quality data. We, the OSS vibe coding community, can directly help here.

I think there is a tremendous opportunity for the open source community to really push the state of the art in coding evaluations. There's only so many benchmarks that I, personally, can create, but if everyone is making benchmarks I could eventually imagine a universe of benchmarks where you could curate the problems that are relevant to your work and quickly and cheaply judge models in this way: a Wikipedia of Coding Benchmarks.

To summarize: every time an LLM fails to solve a problem you ask it for, this is a potential new benchmark. As long as there is a way to automate testing if the LLM has solved the problem, you can turn this into a benchmark. Do this for yourself, and you can quickly have a personal benchmark with which to evaluate new models. Do this at scale, and you can help push the frontier in coding models.

by Edward Z. Yang at April 04, 2025 07:05 AM

April 01, 2025

in Code

Faking ADTs and GADTs in Languages That Shouldn't Have Them
Haskell is the world’s best programming language¹, but let’s face the harsh reality that a lot of times in life you’ll have to write in other programming languages. But alas you have been fully Haskell-brained and lost all ability to program unless it is type-directed, you don’t even know how to start writing a program without imagining its shape as a type first.

Well, fear not. The foundational theory behind Algebraic Data Types and Generalized Algebraic Data Types (ADTs and GADTs) are so fundamental that they’ll fit (somewhat) seamlessly into whatever language you’re forced to write. After all, if they can fit profunctor optics in Microsoft’s Java code, the sky’s the limit!

This is an “April Fools” joke in the tradition of my previous one in some of these ways that we are going to twist these other languages might seem unconventional or possibly ill-advised… but also the title is definitely a lie: these languages definitely should have them! :D

Normal ADTs

As a reminder, algebraic Data Types (ADTs) are products and sums; that’s why they’re algebraic, after all!

Product Types

Products are just immutable structs, which pretty much every language supports — as long as you’re able to make sure they are never mutated.

Structs in c, for example, look like:
#include <stdint.h>

typedef struct {
    uint32_t timestamp;
    double amount;
} Transaction;
But you’ll need proper immutable API for it:
Transaction createTransaction(uint32_t timestamp, double amount) {
    return (Transaction){ timestamp, amount};
}

uint32_t getTimestamp(const Transaction* t) {
    return t->timestamp;
}

double getAmount(const Transaction* t) {
    return t->amount;
}

Transaction setTimestamp(const Transaction* t, uint32_t timestamp) {
    return (Transaction){timestamp, t->amount};
}

Transaction setAmount(const Transaction* t, double amount) {
    return (Transaction){t->timestamp, amount};
}
This is much simpler in languages where you can associate functions with data, like OOP and classes. For example, this is the common “value object” pattern in java (roughly related to the java bean²):
public class Transaction {
    private final long timestamp;
    private final double amount;

    public Transaction(long timestamp, double amount) {
        this.timestamp = timestamp;
        this.amount = amount;
    }

    public long getTimestamp() { return timestamp; }
    public double getAmount() { return amount; }

    public Transaction setTimestamp(long newTimestamp) {
        return new Transaction(newTimestamp, this.amount);
    }

    public Transaction setAmount(double newAmount) {
        return new Transaction(this.timestamp, newAmount);
    }
}
And there you go. Nothing too surprising there!

In this case, not only are these ADTs (algebraic data types), they’re also ADTs (abstract data types): you are meant to work with them based on a pre-defined abstract interface based on type algebra, instead of their internal representations.

Sum Types

If your language doesn’t support sum types, usually the way to go is with the visitor pattern: the underlying implementation is hidden, and the only way to process a sum type value is by providing handlers for every branch — a pattern match as a function, essentially. Your sum values then basically determine which handler is called.

For example, we can implement it for a network address type that can either be IPv4 or IPv6. Here we are using C++ just for generics and lambdas with closures, for simplicity, but we’ll discuss how this might look in C later.
#include <iostream>
#include <format>
#include <cstdint>

struct IPAddress {
    bool isIPv4;
    union {
        uint32_t ipv4;
        uint8_t ipv6[16];
    };
};

template <typename R>
struct IPAddressVisitor {
    R (*visitIPv4)(uint32_t);
    R (*visitIPv6)(const uint8_t (&)[16]);
};

template <typename R>
R acceptIPAddress(const IPAddress& ip, IPAddressVisitor<R> visitor) {
    return ip.isIPv4 ? visitor.visitIPv4(ip.ipv4)
                     : visitor.visitIPv6(ip.ipv6);
}
You can create the values using:
IPAddress mkIPv4(uint32_t value) {
    return { true, { value } };
}

IPAddress mkIPv6(const uint8_t (&value)[16]) {
    IPAddress out = { false };
    std::copy(std::begin(value), std::end(value), out.ipv6);
    return out;
}
And we can show an address:
std::string showIPAddress(const IPAddress& ip) {
    IPAddressVisitor<std::string> visitor = {
        [](uint32_t v) {
            return std::format("{}.{}.{}.{}",
                               (v >> 24) & 0xFF, (v >> 16) & 0xFF,
                               (v >> 8) & 0xFF, v & 0xFF);
        },
        [](const uint8_t (&v)[16]) {
            return std::format("{:02X}{:02X}:{:02X}{:02X}:{:02X}{:02X}:{:02X}{:02X}:"
                               "{:02X}{:02X}:{:02X}{:02X}:{:02X}{:02X}:{:02X}{:02X}",
                               v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7],
                               v[8], v[9], v[10], v[11], v[12], v[13], v[14], v[15]);
        }
    };
    return acceptIPAddress(ip, visitor);
}
Note that in this way, the compiler enforces that we handle every branch. And, if we ever add a new branch, everything that ever consumes IPAddress with an IPAddressVisitor will have to add a new handler.

In a language without generics or powerful enough polymorphism, it’s difficult to enforce the “pure” visitor pattern because you can’t ensure that all branches return the same type.

One common pattern is to have an “effectful” visitor pattern, where the point isn’t to return something, but to execute something on the payload of the present branch. This is pretty effective for languages like C, javascript, python, etc. where types aren’t really a rigid thing.

For example, this might be how you treat an “implicit nullable”:
export const visitMaybe = (visitNothing, visitJust, val) =>
  (val == null) ? visitNothing() : visitJust(val);
This is basically for_ from Haskell: You can do something like conditionally launch some action if the value is present.
visitMaybe(
  () => console.log("Nothing to request"),
  (reqPayload) => makeRequest("google.com", reqPayload),
  maybeRequest
);
On a simpler note, if your language as subtyping built in (maybe with classes and subclasses) or some other form of dynamic dispatch, you can implement it in terms of that, which is nice in python, java, C++, etc.
interface ExprVisitor<R> {
    R visitLit(int value);
    R visitNegate(Expr unary);
    R visitAdd(Expr left, Expr right);
    R visitMul(Expr left, Expr right);
}

abstract class Expr {
    public abstract <R> R accept(ExprVisitor<R> visitor);
}
Alternatively, you’re in a language where lambdas are easy, instead of tupling up the visitor, you could just have accept itself take a number of arguments corresponding to each constructor:
// Alternative definition without an explicit Visitor class
abstract class Expr {
    public abstract <R> R accept(
        Function<int,R> visitLit,
        Function<Expr,R> visitNegate,
        BiFunction<Expr,Expr,R> visitAdd,
        BiFunction<Expr,Expr,R> visitMul
    );
}
(Note that C++ doesn’t allow template virtual methods — not because it’s not possible within the language semantics and syntax, but rather because the maintainers are too lazy to add it — so doing this faithfully requires a bit more creativity)

Now, if your language has dynamic dispatch or subclass polymorphism, you can actually do a different encoding, instead of the tagged union. This will work in languages that don’t allow or fully support naked union types, too. In this method, each constructor becomes a class, but it’s important to only allow access using accept to properly enforce the sum type pattern.
class Lit extends Expr {
    private final int value;

    public Lit(int value) {
        this.value = value;
    }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitLit(value);
    }
}

class Negate extends Expr {
    private final Expr unary;

    public Negate(Expr unary) { this.unary = unary; }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitNegate(unary);
    }
}

class Add extends Expr {
    private final Expr left;
    private final Expr right;

    public Add(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitAdd(left, right);
    }
}

class Mul extends Expr {
    private final Expr left;
    private final Expr right;

    public Mul(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitMul(left, right);
    }
}
(But, just wanted to note that if you actually are working in java, you can actually do something with sealed classes, which allows exhaustiveness checking for its native switch/case statements.)

Alternatively you could make all of the subclasses anonymous and expose them as factory methods, if your language allows it:
abstract class Expr {
    public abstract <R> R accept(ExprVisitor<R> visitor);

    public static Expr lit(int value) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitLit(value);
            }
        };
    }

    public static Expr negate(Expr unary) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitNegate(unary);
            }
        };
    }

    public static Expr add(Expr left, Expr right) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitAdd(left, right);
            }
        };
    }

    // ... etc
}
You’d then call using:
public class Main {
    public static void main(String[] args) {
        Expr expr = new Mul(new Negate(new Add(new Lit(4), new Lit(5))), new Lit(8));
        // or
        // Expr expr = Eval.mul(Eval.negate(Eval.add(Eval.lit(4), Eval.lit(5))), Eval.lit(8));

        ExprVisitor<Integer> eval = new ExprVisitor<>() {
            @Override public Integer visitLit(int value) {
                return value;
            }
            @Override public Integer visitNegate(Expr unary) {
                return -unary.accept(this);
            }
            @Override public Integer visitAdd(Expr left, Expr right) {
                return left.accept(this) + right.accept(this);
            }
            @Override public Integer visitMul(Expr left, Expr right) {
                return left.accept(this) * right.accept(this);
            }
        };

        System.out.println("Result: " + expr.accept(eval));
    }
}
Passing around function references like this is actually pretty close to the scott encoding of our data type — and for non-recursive types, it’s essentially the church encoding.

Recursive Types

Speaking of recursive types…what if your language doesn’t allow recursive data types? What if it doesn’t allow recursion at all, or what if recursively generated values are just annoying to deal with? Just imagine writing that Expr type in a language with explicit memory management, for example. Or, what if you wanted a way to express your recursive types in a more elegant and runtime-safe manner?

One thing you can instead do is have your visitor be in its “catamorphism”, or church encoding. Instead of having the “visitor” take the recursive sub-values, instead have it return the result of recursively applying itself.

Let’s do this in dhall, one of the most famous non-recursive languages. Dhall does have native sum types, so we won’t worry about manually writing a visitor pattern. But it does not have recursive data types.

Let’s define a type like:
data Expr = Lit Natural
          | Add Expr Expr
          | Mul Expr Expr
But we can’t define data types in dhall that refer to themselves. So instead, we can define them in their “church encoding”: give what you would do with an Expr to consume it, where the consumption function is given as if it were recursively applied.
let ExprF : Type -> Type
      = \(r : Type) ->
        { lit : Natural -> r
        , add    : r -> r -> r
        , mul    : r -> r -> r
        }

let Expr : Type
      = forall (r : Type) -> ExprF r -> r
Note that ExprF r is essentially ExprVisitor<R>, except instead of add being Expr -> Expr -> r, it’s r -> r -> r: the input values aren’t the expression, but rather the results of recursively folding on the expression. In fact, our original non-recursive ExprVisitor<R> (to be more precise, the R accept(ExprVisitor<R>)) is often called the “scott encoding”, as opposed to the recursive “church encoding” fold.

For value creation, you take the visitor and recursively apply:
let lit : Natural -> Expr
      = \(x : Natural) ->
        \(r : Type) ->
        \(handlers : ExprF r) ->
            handlers.lit x

let add : Expr -> Expr -> Expr
      = \(left : Expr) ->
        \(right : Expr) ->
        \(r : Type) ->
        \(handlers : ExprF r) ->
            handlers.add (left r handlers) (right r handlers)

let mul : Expr -> Expr -> Expr
      = \(left : Expr) ->
        \(right : Expr) ->
        \(r : Type) ->
        \(handlers : ExprF r) ->
            handlers.mul (left r handlers) (right r handlers)
And finally, using the data type involves providing the handler to fold up from the bottom to top. Note that add : \(left : Natural) -> \(right : Natural) -> left + right already assumes that the handler has been applied to the sub-expressions, so you get Naturals on both sides instead of Expr.
let eval : Expr -> Natural
      = \(e : Expr) ->
          e Natural
            { lit = \(x : Natural) -> x
            , add = \(left : Natural) -> \(right : Natural) -> left + right
            , mul = \(left : Natural) -> \(right : Natural) -> left * right
            }

let testVal : Expr
      = mul (add (lit 4) (lit 5)) (lit 8)

in  assert : eval testVal === 72
This pattern is useful even in languages with good datatype recursion, like Haskell — it’s actually the recursion-schemes refactoring of a recursive data type, and it can be useful to have it live alongside your normal recursive types. I’ve written this blog post talking about how useful this pattern is to have alongside your normal recursive types.

This pattern is pretty portable to other languages too, as long as you can scrounge together something like Rank-N types:
interface ExprFold<R> {
    R foldLit(int value);
    R foldNegate(R unary);
    R foldAdd(R left, R right);
    R foldMul(R left, R right);
}

interface Expr {
    public abstract <R> R accept(ExprFold<R> fold);

    public static Expr lit(int value) {
        return new Expr() {
            @Override
            public <R> R accept(ExprFold<R> fold) {
                return fold.foldLit(value);
            }
        };
    }

    public static Expr negate(Expr unary) {
        return new Expr() {
            @Override
            public <R> R accept(ExprFold<R> fold) {
                return fold.foldNegate(unary.accept(fold));
            }
        };
    }

    // etc.
}
By “Rank-N types” here, I mean that your objects can generate polymorphic functions: given an Expr, you could generate an <R> R accept(ExprFold <R> fold) for any R, and not something pre-determined or pre-chosen by your choice of representation of Expr.

Generalized Algebraic Data Types

You’ve implemented ADTs in your language of choice, or you are currently in a language with native ADTs. Life is good, right? Until that sneaky voice starts whispering in your hear: “we need more type safety.” You resist that urge, maybe even get a lot done without it, but eventually you are compelled to give in and embrace the warm yet harsh embrace of ultimate type safety. Now what?

Singletons and Witnesses

In Haskell, singletons are essentially enums used to associate a value with a reifiable type. “Reifiable” here means that you can take the runtime value of a singleton and use it to bring evidence to the type-level. I ran into a real-world usage of this while writing https://coronavirus.jle.im/, a web-based data visualizer of COVID-19 data (source here) in purescript. I needed a singleton to represent scales for scatter plots and linking them to the data that can be plotted. And, not only did it need to be type-safe in purescript (which has ADTs but not GADTs), it had to be type-safe in the javascript ffi as well.

Here’s how it might look in Haskell:
-- | Numeric types
data NType :: Type -> Type where
    NInt :: NType Int
    NDouble :: NType Double
    NPercent :: NType Percent

-- | Define a scale
data Scale :: Type -> Type where
    ScaleDate :: Scale Date
    ScaleLinear :: Bool -> NType a -> Scale a   -- ^ whether to include zero in the axis or not
    ScaleLog :: NType a -> Scale a
You’d then run it like this:
plot :: Scale a -> Scale b -> [(a, b)] -> Canvas
So, we have the type of the input tuples being determined by the values you pass to plot:
ghci> :t plot ScaleDate (ScaleLinear True (LNumeric NInt))
[(Date, Int)] -> Canvas
But let’s say we only had ADTs. And then we’re passing them down to a javascript FFI which only has structs and functions. We could drop the type-safety and instead error on runtime, but…no. Type unsafety is not acceptable.

The fundamental ability we want to gain is that if we pattern match on ScaleDate, then we know a has to be Date. If we match on NInt, we know that a has to be Int.

For the sake of this example, we’re going to be implementing a simpler function in purescript and in javascript: a function that takes a scale type and a list of points prints the bounds. In Haskell, this looks like:
data AxisBounds a = AB
    { minValue :: a
    , minLabel :: String
    , maxValue :: a
    , maxLabel :: String
    }

displayAxis :: Scale a -> [a] -> AxisBounds a
displayAxis = \case
    ScaleDate -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AB xMin (showDate xMin) xMax (showDate xMax)
    ScaleLinear hasZero nt -> \xs ->
      displayNumericAxis (if hasZero then 0:xs else xs)
    ScaleLog nt ->
      displayNumericAxis nt xs

displayNumericAxis :: NType a -> [a] -> AxisBounds a
displayNumericAxis = \case
    NInt -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AB xMin (printf "%d" xMin) xMax (printf "%d" xMax)
    NDouble -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AB xMin (printf "%.4f" xMin) xMax (printf "%.4f" xMax)
    NPercent -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AB xMin (printf "%.1f%%" (xMin*100)) xMax (printf "%.1f%%" (xMax*100))
(Pretend the Percent type is just a newtype-wrapped Float or something)

There are at least two main approaches to do this. We’ll be discussing runtime equality witnesses and Higher-Kinded Eliminators.

Runtime Witnesses and Coyoneda Embedding

The Yoneda Lemma is one of the most powerful tools that Category Theory has yielded as a branch of math, but its sibling coyoneda is one of the most useful Haskell abstractions.

This doesn’t give you GADTs, but it’s a very lightweight way to “downgrade” your GADTs into normal GADTs which is appropriate if you don’t need the full power.

The trick is this: if you have MyGADT a, and you know you are going to be using it to produce as, you can do a covariant coyoneda transform.

For example, if you have this type representing potential data sources:
data Source :: Type -> Type where
    ByteSource :: Handle -> Source Word
    StringSource :: FilePath -> Source String

readByte :: Handle -> IO Word
readString :: FilePath -> IO String

readSource :: Source a -> IO a
readSource = \case
    ByteSource h -> readByte h
    StringSource fp -> readString fp
You could instead turn Source into a non-GADT by making it a normal parameterized ADT and adding a X -> a field, which is a type of CPS transformation:
data Source a =
    ByteSource Handle (Word -> a)
  | StringSource FilePath (String -> a)

byteSource :: Handle -> Source Word
byteSource h = ByteSource h id

stringSource :: FilePath -> Source String
stringSource fp = StringSource fp id

readSource :: Source a -> IO a
readSource = \case
    ByteSource h out -> out <$> readByte h
    StringSource fp out -> out <$> readString fp
A nice benefit of this method is that Source can now have a Functor instance, which the original GADT could not.

And, if MyGADT a is going to be consuming as, you can do the contravariant coyoneda transform:
data Sink a =
    ByteSink Handle (a -> Word)
  | StringSink FilePath (a -> String)
This gives it a free Contravariant instance too!

And, if you are going to be both consuming and producing as, you can do the invariant coyoneda transform
data Interface a =
    ByteInterface Handle (Word -> a) (a -> Word)
  | StringInterface FilePath (String -> a) (Word -> a)
However, in practice, true equality involves being able to lift under injective type constructors, and carrying every single continuation is unwieldy. We can package them up together with a runtime equality witness.

This is something we can put “inside” NInt such that, when we pattern match on a NType a, the type system can be assured that a is an Int.

You need some sort of data of type IsEq a b with functions:

refl :: IsEq a a

to :: IsEq a b -> a -> b

sym :: IsEq a b -> IsEq b a

trans :: IsEq a b -> IsEq b c -> IsEq a c

inj :: IsEq (f a) (f b) -> IsEq a b

If you have to and sym you also get from :: IsEq a b -> b -> a.

From all of this, we can recover our original IsEq a Word -> Word -> a and IsEq a Word -> a -> Word functions, saving us from having to put two functions.

Your language of choice might already have this IsEq. But one of the more interesting ways to me is Leibniz equality (discussed a lot in this Ryan Scott post), which works in languages with higher-kinded polymorphism. Leibniz quality in languages with higher-kinded polymorphism means that a and b are equal if forall p. p a -> p b: any property of a is also true of b.

In Haskell, we write this like:
newtype Leibniz a b = Leibniz (forall p. p a -> p b)

refl :: Leibniz a a
refl = Leibniz id
The only possible way to construct a ‘Leibniz’ is with both type parameters being the same: You can only ever create a value of type Leibniz a a, never a value of Leibniz a b where b is not a.

You can prove that this is actually equality by writing functions Leibniz a b -> Leibniz b a and Leibniz a b -> Leibniz b c -> Leibniz a c (this Ryan Scott post goes over it well), but in practice we realize this equality by safely coercing a and b back and forth:
newtype Identity a = Identity { runIdentity :: a }

to :: Leibniz a b -> a -> b
to (Leibniz f) = runIdentity . f . Identity

newtype Op a b = Op { getOp :: b -> a }

from :: Leibniz a b -> b -> a
from (Leibniz f) = getOp (f (Op id))
So, if your language supports higher-kinded Rank-2 types, you have a solution!

There are other solutions in other languages, but they will usually all be language-dependent.

Let’s write everything in purescript. The key difference is we use map (to isNumber) :: Array a -> Array Number, etc., to get our Array as something we know it has the type of.
import Text.Printf

newtype Leibniz a b = Leibniz (forall p. p a -> p b)

to :: Leibniz a b -> a -> b
from :: Leibniz a b -> b -> a

data NType a =
    NInt (Leibniz a Int)
  | NNumber (Leibniz a Number)
  | NPercent (Leibniz a Percent)

type AxisBounds a =
    { minValue :: a
    , minLabel :: String
    , maxValue :: a
    , maxLabel :: String
    }

displayNumericAxis :: NType a -> Array a -> AxisBounds a
displayNumericAxis = \case
    NInt isInt -> \xs ->
      let xMin = minimum $ map (to isInt) xs
          xMax = maximum $ map (to isInt) xs
          showInt = show
       in { minValue: xMin
          , minLabel: showInt xMin
          , maxValue: xMax
          , maxLabel: showInt xMax
          }
    NNumber isNumber -> \xs ->
      let xMin = minimum $ map (to isNumber) xs
          xMax = maximum $ map (to isNumber) xs
          showFloat = printf (Proxy :: Proxy "%.4f")   -- it works a little differently
       in { minValue: xMin
          , minLabel: showFloat xMin
          , maxValue: xMax
          , maxLabel: showFloat xMax
          }
    NPercent isPercent -> \xs ->
      let xMin = minimum $ map (to isPercent) xs
          xMax = maximum $ map (to isPercent) xs
          showPercent = printf (Proxy :: Proxy "%.1f%%") <<< (_ * 100.0)
       in { minValue: xMin
          , minLabel: showPercent xMin
          , maxValue: xMax
          , maxLabel: showPercent xMax
          }
To work with our [a] as if it were [Int], we have to map the coercion function over it that our Leibniz a Int gave us. Admittedly, this naive way adds a runtime cost of copying the array. But we could be more creative with finding the minimum and maximum in this way in constant space and no extra allocations.

And, if we wanted to outsource this to the javascript FFI, remember that javascript doesn’t quite have sum types, so we can create a quick visitor:
type NVisitor a r =
    { nvInt :: Leibniz a Int -> r
    , nvNumber :: Leibniz a Number -> r
    , nvPercent :: Leibniz a Percent -> r
    }

type NAccept a = forall r. NVisitor a r -> r

toAccept :: NType a -> NAccept a
toAccept = case _ of
    NInt isInt -> \nv -> nv.nvInt isInt
    NNumber isNumber -> \nv -> nv.nvNumber isNumber
    NPercent isPercent -> \nv -> nv.nvPercent isPercent

foreign import _formatNumeric :: forall a. Fn2 (NAccept a) a String

formatNumeric :: NType a -> a -> String
formatNumeric nt = runFn2 _formatNumeric (toAccept nt)
The FFI binding looks like: (taken from my actual source code)
import * as d3 from "d3-format";

export const _formatNumeric = (naccept, xs) =>
  naccept(
    { nvInt: (isInt) => d3.format("~s")
    , nvNumber: (isNumber) => d3.format(".3~s")
    , nvPercent: (isPercent) => d3.format("+.3~p")
    }
  );
Admittedly in the javascript we are throwing away the “GADT type safety” because we throw away the equality. But we take what we can — we at least retain the visitor pattern for sum-type type safety and exhaustiveness checking. I haven’t done this in typescript yet so there might be a way to formalize Leibniz equality to do this in typescript and keep the whole chain type-safe from top to bottom.

Higher-Kinded Eliminators

This is essentially the higher-kinded version of the visitor pattern, except in dependent type theory these visitors are more often called “eliminators” or destructors, which is definitely a cooler name.

In the normal visitor you’d have:
data User = TheAdmin | Member Int

data UserHandler r = UH
    { uhTheAdmin :: r
    , uhMember :: Int -> r
    }
But note that if you have the right set of continuations, you have something that is essentially equal to User without having to actually use User:
type User' = forall r. UserHandler r -> r

fromUser :: User -> User'
fromUser = \case
    TheAdmin -> \UH{..} -> uhTheAdmin
    Member userId -> \UH{..} -> uhMember userId

toUser :: User' -> Foo
toUser f = f $ UH { fhTheAdmin = TheAdmin, fhMember = Member }
This means that User is actually equivalent to forall r. UserHandler r -> r: they’re the same type, so if your language doesn’t have sum types, you could encode it as forall r. UserHandler r -> r instead. Visitors, baby.

But, then, what actually does the r type variable represent here, semantically? Well, in a UserHandler r, r is the “target” that we interpret into. But there’s a deeper relationship between r and User: A UserHandler r essentially “embeds” a User into an r. And, a UserHandler r -> r is the application of that embedding to an actual User.

If we pick r ~ (), then UserHandler () embeds User into (). If we pick r ~ String, then UserHandler () embeds User into String (like, “showing” it). And if we pick r ~ User, a UserHandler User embeds a User into…itself?

So here, r is essentially the projection that we view the user through. And by making sure we are forall r. UserHandler r -> r for all r, we ensure that we do not lose any information: the embedding is completely 1-to-1. It lets you “create” the User faithfully in a “polymorphic” way.

In fact, to hammer this home, some people like to use the name of the type as the type variable: UserHandler user:
-- | The same thing as before but with things renamed to prove a point
data MakeUser user = MakeUser
    { uhTheAdmin :: user
    , uhMember :: Int -> user
    }

type User' = forall user. MakeUser user -> user
The forall user. lets us faithfully “create” a User within the system we have, without actually having a User data type. Essentially we can imagine the r in the forall r as “standing in” for User, even if that type doesn’t actually exist.

Now, here’s the breakthrough: If we can use forall (r :: Type) to substitute for User :: Type, how about we use a forall (p :: Type -> Type) to substitute for a Scale :: Type -> Type?
data Scale :: Type -> Type where
    ScaleDate :: Scale Date
    ScaleLinear :: Bool -> LType a -> Scale a
    ScaleLog :: NType a -> Scale a

data ScaleHandler p a = SH
    { shDate :: p Date
    , shLinear :: Bool -> NType a -> p a
    , shLog :: NType a -> p a
    }

type Scale' a = forall p. ScaleHandler p a -> p a

fromScale :: Scale a -> Scale' a
fromScale = \case
    ScaleDate -> \SH{..} -> shDate
    ScaleLinear hasZero lt -> \SH{..} -> shLinear hasZero lt
    ScaleLog nt -> \SH{..} -> shLog nt

toScale :: Scale' a -> Scale a
toScale f = f $ SH { shDate = ScaleDate, shLinear = ScaleLinear, shLog = ScaleLog }
So in our new system, forall p. ScaleHandler p a -> p a is identical to Scale: we can use p a to substitute in Scale in our language even if our language itself cannot support GADTs.

So let’s write formatNType in purescript. We no longer have an actual Scale sum type, but its higher-kinded church encoding:
type NType a = forall p.
    { int :: p Int
    , number :: p Number
    , percent :: p Percent
    } -> p a

type Scale a = forall p.
    { date :: p Date
    , linear :: Bool -> NType a -> p a
    , log :: NType a -> p a
    } -> p a

ntInt :: NType Int
ntInt nth = nth.int

ntNumber :: NType Number
ntNumber nth = nth.number

ntPercent :: NType Percent
ntPercent nth = nth.percent

formatNType :: NType a -> a -> String
formatNType nt = f
  where
    Op f = nt
      { int: Op show
      , number: Op $ printf (Proxy "%.4f")
      , percent: Op $ printf (Proxy "%.1f%%") <<< (_ * 100.0)
      }
Here we are using
newtype Op b a = Op (a -> b)
as our “target”: turning an NType a into an Op String a. And an Op String a is an a -> String, which is what we wanted! The int field is Op String Int, the number field is Op String Number, etc.

In many languages, using this technique effectively requires having a newtype wrapper on-hand, so it might be unwieldy in non-trivial situations. For example, if we wanted to write our previous axis function which is NType a -> [a] -> String, we’d have to have a newtype wrapper for [a] -> String that has a as its argument:
newtype OpList b a = Op ([a] -> b)
or you could re-use Compose:
newtype Compose f g a = Compose (f (g a))
and your p projection type would be Compose Op []. So, you don’t necessarily have to write a bespoke newtype wrapper, but you do have to devote some brain cycles to think it through (unless you’re in a language that doesn’t need newtype wrappers to have this work, like we’ll discuss later).

By the way, this method generalizes well to multiple arguments: if you have a type like MyGADT a b c, you just need to project into a forall (p :: k1 -> k2 -> k3 -> Type).

I believe I have read somewhere that the two methods discussed here (runtime equality witness vs. higher-kinded eliminator) are not actually fully identical in their power, and there are GADTs where one would work and not the other … but I can’t remember where I read this and I’m also not big-brained enough to figure out what those situations are. But if you, reader, have any idea, please let me know!

Existential Types

Let’s take a quick break to talk about something that’s not technically related to GADTs but is often used alongside them.

What if we wanted to store a value with its NType and hide the type variable? In Haskell we’d write this like:
data NType :: Type -> Type where
    NInt :: NType Int
    NDouble :: NType Double
    NPercent :: NType Percent

data SomeNType = forall a. SomeNType (NType a) a

formatNType :: NType a -> a -> String
formatNType nt x = ...

formatSomeNType :: SomeNType -> String
formatSomeNType (SomeNType nt x) = formatNType nt x

myFavoriteNumbers :: [SomeNType]
myFavoriteNumbers = [SomeNType NInt 3, SomeNType NDouble pi]
But what if our language doesn’t have existentials? Remember, this is basically a value SomeNType that isn’t a Generic, but contains both a NType a and an a of the same variable.

One strategy we have available is to CPS-transform our existentials into their CPS form (continuation-passing style form). Basically, we write exactly what we want to do with our contents if we pattern matched on them. It’s essentially a Rank-N visitor pattern with only a single constructor:
type SomeNType = forall r. (forall a. NType a -> a -> r) -> r

someNType :: NType a -> a -> SomeNType
someNType nt x = \f -> f nt x

formatSomeNumeric :: SomeNType -> String
formatSomeNumeric snt = snt
    \nt x -> formatNumeric nt x
You can imagine, syntactically, that snt acts as its “own” pattern match, except instead of matching on SomeNType nt x -> .., you “match” on \nt x -> ..

This general pattern works for languages with traditional generics like Java too:
interface SomeNTypeVisitor<R> {
    <A> R visit(NType<A> nt, A val);
}

interface SomeNType {
    public abstract <R> R accept(SomeNTypeVisitor<R> visitor);

    // One option: the factory method
    public static <A> SomeNType someNType(NType<A> nt, A val) {
        return new SomeNType() {
            @Override
            public <R> R accept(SomeNTypeVisitor<R> visitor) {
                return visitor.visit(nt, val);
            }
        };
    }
}

// Second option: the subtype hiding a type variable, which you have to always
// make sure to upcast into `SomeNType` after creating
class SomeNTypeImpl<A> extends SomeNType {
    private NType<A> nt;
    private A val;

    public SomeNTypeImpl(NType<A> nt, A val) {
        this.nt = nt;
        this.val = val;
    }

    @Override
    public <R> R accept(SomeNTypeVisitor<R> visitor) {
        return visitor.visit(nt, val);
    }
}
Does…anyone write java like this? I tried committing this once while at Google and I got automatically flagged to be put on a PIP.

Recursive GADTs

The climax of this discussion: what if your language does not support GADTs or recursive data types?

We’re going to be using dhall as an example again, but note that the lessons applied here are potentially useful even when you do have recursive types: we’re going to be talking about a higher-kinded church encoding, which can be a useful form of your data types that live alongside your normal recursive ones.

Let’s imagine Expr as a GADT, where Expr a represents an Expr that evaluates to an a:
data Expr :: Type -> Type where
    NatLit :: Natural -> Expr Natural
    BoolLit :: Bool -> Expr Bool
    Add :: Expr Natural -> Expr Natural -> Expr Natural
    LTE :: Expr Natural -> Expr Natural -> Expr Bool
    Ternary :: Expr Bool -> Expr a -> Expr a -> Expr a

eval :: Expr a -> a
eval = \case
    NatLit n -> n
    BoolLit b -> b
    Add x y -> eval x + eval y
    LTE a b -> eval a <= eval b
    Ternary b x y -> if eval b then eval x else eval y
Adding this type variable ensures that our Expr is type-safe: it’s impossible to Add an Expr Bool, and the two branches of a Ternary must have the same result type, etc. And, we can write eval :: Expr a -> a and know exactly what type will be returned.

Now, let’s combine the two concepts: First, the church encoding, where our handlers take the “final result” of our fold r instead of the recursive value Expr. Second, the higher-kinded eliminator pattern where we embed Expr :: Type -> Type into forall (p :: Type -> Type).

And finally, we get:³
let ExprF =
      \(p : Type -> Type) ->
        { natLit : Natural -> p Natural
        , boolLit : Bool -> p Bool
        , add : p Natural -> p Natural -> p Natural
        , ternary : forall (a : Type) -> p Bool -> p a -> p a -> p a
        }

let Expr
    : Type -> Type
    = \(a : Type) -> forall (p : Type -> Type) -> ExprF p -> p a

let eval
    : forall (a : Type) -> Expr a -> a
    = \(a : Type) ->
      \(e : Expr a) ->
        e
          (\(q : Type) -> q)
          { natLit = \(x : Natural) -> x
          , boolLit = \(x : Bool) -> x
          , add = \(x : Natural) -> \(y : Natural) -> x + y
          , ternary =
              \(a : Type) ->
              \(b : Bool) ->
              \(x : a) ->
              \(y : a) ->
                if b then x else y
          }
Again, now instead of add taking Expr, it takes p Natural: the “Natural result of the fold”. p not only stands in for what we embed Expr into, it stands in for the result of the recursive fold. That’s why in eval, the first arguments of add are the Natural results of the sub-evaluation.

These values can be created in the same way as before, merging the two techniques, sending the handlers downstream:
let natLit
    : Natural -> Expr Natural
    = \(n : Natural) ->
      \(p : Type -> Type) ->
      \(handlers : ExprF p) ->
        handlers.natLit n

let boolLit
    : Bool -> Expr Bool
    = \(n : Bool) ->
      \(p : Type -> Type) ->
      \(handlers : ExprF p) ->
        handlers.boolLit n

let add
    : Expr Natural -> Expr Natural -> Expr Natural
    = \(x : Expr Natural) ->
      \(y : Expr Natural) ->
      \(p : Type -> Type) ->
      \(handlers : ExprF p) ->
        handlers.add (x p handlers) (y p handlers)

let ternary
    : forall (a : Type) -> Expr Bool -> Expr a -> Expr a -> Expr a
    = \(a : Type) ->
      \(b : Expr Bool) ->
      \(x : Expr a) ->
      \(y : Expr a) ->
      \(p : Type -> Type) ->
      \(handlers : ExprF p) ->
        handlers.ternary (b p handlers) (x p handlers) (y p handlers)

let testVal
    : Expr Natural
    = add (natLit 5) (add (natLit 6) (natLit 7))

in  assert : eval testVal === 18
If all of this is difficult to parse, try reviewing both the recursive ADT section and the higher-kinded eliminator section and making sure you understand both well before tackling this, which combines them together!

Admittedly in Haskell (and purescript) this is a lot simpler because we don’t have to explicitly pass in type variables:
data ExprF p = ExprF
    { natLit :: Natural -> p Natural
    , boolLit :: Bool -> p Bool
    , add :: p Natural -> p Natural -> p Natural
    , ternary :: forall a.  p Bool -> p a -> p a -> p a
    }

type Expr a = forall p. ExprF p a -> p a

eval :: Expr a -> a
eval e = runIdentity $
  e
    { natLit = Identity
    , boolLit = Identity
    , add = \(Identity x) -> \(Identity y) -> Identity (x + y)
    , ternary = \(Identity b) -> \(Identity x) -> \(Identity y) -> if b then x else y
    }

ternary :: Expr Bool -> Expr a -> Expr a -> Expr a
ternary b x y handlers = handlers.ternary (b handlers) (x handlers) (y handlers)
But one nice thing about the dhall version that’s incidental to dhall is that it doesn’t require any extra newtype wrappers like the Haskell one does. That’s because type inference tends to choke on things like this, but dhall doesn’t really have any type inference: all of the types are passed explicitly. It’s one of the facts about dhall that make it nice for things like this.

Congratulations

In any case, if you’ve made it this far, congratulations! You are a master of ADTs and GADTs. Admittedly every language is different, and some of these solutions have to be tweaked for the language in question. And, if your program gets very complicated, there is a good chance that things will become ergonomically unfeasible.

But I hope, at least, that this inspires your imagination to try to bring your haskell principles, techniques, standards, practices, and brainrot into the language of your choice (or language you are forced to work with).

And, if you ever find interesting ways to bring these things into a language not discussed here (or a new interesting technique or pattern), I would absolutely love to hear about it!

Until next time, happy “Haskelling”!

Special Thanks

I am very humbled to be supported by an amazing community, who make it possible for me to devote time to researching and writing these posts. Very special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)

I bet you thought there was going be some sort of caveat in this footnote, didn’t you?↩︎

I didn’t think I’d ever write “java bean” non-ironically on my blog, but there’s a first time for everything.↩︎

Be aware that this implementation is not necessarily appropriately lazy or short-circuiting in Ternary: it might evaluate both sides returning the chosen branch.↩︎
by Justin Le at April 01, 2025 05:29 PM

March 30, 2025

Lysxia's blog

Unfolding trees breadth-first in Haskell
To visit a tree or graph in breadth-first order, there are two main implementation approaches: queue-based or level-based. Our goal here is to develop a level-based approach where the levels of the breadth-first walk are constructed compositionally and dynamically.

Compositionality means that for every node, its descendants—the other nodes reachable from it—are defined by composing the descendants of its children. Dynamism means that the children of a node are generated only when that node is visited; we will see that this requirement corresponds to asking for a monadic unfold.

A prior solution, using the Phases applicative functor, is compositional but not dynamic in that sense. The essence of Phases is a zipping operation in free applicative functors. What if we did zipping in free monads instead?

This is a Literate Haskell post. The source code is on Gitlab. A reusable version of this code is now available on Hackage: the weave library.
Extensions and imports for this Literate Haskell file
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PackageImports #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ViewPatterns #-}
{-# OPTIONS_GHC -Wno-x-partial -Wno-unused-matches -Wno-unused-top-binds -Wno-unused-imports #-}

import "deepseq" Control.DeepSeq (NFData)
import Data.Foldable (toList)
import Data.Function ((&))
import Data.Functor ((<&>))
import Data.Functor.Identity (Identity(..), runIdentity)
import GHC.Generics (Generic)
import "tasty" Test.Tasty (TestTree, localOption)
import "tasty-hunit" Test.Tasty.HUnit ((@?=), testCase)
import "tasty-bench" Test.Tasty.Bench (bgroup, bench, defaultMain, nf, bcompare)
-- import "tasty-bench" Test.Tasty.Bench (mutatorCpuTime)
import "tasty-expected-failure" Test.Tasty.ExpectedFailure (expectFail)
import "some" Data.Some.Newtype (Some(Some))
import "transformers" Control.Monad.Trans.State
import qualified "containers" Data.Set as Set
import "containers" Data.Set (Set)
Background: breadth-first folds and traversals

Our running example will be the type of binary trees:
data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Eq, Show, Generic, NFData)
A breadth-first walk explores the tree level by level; every level contains the nodes at the same distance from the root. The list of levels of a tree can be defined recursively—it is a fold. For a tree Node x l r, the first level contains just the root node x, and the subsequent levels are obtained by appending the levels of the subtrees l and r pairwise.
levels :: Tree a -> [[a]]
levels Leaf = []
levels (Node x l r) = [x] : zipLevels (levels l) (levels r)
zipLevels :: [[a]] -> [[a]] -> [[a]]
zipLevels [] yss = yss
zipLevels xss [] = xss
zipLevels (xs : xss) (ys : yss) = (xs ++ ys) : zipLevels xss yss
(We can’t just use zipWith because it throws away the end of a list when the other list is empty.)

Finally, we concatenate the levels together to obtain the list of nodes in breadth-first order.
toListBF :: Tree a -> [a]
toListBF = concat . levels
Thanks to laziness, the list will indeed be produced by walking the tree in breadth-first order. So far so good.

The above function lets us fold a tree in breadth-first order. The next level of difficulty is to traverse a tree, producing a tree with the same shape as the original tree, only with modified labels.
traverseBF :: Applicative m => (a -> m b) -> Tree a -> m (Tree b)
This has the exact same type as traverse, which you might obtain with deriving (Foldable, Traversable). The stock-derived Traversable—enabled by the DeriveTraversable extension—is a depth-first traversal, but the laws of traverse don’t specify the order in which nodes should be visited, so you could make it a breadth-first traversal if you wanted.

To define a breadth-first traversal is a surprisingly non-trivial exercise, as pointed out by Chris Okasaki in Breadth-first numbering: lessons from a small exercise in algorithm design (ICFP 2000).

“Breadth-first numbering” is a special case of “breadth-first traversal” where the arrow (a -> m b) is specialized to a counter. Okasaki presents a “numbering” solution based on queues and another solution based on levels. Both are easily adaptable to the more general “traversal” problem as we will soon see.

There is a wonderful Discourse thread from 2024 on the topic of breadth-first traversals. The first post gives an elegant breadth-first numbering algorithm which also appears in the appendix of Okasaki’s paper, but sadly it does not generalize from “numbering” to “traversal” beyond the special case m = State s.

Last but not least, another level-based solution to the breadth-first traversal problem can be found in the tree-traversals library by Noah Easterly. It is built around an applicative transformer named Phases, which is a list of actions—imagine the type “[m _]”—where each element m _ represents one level of the tree. The Phases applicative enables a compositional definition of a breadth-first traversal, similarly to the levels function above: the set of nodes reachable from the root is defined by combining the sets of nodes reachable from its children. This concern of compositionality is one of the main motivations behind this post.

Non-standard terminology

The broad family of algorithms being discussed is typically called “breadth-first search” (BFS) or “breadth-first traversal”, but in general these algorithms are not “searching” for anything, and in Haskell, “traversal” is reserved for “things like traverse”. Instead, this post will use “walks” as a term encompassing folds, traversals, unfolds, or any concept that can be qualified with “breadth-first”.

Problem statement: Breadth-first unfolds

Both the fold toListBF and the traversal traverseBF had in common that they receive a tree as an input. This explicit tree makes the notion of levels “static”. With unfolds, we will have to deal with levels that exist only “dynamically” as the result of unfolding the tree progressively.

To introduce the unfolding of a tree, it is convenient to introduce its “base functor”. We modify the tree type by replacing the recursive tree fields with an extra type parameter:
data TreeF a t = LeafF | NodeF a t t
  deriving (Functor, Foldable, Traversable)
An unfold generates a tree from a seed and a function which expands the seed into a leaf or a node containing more seeds. A pure unfold—or anamorphism—can be defined readily:
unfold :: (s -> TreeF a s) -> s -> Tree a
unfold f s = case f s of
  LeafF -> Leaf
  NodeF a l r -> Node a (unfold f l) (unfold f r)
The order in which nodes are evaluated depends on how the resulting tree is consumed. Hence unfold is neither inherently “depth-first” nor “breadth-first”.

The situation changes if we make the unfold monadic.
unfoldM :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
An implementation of unfoldM must decide upon an ordering between actions. To see why adding an M to unfold imposes an ordering, contemplate the fact that these expressions have the same meaning:
Node a (unfold f l) (unfold f r)
= ( let tl = unfold f l in
    let tr = unfold f r in
    Node a tl tr )
= ( let tr = unfold f r in
    let tl = unfold f l in
    Node a tl tr )
whereas these monadic expressions do not have the same meaning in general:
( unfoldM f l >>= \tl ->
  unfoldM f r >>= \tr ->
  pure (Node a tl tr) )
/=
( unfoldM f r >>= \tr ->
  unfoldM f l >>= \tl ->
  pure (Node a tl tr) )
Without further requirements, there is an “obvious” definition of unfoldM, which is a depth-first unfold:
unfoldM_DF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_DF f s = f s >>= \case
  LeafF -> pure Leaf
  NodeF a l r -> liftA2 (Node a) (unfoldM_DF f l) (unfoldM_DF f r)
We unfold the left subtree l fully before unfolding the right one r.

The problem is to define a breadth-first unfoldM.

If you want to think about this problem on your own, you can stop reading here. The rest of this post presents solutions.

Queue-based unfold

The two breadth-first numbering algorithms in Okasaki’s paper can actually be generalized to breadth-first unfolds. Here is the first one that uses queues (using the function (<+) for “push” and pattern-matching on (:>) for “pop”):
unfoldM_BF_Q :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_Q f b0 = go (b0 <+ Empty) <&> \case
    _ :> t -> t
    _ -> error "impossible"
  where
    go Empty = pure Empty
    go (q :> b) = f b >>= \case
      LeafF -> go q <&> \p -> Leaf <+ p
      NodeF a b1 b2 -> go (b2 <+ b1 <+ q) <&> \case
        p :> t1 :> t2 -> Node a t1 t2 <+ p
        _ -> error "impossible"
(The operator (<&>) is flip (<$>). I use it to avoid parentheses around lambdas.)
Queue implementation for unfoldM_BF_Q
data Q a = Q [a] [a]

pattern Empty :: Q a
pattern Empty = Q [] []

infixr 1 <+
(<+) :: a -> Q a -> Q a
x <+ Q xs ys = Q (x : xs) ys

pop :: Q a -> Maybe (Q a, a)
pop (Q xs (y : ys)) = Just (Q xs ys, y)
pop (Q xs []) = case reverse xs of
  [] -> Nothing
  y : ys -> Just (Q [] ys, y)

infixl 1 :>
pattern (:>) :: Q a -> a -> Q a
pattern q :> y <- (pop -> Just (q, y))

{-# COMPLETE Empty, (:>) #-}
As it happens, containers uses that queue-based technique to implement breadth-first unfold for rose trees (Data.Tree.unfoldTreeM_BF). There is a pending question of whether we can improve upon it. This post might provide a theoretical alternative, but it seems too slow to be worth serious consideration (see the benchmark section).

If you’re frowning upon the use of error—as you should be—you can replace error with dummy values here (Empty, Leaf), but (1) that won’t be possible with tree structures that must be non-empty (e.g., if Leaf contained a value) and (2) this is dead code, which is harmless but no more elegant than making it obvious with error.

The correctness of this solution is also not quite obvious. There are subtle ways to get this implementation wrong: should the recursive call be b2 <+ b1 <+ q or b1 <+ b2 <+ q? Should the pattern be p :> t1 :> t2 or p :> t2 :> t1? For another version of this challenge, try implementing the unfold for another tree type, such as finger trees or rose trees, without getting lost in the order of pushes and pops (by the way, this is Data.Tree.unfoldTreeM_BF in containers). The invariant is not complex but there is room for mistakes. I believe that the compositional approach that will be presented later is more robust on that front, although it is admittedly a subjective quality for which is difficult to make a strong case.

Some uses of unfolds

Traversals from unfolds

One sense in which unfoldM is a more difficult problem than traverse is that we can use unfoldM to implement traverse. We do have to make light of the technicality that there is a Monad constraint instead of Applicative, which makes unfoldM not suited to implement the Traversable class.

A depth-first unfold gives a depth-first traversal:
traverse_DF :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverse_DF = unfoldM_DF . traverseRoot

-- auxiliary function
traverseRoot :: Applicative m => (a -> m b) -> Tree a -> m (TreeF b (Tree a))
traverseRoot _ Leaf = pure LeafF
traverseRoot f (Node a l r) = f a <&> \b -> NodeF b l r
A breadth-first unfold gives a breadth-first traversal:
traverse_BF_Q :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverse_BF_Q = unfoldM_BF_Q . traverseRoot
Unfolds in graphs

We can use a tree unfold to explore a graph. This usage distinguishes unfolds from folds and traversals, which only let you explore trees.

Given a type of vertices V, a directed graph is represented by a function V -> F V, where F is a functor which describes the arity of each node. The obvious choice for F is lists, but we will stick to TreeF here so we can just reuse this post’s unfoldM implementations. The TreeF functor restricts us graphs where each node has zero or two outgoing edges; it is a weird restriction, but we will make do for the sake of example.
An ASCII drawing of a graph
        +-------+
        v       |
+->1--->2--->3  |
|  |    |    ^  |
|  v    v    |  |
|  4--->5--->6--+
|  |    |    ^
|  +----|----+
|       |
+-------+
The graph drawn above turns into the following function, where every vertex is mapped either to NodeF with the same vertex as the first argument followed by its two adjacent vertices, or to LeafF if it has no outgoing edges or does not belong to the graph.
graph :: Int -> TreeF Int Int
graph 1 = NodeF 1 2 4
graph 2 = NodeF 2 3 5
graph 3 = LeafF
graph 4 = NodeF 4 5 6
graph 5 = NodeF 5 1 6
graph 6 = NodeF 6 2 3
graph _ = LeafF
If we simply feed that function to unfold, we will get the infinite tree of all possible paths from a chosen starting vertex.

To obtain a finite tree, we want to keep track of vertices that we have already visited, using a stateful memory. The following function wraps graph, returning LeafF also if a vertex has already been visited.
visitGraph :: Int -> State (Set Int) (TreeF Int Int)
visitGraph vertex = do
  visited <- get
  if vertex `elem` visited then pure LeafF
  else do
    put (Set.insert vertex visited)
    pure (graph vertex)
Applying unfoldM_BF to that function produces a “breadth-first tree” of the graph, an encoding of the trajectory of a breadth-first walk through the graph. “Breadth-first trees” are a concept from graph theory with well-studied properties.
-- Visit `graph` in breadth-first order
bfGraph_Q :: Int -> Tree Int
bfGraph_Q = (`evalState` Set.empty) . unfoldM_BF_Q visitGraph
testGraphQ :: TestTree
testGraphQ = testCase "Q-graph" $
  bfGraph_Q 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))
Compile and run

This post is a compilable Literate Haskell file. You can run all of the tests and benchmarks in here. The source repository provides the necessary configuration to build it with cabal.
$ cabal build breadth-first-unfolds
Test cases can then be selected with the -p option and a pattern (see the tasty documentation for details). Run all tests and benchmarks by passing no option.
$ cabal exec breadth-first-unfolds -- -p "/Q-graph/||/S-graph/"
All
  Q-graph: OK
  S-graph: OK
“Global” level-based unfold

The other solution from Okasaki’s paper can also be adapted into a monadic unfold.

The starting point is to unfold a list of seeds [s] instead of a single seed: we can traverse the list with the expansion function s -> m (TreeF a s) to obtain another list of seeds, the next level of the breadth-first unfold, and keep going.

Iterating this process naively yields a variant of monadic unfold without a result. This no-result variant can be generalized from TreeF to any foldable structure:
-- Inner loop: multi-seed unfold
unfoldsM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> [s] -> m ()
unfoldsM_BF_G_ f [] = pure ()
-- Read from right to left: traverse, flatten, recurse.
unfoldsM_BF_G_ f xs = unfoldsM_BF_G_ f . concatMap toList =<< traverse f xs

-- Top-level function: single-seed unfold
unfoldM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> s -> m ()
unfoldM_BF_G_ f = unfoldsM_BF_G_ f . (: [])
Modifying this solution to create the output tree requires a little more thought. We must keep hold of the intermediate list of ts :: [TreeF a s] to reconstruct trees after the recursive call returns.
unfoldsM_BF_G :: Monad m => (s -> m (TreeF a s)) -> [s] -> m [Tree a]
unfoldsM_BF_G f [] = pure []
-- traverse, flatten, recurse, reconstruct
unfoldsM_BF_G f xs = traverse f xs >>= \ts ->
  reconstruct ts <$> unfoldsM_BF_G f (concatMap toList ts)
The reconstruction function picks a root in the first list and completes it with subtrees from the second list:
reconstruct :: [TreeF a s] -> [Tree a] -> [Tree a]
reconstruct (LeafF : ts) us = Leaf : reconstruct ts us
reconstruct (NodeF a _ _ : ts) (l : r : us) = Node a l r : reconstruct ts us
reconstruct _ _ = error "impossible"
You could modify the final branch to produce [], but error makes it explicit that this branch should never be reached by the unfold where it is used.

The top-level unfold function wraps the seed in a singleton input list and extracts the root from a singleton output list.
unfoldM_BF_G :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_G f = fmap head . unfoldsM_BF_G f . (: [])
Unit test testGraphG
bfGraph_G :: Int -> Tree Int
bfGraph_G = (`evalState` Set.empty) . unfoldM_BF_G visitGraph
testGraphG :: TestTree
testGraphG = testCase "Q-graph" $
  bfGraph_G 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))
This solution is less brittle than the queue-based solution because we always traverse lists left-to-right. To avoid the uses of error in reconstruct, you can probably create a specialized data structure in place of [TreeF a s], but that is finicky in its own way.

In search of compositionality

Both of the solutions above (the queue-based and the “monolithic” level-based unfolds) stem from a global view of breadth-first walks: we are iterating on a list or a queue which holds all the seeds from one or two levels at a time. That structure represents a “front line” between visited and unvisited vertices, and every iteration advances the front line a little: with a queue we advance it one vertex at a time, with a list we advance the whole front line in an inner loop—one call to traverse—before recursing.

The opposite local view of breadth-first order is exemplified by the earlier levels function: it only produces a list of lists of the vertices reachable from the current root. It does so recursively, by composing together the vertices reachable from its children. Our goal here is to find a similarly local, compositional implementation of breadth-first unfolds.

Rather than defining unfoldM directly, which sequences the computations on all levels into a single computation, we will introduce an intermediate function weave that keeps levels separate—just as toListBF is defined using levels. The result of weave will be in an as yet unknown applicative functor F m depending on m. And because levels are kept separate, weave only needs a constraint Applicative m to compose computations on the same level. The goal is to implement this signature, where the result type F is also an unknown:
weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
The name weave comes from visualizing a breadth-first walk as a path zigzagging across a tree like this:

which is reminiscent of weaving as in the making of textile:

Illustration from the Wikipedia article on Weaving: warp and weft.

With only what we know so far, a bit of type-directed programming leads to the following incomplete definition. We have constructed something of type m (F m (Tree a)), while we expect F m (Tree a):
weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = _ (step <$> f s) where
  step :: TreeF a s -> F m (Tree a)
  step LeafF = pure Leaf
  step (NodeF a l r) = liftA2 NodeF (weave f l) (weave f r)
To fill the hole _, we postulate the following primitive, weft, as part of the unknown definition of F:
weft :: Applicative m => m (F m a) -> F m a
Intuitively, F m represents “multi-level computations”. The weft function constructs a multi-level (F m)-computation from one level of m-computation which returns the subsequent levels as an (F m)-computation.

We fill the hole with weft, completing the definition of weave:
weave :: forall m s a. Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = weft (weaveF <$> f s) where
  weaveF :: TreeF a s -> F m (Tree a)
  weaveF LeafF = pure Leaf
  weaveF (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)
The function weave defines a multi-level computation which represents a breadth-first walk from a seed s:

the first level of the walk is f s, expanding the initial seed;

the auxiliary function weaveF constructs the remaining levels from the initial seed’s expansion:

if the seed expands to LeafF, there are no more seeds, and we terminate with an empty computation (pure);

if the seed expands to NodeF, we obtain two sub-seeds l and r, they generate their own weaves recursively (weave f l and weave f r), and we compose them (liftA2).

One way to think about weft is as a generalization of the following primitives: we can “embed” m-computations into F m, and we can “delay” multi-level (F m)-computations, shifting the m-computation on each level to the next level.
embed :: Applicative m => m a -> F m a
embed u = weft (pure <$> u)

delay :: Applicative m => F m a -> F m a
delay u = weft (pure u)
The key law relating these two operations is that embedded computations and delayed computations commute with each other:
embed u *> delay v = delay v <* embed u
The embed and delay operations are provided by the Phases applicative functor that I mentioned earlier, which enables breadth-first traversals, but not breadth-first unfolds. Thus, weft is a strictly more expressive primitive than embed and delay.

Eventually, we will run a multi-level computation as a single m-computation so that we can use weave to define unfoldM. The runner function will be called mesh:
mesh :: Monad m => F m a -> m a
It is characterized by this law which says that mesh executes the first level of the computation u :: m (F m a), then executes the remaining levels recursively:
mesh (weft u) = u >>= mesh
Putting everything together, weave and mesh combine into a breadth-first unfold:
unfoldM_BF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF f s = mesh (weave f s)
It remains to find an applicative functor F equipped with weft and mesh.

The weave applicative

A basic approach to design a type is to make some of the operations it should support into constructors. The weave applicative WeaveS has constructors for pure and weft:
data WeaveS m a
  = EndS a
  | WeftS (m (WeaveS m a))
(The suffix “S” stands for Spoilers. Read on!)

We instantiate the unknown functor F with WeaveS.
type F = WeaveS
Astute readers will have recognized WeaveS as the free monad. Just as Phases has the same type definition as the free applicative functor but a different Applicative instance, we will give WeaveS an Applicative instance that does not coincide with the Applicative and Monad instances of the free monad.

Starting with the easy functions, weft is WeftS, and the equation for mesh above is basically its definition. We just need to add an equation for EndS.
weft :: m (WeaveS m a) -> WeaveS m a
weft = WeftS

mesh :: Monad m => WeaveS m a -> m a
mesh (EndS a) = pure a
mesh (WeftS u) = u >>= mesh
Recall that WeaveS represents multi-level computations. Computations are composed level-wise with the following liftS2. The interesting case is the one where both arguments are WeftS: we compose the first level with liftA2, and the subsequent ones with liftS2 recursively.
liftS2 :: Applicative m => (a -> b -> c) -> WeaveS m a -> WeaveS m b -> WeaveS m c
liftS2 f (EndS a) wb = f a <$> wb
liftS2 f wa (EndS b) = flip f b <$> wa
liftS2 f (WeftS wa) (WeftS wb) = WeftS ((liftA2 . liftS2) f wa wb)
liftS2 will be the liftA2 in WeaveS’s Applicative instance. The Functor and Applicative instances show that WeaveS is an applicative transformer: for every applicative functor m, WeaveS m is also an applicative functor.
instance Functor m => Functor (WeaveS m) where
  fmap f (EndS a) = EndS (f a)
  fmap f (WeftS wa) = WeftS ((fmap . fmap) f wa)

instance Applicative m => Applicative (WeaveS m) where
  pure = EndS
  liftA2 = liftS2
That completes the definition of unfoldM_BF: a level-based, compositional breadth-first unfold.

As a unit test, we copy the code for visiting a graph from earlier:
bfGraphS :: Int -> Tree Int
bfGraphS = (`evalState` Set.empty) . unfoldM_BF visitGraph
testGraphS :: TestTree
testGraphS = testCase "S-graph" $
  bfGraphS 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))
Code golf

There is a variant of weave that I prefer:
weaveS :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveS m (Tree a))
weaveS f s = f s <&> \case
  LeafF -> pure Leaf
  NodeF a l r -> liftA2 (Node a) (weft (weaveS f l)) (weft (weaveS f r))
The outer weft constructor was moved into the recursive calls. The result type has an extra m, which makes it more apparent that we always start with a call to f. It’s the same vibe as replacing the type [a] with NonEmpty a when we know that a list will always have at least one element; weaveS always produces at least one level of computation. We also replace (<$>) with its flipped version (<&>) for aesthetic reasons: we can apply it to a lambda without parentheses, and that change makes the logic flow naturally from left to right: we first expand the seed s using f, and continue depending on whether the expansion produced LeafF or NodeF.

To define unfoldM, instead of applying mesh directly, we chain it with (>>=).
unfoldM_BF_S :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_S f s = weaveS f s >>= mesh
A wrinkle in time

That solution is Obviously Correct™, but it has a terrible flaw: it does not run in linear time!

We can demonstrate this by generating a “thin” tree whose height is equal to its size. The height h is the seed of the unfolding, and we generate a NodeF as long as it is non-zero, asking for a decreased height h - 1 on the right, and a zero height on the left.
thinTreeS :: Int -> Tree ()
thinTreeS = runIdentity . unfoldM_BF_S f
  where
    f 0 = pure LeafF
    f h = pure (NodeF () 0 (h - 1))
Compare the running times of evaluating thinTreeS at height 100 (the baseline) and at height 1000 (10x the baseline).
benchS :: TestTree
benchS = bgroup "S-thin"
  [ bench  "1x" (nf thinTreeS 100)
  , bench "10x" (nf thinTreeS 1000) & bcompare "S-thin.1x"
  ]
Benchmark output (relative):

height time

baseline 1x

10x 105x
Raw output
$ cabal exec breadth-first-unfolds -- -p "S-thin"
All
  S-thin
    1x:  OK
      27.6 μs ± 2.6 μs, 267 KB allocated, 317 B  copied, 6.0 MB peak memory
    10x: OK
      2.90 ms ± 181 μs,  23 MB allocated, 178 KB copied, 7.0 MB peak memory, 105.35x
Multiplying the height by 10x makes the function run 100x slower. Dramatically quadratic.

Complexity analysis

We can compare this implementation with level from earlier, which is linear-time. In particular, looking at zipLevels with liftS2—which play similar roles—there is a crucial difference when one of the arguments is empty ([] or EndS): zipLevels simply returns the other argument, whereas liftS2 calls (<$>), continuing the recursion down the other argument. So zipLevels stops working after reaching the end of either argument, whereas liftS2 walks to the end of both arguments. There is at least one call to liftS2 on every level which will walk to the bottom of the tree, so we get a quadratic lower bound Ω(height²).

Out of sight, out of mind

The problematic combinators are fmap and liftS2, which weaveS uses to construct the unfolded tree. If we don’t care about that tree—wanting only the effect of a monadic unfold—then we can get rid of the complexity associated with those combinators.

With no result to return, we remove the a type parameter from the definition of WeaveS, yielding the oblivious (“O”) variant:
data WeaveO m
  = EndO
  | WeftO (m (WeaveO m))
We rewrite mesh into meshO, reducing a WeaveO m computation into m () instead of m a.
meshO :: Monad m => WeaveO m -> m ()
meshO EndO = pure ()
meshO (WeftO u) = u >>= meshO
The Applicative instance for WeaveS becomes a Monoid instance for WeaveO. liftA2 is replaced with (<>), zipping two computations level-wise.
instance Applicative m => Semigroup (WeaveO m) where
  EndO <> v = v
  u <> EndO = u
  WeftO u <> WeftO v = WeftO (liftA2 (<>) u v)

instance Applicative m => Monoid (WeaveO m) where
  mempty = EndO
  mappend = (<>)
To implement a breadth-first walk, we modify weaveS above by replacing liftA2 (Node a) with (<>). Note that the type parameter a is no longer in the result. It was only used in the tree that we decided to forget.
weaveO :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveO m)
weaveO f s = f s <&> \case
  LeafF -> mempty
  NodeF _ l r -> WeftO (weaveO f l) <> WeftO (weaveO f r)
Running weaveO with meshO yields a oblivious monadic unfold: it produces m () instead of m (Tree a). (You may remember seeing another implementation of that same signature just earlier, unfoldM_BF_G_.)
unfoldM_BF_O_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_O_ f s = weaveO f s >>= meshO
Previously, we benchmarked the function thinTreeS that outputs a tree by forcing the tree. With an oblivious unfold, there is no tree to force. Instead we will count the number of generated NodeF constructors:
thinTreeO :: Int -> Int
thinTreeO = (`execState` 0) . unfoldM_BF_O_ (state . f)
  where
    f 0 counter = (LeafF, counter)
    f h counter = (NodeF () 0 (h - 1), counter + 1)  -- increment the counter for every NodeF
We adapt the benchmark from before to measure the complexity of unfolding thin trees. We have to increase the baseline height from 100 to 500 because this benchmark runs so much faster than the previous ones.
benchO :: TestTree
benchO = bgroup "O-thin"
  [ bench  "1x" (nf thinTreeO 500)
  , bench "10x" (nf thinTreeO 5000) & bcompare "O-thin.1x"
  ]
Benchmark output (relative):

height time

baseline 1x

10x 9.8x
Raw output
$ cabal exec breadth-first-unfolds -- -p O-thin
All
  O-thin
    1x:  OK
      148  μs ± 8.3 μs, 543 KB allocated, 773 B  copied, 6.0 MB peak memory
    10x: OK
      1.45 ms ± 113 μs, 5.4 MB allocated,  82 KB copied, 7.0 MB peak memory, 9.78x
The growth is linear, as desired: the “10x” bench is 10x slower than the baseline “1x” bench.

Laziness for the win

The oblivious unfold avoided quadratic explosion by simplifying the problem. Now let’s solve the original problem again, so we can’t just get rid of fmap and liftA2. As mentioned previously, the root cause was that (1) liftA2 calls fmap when one of the constructors is EndS, and (2) fmap traverses the other argument. The next solution will be to make fmap take constant time, by storing the “mapped function” in the constructor. Behold the “L” variant of WeaveS, which is a GADT:
data WeaveL m a where
  EndL :: a -> WeaveL m a
  WeftL :: m (WeaveL m b) -> (b -> a) -> WeaveL m a
For comparison, here is the previous “S” variant with GADT syntax:
data WeaveS m a where
  EndS :: a -> WeaveS m a
  WeftS :: m (WeaveS m a) -> WeaveS m a
This trick is also known as the “co-Yoneda construction”.

The definition of fmap is no longer recursive. It doesn’t even need m to be a functor anymore!
instance Functor (WeaveL m) where
  fmap f (EndL a) = EndL (f a)
  fmap f (WeftL wa g) = WeftL wa (f . g)
The Applicative instance is… a good exercise for the reader. The details are not immediately important—we only care about improving fmap for now—we will come back to have a look at the Applicative instance soon.

The runner function meshL is a simple bit of type Tetris.
meshL :: Monad m => WeaveL m a -> m a
meshL (EndL a) = pure a
meshL (WeftL wa f) = f <$> (wa >>= meshL)
By partially applying WeftL to id as its second argument, we obtain a counterpart to the unary WeftS constructor:
weftL :: m (WeaveL m a) -> WeaveL m a
weftL wa = WeftL wa id
With those primitives redefined, the “weave” and “unfold” are identical. Below, we only renamed the “S” suffixes to “L”:
weaveL :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveL m (Tree a))
weaveL f s = f s <&> \case
  LeafF -> pure Leaf
  NodeF a s1 s2 -> liftA2 (Node a) (weftL (weaveL f s1)) (weftL (weaveL f s2))

unfoldM_BF_L :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_L f s = weaveL f s >>= meshL
The benchmarks show that 10x the height takes 10x the time. Linear growth again.
Benchmark code and output: thinTreeL and benchL
Copy of the benchS benchmark.
thinTreeL :: Int -> Tree ()
thinTreeL = runIdentity . unfoldM_BF_L f
  where
    f 0 = pure LeafF
    f h = pure (NodeF () 0 (h - 1))

benchL :: TestTree
benchL = bgroup "L-thin"
  [ bench  "1x" (nf thinTreeL  100)
  , bench "10x" (nf thinTreeL 1000) & bcompare "L-thin.1x"
  ]
Benchmark output (relative):

height time

baseline 1x

10x 9.93x

Raw output:
$ cabal exec breadth-first-unfolds -- -p "L-thin"     
All
  L-thin
    1x:  OK
      14.1 μs ± 782 ns,  59 KB allocated,   5 B  copied, 6.0 MB peak memory
    10x: OK
      140  μs ±  13 μs, 586 KB allocated,  51 B  copied, 6.0 MB peak memory, 9.93x
Lazy in more ways than one

As hinted by the “L” and “S” suffixes, WeaveL is a “lazy” variant of WeaveS: fmap for WeaveL “postpones” work by accumulating functions in the WeftL constructor. That work is “forced” by meshL, which is where the fmap ((<$>)) of the underlying monad m is called, performing the work accumulated by possibly many calls to WeaveL’s fmap.

One subtlety is that there are multiple “lazinesses” at play. The main benefit of using WeaveL is really to delay computation, that is a kind of laziness, but WeaveL doesn’t need to be implemented in a lazy language. We can rewrite all of the code we’ve seen so far in a strict language with minor changes, and we will still observe the quadratic vs linear behavior of WeaveS vs WeaveL on thin trees. The “manufactured laziness” of WeaveL is a concept independent of the “ambient laziness” in Haskell.

Nevertheless, we can still find an interesting role for that “ambient laziness” in this story. Indeed, the function weaveL also happens to be lazier than weaveS in the usual sense.

A concrete test case is worth a thousand words. Consider the following tree generator which keeps unfolding left subtrees while making every right subtree undefined:
partialTreeF :: Bool -> TreeF () Bool
partialTreeF True = NodeF () True False
partialTreeF False = undefined
If we used the pure unfold, we would get the same tree as this recursive definition:
partialTree :: Tree ()
partialTree = Node () partialTree undefined
What happens if we use one of the monadic unfolds? For example unfoldM_BF_S:
partialTreeS :: Tree ()
partialTreeS = runIdentity (unfoldM_BF_S (Identity . partialTreeF) True)
Try to force the first Node constructor.
whnfTreeS :: TestTree
whnfTreeS = expectFail $ testCase "S-whnf" $ do
  case partialTreeS of
    Node _ _ _ -> pure ()  -- Succeed
    Leaf -> error "unreachable" -- definitely not a Leaf
As it turns out, this test using the “S” variant fails. (That’s why the test is marked with expectFail.) Forcing partialTreeS evaluates the undefined in partialTreeF. Therefore partialTreeS is not equivalent to partialTree.
$ cabal exec breadth-first-unfolds -- -p "S-whnf"
All
  S-whnf: FAIL (expected)
    Exception: Prelude.undefined
    CallStack ...
In contrast, the “L” variant makes that same test succeed.
partialTreeL :: Tree ()
partialTreeL = runIdentity (unfoldM_BF_L (Identity . partialTreeF) True)

whnfTreeL :: TestTree
whnfTreeL = testCase "L-whnf" $ do
  case partialTreeL of
    Node _ _ _ -> pure ()  -- Succeed
    Leaf -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "L-whnf"
All
  L-whnf: OK
This difference can only be seen with “lazy monads”, where (>>=) is lazy in its first argument. (If this definition sounds not quite right, that’s probably because of seq. It makes a precise definition of “lazy monad” more complicated.) Examples of lazy monads from the transformers library are Identity, Reader, lazy State, lazy Writer, and Accum.

The secret sauce is the definition of liftA2 for WeaveL:
instance Applicative m => Applicative (WeaveL m) where
  pure = EndL
  liftA2 f (EndL a) wb = f a <$> wb
  liftA2 f wa (EndL b) = flip f b <$> wa
  liftA2 f (WeftL wa g) (WeftL wb h)
    = WeftL ((liftA2 . liftA2) (,) wa wb) (\ ~(a, b) -> f (g a) (h b))
In the third clause of liftA2, we put the function f in a lambda with a lazy pattern (~(a, b)) directly under the topmost constructor WeftL. Thus, we can access the result of f from the second field of WeftL without looking at the first field. In liftS2 earlier, f was passed as an argument to (liftA2 . liftS2), that forces us to run the computation before we can get a hold on the result of f.

Maximizing laziness

The “L” variant of unfoldM is lazier than the “S” variant, but there is still a gap between partialTreeL and the pure partialTree: if we force not only the root, but also the left subtree of partialTreeL, then we run into undefined again.
forceLeftTreeL :: TestTree
forceLeftTreeL = expectFail $ testCase "L-left" $ do
  case partialTreeL of
    Node _ (Node _ _ _) _ -> pure ()  -- Succeed
    _ -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "L-left" 
All
  L-left: FAIL (expected)
    Exception: Prelude.undefined
Although the unfold using WeaveL is lazier than using WeaveS, it is not yet as lazy as it could be. The reason is that, strictly speaking, WeaveL’s liftA2 is a strict function. The expansion function partialTreeF produces a level with an undefined sub-computation, which crashes the whole level. Each level in a computation will be either completely defined or undefined.

To recap, we’ve been looking at the following trees:
partialTreeS = undefined
partialTreeL = Node () undefined undefined
partialTree  = Node () partialTree undefined
It is natural to ask: can we define a breadth-first unfold that, when applied to partialTreeF, will yield the same tree as partialTree?

More generally, the new problem is to define a breadth-first unfoldM whose specialization with the Identity functor is equivalent to the pure unfold even on partially-defined values. That is, it satisfies the following equation:
unfold f = runIdentity . unfoldM (Identity . f)
Laziness without end

The strictness of liftA2 is caused by WeaveL having two constructors. Let’s get rid of EndL.
data WeaveE m a where
  WeftE :: m (WeaveE m b) -> (b -> a) -> WeaveE m a
Having only one constructor lets us use lazy patterns:
instance Functor (WeaveE m) where
  fmap f ~(WeftE wa g) = WeftE wa (f . g)
Wait a second. I spoke too fast, GHC gives us an error:
error: [GHC-87005]
    • An existential or GADT data constructor cannot be used
        inside a lazy (~) pattern
    • In the pattern: WeftE wa g
      In the pattern: ~(WeftE wa g)
      In an equation for ‘fmap’: fmap f ~(WeftE wa g) = WeftE wa (f . g)
    |
641 | >   fmap f ~(WeftE wa g) = WeftE wa (f . g)
    |              ^^^^^^^^^^
The feature we need is “first-class existentials”, for which there is an open GHC proposal.

Not letting that stop us, there is a simple version of first-class existentials available in the package some, as the module Data.Some.Newtype (internally using unsafeCoerce). That will be sufficient for our purposes. All we need is an abstract type Some and a pattern synonym:
-- imported from Data.Some.Newtype
data Some f
pattern Some :: f a -> Some f
And we’re back on track. Here comes the actual “E” (endless) variant:
newtype WeaveE m a = MkWeaveE (Some (WeavingE m a))

data WeavingE m a b where
  WeftE :: m (WeaveE m b) -> (b -> a) -> WeavingE m a b
I spare you the details.
Functor, Applicative, weftE, meshE
instance Functor (WeaveE m) where
  fmap f (MkWeaveE (Some ~(WeftE u g))) = MkWeaveE (Some (WeftE u (f . g)))

instance Applicative m => Applicative (WeaveE m) where
  pure x = MkWeaveE (Some (WeftE (pure (pure ())) (\_ -> x)))
  liftA2 f (MkWeaveE (Some ~(WeftE u g))) (MkWeaveE (Some ~(WeftE v h)))
    = MkWeaveE (Some (WeftE ((liftA2 . liftA2) (,) u v) (\ ~(x, y) -> f (g x) (h y))))

weftE :: m (WeaveE m a) -> WeaveE m a
weftE u = MkWeaveE (Some (WeftE u id))

meshE :: Monad m => WeaveE m a -> m a
meshE (MkWeaveE (Some (WeftE u f))) = f <$> (u >>= meshE)
Breadth-first unfold, “E” variant: weaveE and unfoldM_BF_E
weaveE :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveE m (Tree a))
weaveE f s = f s <&> \case
  LeafF -> pure Leaf
  NodeF a s1 s2 -> liftA2 (Node a) (weftE (weaveE f s1)) (weftE (weaveE f s2))

unfoldM_BF_E :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_E f s = weaveE f s >>= meshE
The endless WeaveE enables an even lazier implementation of unfoldM. When specialized to the identity monad, it lets us force the resulting tree in any order. The forceLeftTreeE test passes (unlike forceLeftTreeL).
partialTreeE :: Tree ()
partialTreeE = runIdentity (unfoldM_BF_E (Identity . partialTreeF) True)

forceLeftTreeE :: TestTree
forceLeftTreeE = testCase "E-left" $ do
  case partialTreeE of
    Node _ (Node _ _ _) _ -> pure ()  -- Succeed
    _ -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "E-left"
All
  E-left: OK
One can also check that forcing the left spine of partialTreeE arbitrarily deep throws no errors.

We made it lazy, but at what cost? First, this “Endless” variant only works for lazy monads. With a strict monad, the runner meshE will loop forever. It is possible to run things more incrementally by pattern-matching on WeaveE, but you’re better off using the oblivious WeaveO anyway.

Second, when you aren’t running into an unproductive loop, the “Endless” variant of unfoldM has quadratic time complexity Ω(height²). The reason is essentially the same as the “Strict” variant: liftA2 keeps looping even if one argument is a pure weave—before, that was to traverse the other non-pure argument, now, there isn’t even a way to tell when the computation has ended. Thus, every leaf may create work proportional to the height of the tree.

Running the same benchmark as before, we measure even more baffling timings:

height time

baseline 1x

10x 738x
Benchmark: thinTreeE and benchE
thinTreeE :: Int -> Tree ()
thinTreeE = runIdentity . unfoldM_BF_E f
  where
    f 0 = pure LeafF
    f h = pure (NodeF () 0 (h - 1))

benchE :: TestTree
benchE = {- localOption mutatorCpuTime $ -} bgroup "E-thin"
  [ bench "1x" (nf thinTreeE 100)
  , bench "10x" (nf thinTreeE 1000) & bcompare "E-thin.1x"
  ]
Raw output:
$ cabal exec breadth-first-unfolds -- -p "E-thin."
All
  E-thin
    1x:  OK
      243  μs ±  22 μs, 1.2 MB allocated,  13 KB copied, 6.0 MB peak memory
    10x: OK
      179  ms ±  17 ms, 119 MB allocated,  29 MB copied,  21 MB peak memory, 737.76x
Using the previous setup comparing a baseline and a 10x run, we see a more than 700x slowdown, so much worse than the 100x predicted by a quadratic model. Interestingly, the raw output shows that the total cumulative allocations did grow by a 100x factor.¹

But it gets weirder with more data points: it does not follow a clear power law. If Time(n) grew as n^c for some fixed exponent c, then the ratio Time(Mn)/Time(n) would be M^c, a constant that does not depend on n.

In the following benchmark, we keep doubling the height (M = 2) for every test case, and we measure the time relative to the preceding case each time. A quadratic model predicts a 4x slowdown at every step. Instead, we observe wildly varying factors.

Benchmark output (each time factor is relative to the preceding line, for example, the “4x” benchmark is 9.5x slower than the “2x” benchmark):

height time

1x

2x 10.9x

4x 9.5x

8x 5.4x

16x 1.4x
Code and raw output
benchE' :: TestTree
benchE' = {- localOption mutatorCpuTime $ -} bgroup "E-thin-more"
  [ bench "1x" (nf thinTreeE 100)
  , bench "2x" (nf thinTreeE 200) & bcompare "E-thin-more.1x"
  , bench "4x" (nf thinTreeE 400) & bcompare "E-thin-more.2x"
  , bench "8x" (nf thinTreeE 800) & bcompare "E-thin-more.4x"
  , bench "16x" (nf thinTreeE 1000) & bcompare "E-thin-more.8x"
  ]
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
  E-thin-more
    1x:  OK
      222  μs ± 9.3 μs, 1.2 MB allocated,  13 KB copied, 6.0 MB peak memory
    2x:  OK
      2.43 ms ±  85 μs, 4.8 MB allocated, 236 KB copied, 7.0 MB peak memory, 10.94x
    4x:  OK
      23.1 ms ± 1.2 ms,  19 MB allocated, 2.7 MB copied,  10 MB peak memory, 9.53x
    8x:  OK
      126  ms ± 7.8 ms,  76 MB allocated,  18 MB copied,  24 MB peak memory, 5.44x
    16x: OK
      181  ms ± 7.0 ms, 119 MB allocated,  30 MB copied,  24 MB peak memory, 1.44x
I believe this benchmark is triggering some pathological behavior in the garbage collector. I modified tasty-bench with an option to measure CPU time without GC (mutator time). At time of writing, tasty-bench is still waiting for a new release. We can point Cabal to an unreleased commit of tasty-bench by adding the following lines to cabal.project.local.
source-repository-package
    type: git
    location: https://github.com/Bodigrim/tasty-bench.git
    tag: 81ff742a3db1d514461377729e00a74e5a9ac1b8
Then, uncomment the setting “localOption mutatorCpuTime $” in benchE and benchE' above and uncomment the import of mutatorCpuTime at the top.

Benchmark output (excluding GC time, relative):

height time

baseline 1x

1x 95x
Raw output
$ cabal exec breadth-first-unfolds -- -p "E-thin."
All
  E-thin
    1x:  OK
      216  μs ±  18 μs, 1.2 MB allocated,  13 KB copied, 6.0 MB peak memory
    10x: OK
      20.5 ms ± 1.9 ms, 119 MB allocated,  29 MB copied,  21 MB peak memory, 94.91x
For the “2x” benchmarks, we are closer the expected 4x slowdown, but there is still a noticeable gap. I’m going to chalk the rest to inherent measurement errors (the cost of tasty-bench’s simplicity) exacerbated by the pathological GC behavior; a possible explanation is that the pattern of memory usage becomes so bad that it affects non-GC time.

Benchmark output (excluding GC time, each measurement is relative to the preceding line):

height time

1x

2x 3.2x

4x 4.2x

8x 4.5x

16x 1.7x
Raw output
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
  E-thin-more
    1x:  OK
      186  μs ±  16 μs, 1.2 MB allocated,  13 KB copied,  21 MB peak memory
    2x:  OK
      597  μs ±  28 μs, 4.8 MB allocated, 236 KB copied,  21 MB peak memory, 3.20x
    4x:  OK
      2.48 ms ± 148 μs,  19 MB allocated, 2.9 MB copied,  21 MB peak memory, 4.15x
    8x:  OK
      11.2 ms ± 986 μs,  76 MB allocated,  18 MB copied,  24 MB peak memory, 4.50x
    16x: OK
      18.4 ms ± 1.7 ms, 119 MB allocated,  29 MB copied,  24 MB peak memory, 1.65x
It doesn’t seem possible for a breadth-first unfold to be both maximally lazy and of linear time complexity, but I don’t know how to formally prove that impossibility either.

Microbenchmarks: Queues vs Global Levels vs Weaves

So far we’ve focused on asymptotics (linear vs quadratic). Some readers will inevitably wonder about real speed. Among the linear-time algorithms—queues (“Q”), global levels (“G”), and weaves (lazy “L” or oblivious “O”)—which one is faster?

tl;dr: Queues are (much) faster in these microbenchmarks (up to 25x!), but keep in mind that these are all quite naive implementations.

There are two categories to measure separately: unfolds which produce trees, and oblivious unfolds—which don’t produce trees. These microbenchmarks construct full trees up to a chosen number of nodes. When there is an output tree, we force it (using nf), otherwise we force a counter of the number of nodes. We run on different sufficiently large sizes (500 and 5000) to check the stability of the measured factors, ensuring that we are only comparing the time components that dominate at scale.

The tables list times relative to the queue benchmark for each tree size.

Tree-producing unfolds

algorithm size time

Queue 500 1x

Global Levels 500 1.4x

Lazy Weave 500 3.1x

Queue 5000 1x

Global Levels 5000 1.2x

Lazy Weave 5000 3.3x
Code and raw output
fullTreeF :: Int -> Int -> TreeF Int Int
fullTreeF size n | n >= size = LeafF
fullTreeF size n = NodeF n (2 * n) (2 * n + 1)

fullTree_Q :: Int -> Tree Int
fullTree_Q size = runIdentity (unfoldM_BF_Q (Identity . fullTreeF size) 1)

fullTree_G :: Int -> Tree Int
fullTree_G size = runIdentity (unfoldM_BF_G (Identity . fullTreeF size) 1)

fullTree_L :: Int -> Tree Int
fullTree_L size = runIdentity (unfoldM_BF_L (Identity . fullTreeF size) 1)

fullTree :: TestTree
fullTree = bgroup "fullTree"
  [ bench "Q-1x" (nf fullTree_Q 500)
  , bench "G-1x" (nf fullTree_G 500) & bcompare "fullTree.Q-1x"
  , bench "L-1x" (nf fullTree_L 500) & bcompare "fullTree.Q-1x"
  , bench "Q-10x" (nf fullTree_Q 5000)
  , bench "G-10x" (nf fullTree_G 5000) & bcompare "fullTree.Q-10x"
  , bench "L-10x" (nf fullTree_L 5000) & bcompare "fullTree.Q-10x"
  ]
$ cabal exec breadth-first-unfolds -- -p fullTree
All
  fullTree
    Q-1x:  OK
      20.6 μs ± 1.1 μs, 141 KB allocated, 477 B  copied, 6.0 MB peak memory
    G-1x:  OK
      28.6 μs ± 2.4 μs, 223 KB allocated, 928 B  copied, 6.0 MB peak memory, 1.39x
    L-1x:  OK
      64.3 μs ± 5.6 μs, 353 KB allocated, 3.7 KB copied, 6.0 MB peak memory, 3.13x
    Q-10x: OK
      287  μs ±  26 μs, 1.5 MB allocated,  57 KB copied, 7.0 MB peak memory
    G-10x: OK
      349  μs ±  30 μs, 2.2 MB allocated,  94 KB copied, 7.0 MB peak memory, 1.22x
    L-10x: OK
      935  μs ±  73 μs, 3.5 MB allocated, 386 KB copied, 7.0 MB peak memory, 3.25x
Oblivious unfolds

algorithm size time

Queue 500 1x

Global Levels 500 11x

Oblivious Weave 500 25x

Queue 5000 1x

Global Levels 5000 10x

Oblivious Weave 5000 24x
Code and raw output
unfoldM_BF_Q_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_Q_ f s0 = unfoldM_f (s0 <+ Empty)
  where
    unfoldM_f (q :> s) = f s >>= \case
      LeafF -> unfoldM_f q
      NodeF _ l r -> unfoldM_f (r <+ l <+ q)
    unfoldM_f Empty = pure ()
eatFullTree_Q :: Int -> Int
eatFullTree_Q size = (`execState` 0) (unfoldM_BF_Q_ (state . \n c -> (fullTreeF size n, c + 1)) 1)

eatFullTree_G :: Int -> Int
eatFullTree_G size = (`execState` 0) (unfoldM_BF_G_ (state . \n c -> (fullTreeF size n, c + 1)) 1)

eatFullTree_O :: Int -> Int
eatFullTree_O size = (`execState` 0) (unfoldM_BF_O_ (state . \n c -> (fullTreeF size n, c + 1)) 1)

eatFullTree :: TestTree
eatFullTree = bgroup "eatFullTree"
  [ bench "Q-1x" (nf eatFullTree_Q 500)
  , bench "G-1x" (nf eatFullTree_G 500) & bcompare "eatFullTree.Q-1x"
  , bench "W-1x" (nf eatFullTree_O 500) & bcompare "eatFullTree.Q-1x"
  , bench "Q-10x" (nf eatFullTree_Q 5000)
  , bench "G-10x" (nf eatFullTree_G 5000) & bcompare "eatFullTree.Q-10x"
  , bench "W-10x" (nf eatFullTree_O 5000) & bcompare "eatFullTree.Q-10x"
  ]
$ cabal exec breadth-first-unfolds -- -p eatFullTree
All
  eatFullTree
    Q-1x:  OK
      11.0 μs ± 724 ns,  78 KB allocated, 338 B  copied, 6.0 MB peak memory
    G-1x:  OK
      116  μs ±  11 μs, 379 KB allocated, 1.3 KB copied, 6.0 MB peak memory, 10.57x
    W-1x:  OK
      278  μs ±  14 μs, 830 KB allocated, 5.9 KB copied, 6.0 MB peak memory, 25.36x
    Q-10x: OK
      120  μs ±  11 μs, 781 KB allocated,  21 KB copied, 6.0 MB peak memory
    G-10x: OK
      1.23 ms ± 122 μs, 3.9 MB allocated, 109 KB copied, 7.0 MB peak memory, 10.27x
    W-10x: OK
      2.92 ms ± 255 μs, 8.4 MB allocated, 631 KB copied, 7.0 MB peak memory, 24.43x
Conclusion

I hope to have piqued your interest in breadth-first unfolds without using queues. To the best of my knowledge, this specific problem hasn’t been studied in the literature. It is of course related to breadth-first traversals, previously solved using the Phases applicative.² The intersection of functional programming and breadth-first walks is a small niche, which makes it quick to survey that corner of the world for any related ideas to those presented here.

The paper Modular models of monoids with operations by Zhixuan Yang and Nicolas Wu, in ICFP 2023, mentions a general construction of Phases as an example application of their theory. Basically, Phases is defined by a fixed-point equation:
Phases f = Day f Phases :+: Identity
We can express Phases abstractly as a least fixed-point μx.f▫x + Id in any monoidal category with a suitable structure. If we instantiate the monoidal product ▫ not with Day convolution, but with functor composition (Compose), then we get Weave.

In another coincidence, the monad-coroutine package implements a weave function which is a generalization of liftS2—this may require some squinting. While WeaveS as a data type coincides with the free monad Free, monad-coroutine’s core data type Coroutine coincides with the free monad transformer FreeT.

We can view Phases as a generalization of “zipping” from lists to free applicatives—which are essentially lists of actions, and Weave generalizes that further to free monads. To recap, the surprise was that the naive data type of free monads results in a quadratic-time unfold. That issue motivated a “lazy” variant³ which achieves a linear-time breadth-first unfold. That in turn suggested an even “lazier” variant which enables more control on evaluation order at the cost of efficiency.

I’ve just released the weave library which implements the main ideas of this post. I don’t expect it to have many users, given how much slower it is compared to queue-based solutions. But I would be curious to find a use case for the new compositionality afforded by this abstraction.

Recap table

Unfolds Time Laziness Compositional

Phases^* No linear^† by levels Yes

Queue (Q) Yes linear^† strict No

Global Levels (G) Yes linear^† by levels No

Strict Weave (S) Yes quadratic^‡ strict Yes

Oblivious Weave (O) Oblivious only linear^† N/A Yes

Lazy Weave (L) Yes linear^† by levels Yes

Endless Weave (E) Yes quadratic^‡E maximally lazy^◊ Yes

^†Linear wrt. size: Θ(size).
^‡Quadratic wrt. height: lower bound Ω(height²), upper bound O(height × size).
^EThe “Endless” meshE only terminates with lazy monads.
^*I guess there exists an “endless Phases” variant, that would be quadratic and maximally lazy.
^◊The definition of “maximally lazy” in this post actually misses a range of possible lazy behaviors with monads other than Identity. A further refinement seems to be another can of worms.
The main action of this Literate Haskell program
main :: IO ()
main = defaultMain
  [ testGraphQ
  , testGraphG
  , testGraphS
  , testGraphL
  , testGraphE
  , whnfTreeQ
  , whnfTreeS
  , whnfTreeL
  , whnfTreeE
  , forceLeftTreeL
  , forceLeftTreeE
  , benchS
  , benchO
  , benchL
  , benchE
  , benchE'
  , fullTree
  , eatFullTree
  ]
Extra test cases
whnfTreeE :: TestTree
whnfTreeE = testCase "E-whnf" $ do
  case partialTreeE of
    Node _ _ _ -> pure ()  -- Succeed
    Leaf -> error "unreachable"

whnfTreeQ :: TestTree
whnfTreeQ = expectFail $ testCase "Q-whnf" $ do
  case partialTreeQ of
    Node _ _ _ -> pure ()  -- Succeed
    Leaf -> error "unreachable"

partialTreeQ :: Tree ()
partialTreeQ = runIdentity (unfoldM_BF_Q (Identity . partialTreeF) True)

bfGraph_L :: Int -> Tree Int
bfGraph_L = (`evalState` Set.empty) . unfoldM_BF_L visitGraph

testGraphL :: TestTree
testGraphL = testCase "L-graph" $
  bfGraph_L 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))

bfGraph_E :: Int -> Tree Int
bfGraph_E = (`evalState` Set.empty) . unfoldM_BF_E visitGraph

testGraphE :: TestTree
testGraphE = testCase "E-graph" $
  bfGraph_E 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))
Note that tasty-bench also reports memory statistics (allocated, copied, and peak memory) when certain RTS options are enabled, which I’ve done by compiling the test executable with -with-rtsopts=-T.↩︎

The tree-traversals library by Noah Easterly.

Algebras for weighted search, by Donnacha Oisín Kidney and Nicolas Wu, in ICFP 2021.

Breadth-first traversal via staging, by Jeremy Gibbons, Donnacha Oisín Kidney, Tom Shrijvers, and Nicolas Wu, in MPC 2022. It has been revised into a short version (5 pages): Phases in software architecture in FUNARCH 2023.

↩︎

Speaking of variants of free monads, one might think of the “freer” monad, which has different motivations and which does not help us here.↩︎
by Lysxia at March 30, 2025 12:00 AM

height	time
baseline	1x
10x	105x

height	time
baseline	1x
10x	9.8x

height	time
baseline	1x
10x	9.93x

height	time
baseline	1x
10x	738x

height	time
1x
2x	10.9x
4x	9.5x
8x	5.4x
16x	1.4x

height	time
baseline	1x
1x	95x

height	time
1x
2x	3.2x
4x	4.2x
8x	4.5x
16x	1.7x

algorithm	size	time
Queue	500	1x
Global Levels	500	1.4x
Lazy Weave	500	3.1x
Queue	5000	1x
Global Levels	5000	1.2x
Lazy Weave	5000	3.3x

algorithm	size	time
Queue	500	1x
Global Levels	500	11x
Oblivious Weave	500	25x
Queue	5000	1x
Global Levels	5000	10x
Oblivious Weave	5000	24x

	Unfolds	Time	Laziness	Compositional
Phases^*	No	linear^†	by levels	Yes
Queue (Q)	Yes	linear^†	strict	No
Global Levels (G)	Yes	linear^†	by levels	No
Strict Weave (S)	Yes	quadratic^‡	strict	Yes
Oblivious Weave (O)	Oblivious only	linear^†	N/A	Yes
Lazy Weave (L)	Yes	linear^†	by levels	Yes
Endless Weave (E)	Yes	quadratic^‡E	maximally lazy^◊	Yes

March 24, 2025

GHC Developer Blog

GHC 9.6.7 is now available

GHC 9.6.7 is now available

Luite Stegeman - 2025-03-24

The GHC developers are very pleased to announce the availability of GHC 9.6.7. Binary distributions, source distributions, and documentation are available on the release page.

GHC 9.6.7 brings number of fixes, including:

GHC’s internal Unique type has been widened to 64-bits on 32-bit architectures, avoiding potential miscompilations on large projects.

Fix a runtime crash when using the compacting GC, caused by black holes in large objects.

Added new flags -fspec-eval and -fspec-eval-dictfun to allow switching off speculative evaluation.

The following libraries have been updated since GHC 9.6.6:

base-4.18.3.0

filepath-1.4.301.0

unix-2.8.6.0

bytestring-0.11.5.4

array-0.5.8.0

A more detailed description can be found in the release notes and a full list of changes in GHC!13841 and GHC!13891

Note about Haskell Language Server and building GHC 9.8+:

The change of Unique to 64 bit (GHC#22010) adds the exported symbol ghc_unique_counter64 to the RTS. Unfortunately it’s impossible to avoid this without breaking other things. If you encounter a linker error related to ghc_unique_counter64 when building GHC (or building a GHC-derived package like ghc-lib-parser) with GHC 9.6.7, you probably have to add this fix to the program you’re building.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

Please give this release a try and open a ticket if you see anything amiss.

~ Luite

by ghc-devs at March 24, 2025 12:00 AM

March 17, 2025

Haskell Interlude

63: Farhad Mehta

On this episode of the Haskell Interlude, Andres Löh and Mike Sperber are joined by Farhad Mehta, a professor at OST Rapperswil, and one of the organizers of ZuriHac. Fahrad tells us about formal methods, building tunnels, the importance of education, and the complicated relationship between academia and industry.

by Haskell Podcast at March 17, 2025 02:00 PM

March 16, 2025

Magnus Therning

Using lens-aeson to implement FromJSON
At work I sometimes need to deal with large and deep JSON objects where I'm only interested in a few of the values. If all the interesting values are on the top level, then aeson have functions that make it easy to implement FromJSON's parseJSON (Constructors and accessors), but if the values are spread out then the functions in aeson come up a bit short. That's when I reach for lens-aeson, as lenses make it very easy to work with large structures. However, I've found that using its lenses to implement parseJSON become a lot easier with a few helper functions.

Many of the lenses produces results wrapped in Maybe, so the first function is one that transforms a Maybe a to a Parser a. Here I make use of Parser implementing MonadFail.
infixl 8 <!>
(<!>) :: (MonadFail m) => Maybe a -> String -> m a
(<!>) mv err = maybe (fail err) pure mv
In some code I wrote this week I used it to extract the user name out of a JWT produced by Keycloak:
instance FromJSON OurClaimsSet where
    parseJSON = ... $ \o -> do
        cs <- parseJSON o
        n <- o ^? key "preferred_username" . _String <!> "preferred username missing"
        ...
        pure $ OurClaimsSet cs n ...
Also, all the lenses start with a Value and that makes the withX functions in aeson to not be a perfect fit. So I define variations of the withX functions, e.g.
withObjectV :: String -> (Value -> Parser a) -> Value -> Parser a
withObjectV s f = withObject s (f . Object)
That makes the full FromJSON instance for OurClaimsSet look like this
instance FromJSON OurClaimsSet where
    parseJSON = withObjectV "OurClaimsSet" $ \o -> do
        cs <- parseJSON o
        n <- o ^? key "preferred_username" . _String <!> "name"
        let rs = o ^.. key "resource_access" . members . key "roles" . _Array . traverse . _String
        pure $ OurClaimsSet cs n rs
Tags: haskell
March 16, 2025 08:26 AM

Planet Haskell

July 10, 2025

Workspaces

Publish all my crates

How it works

How to try it

July 09, 2025

About the Haskell Unfolder

July 07, 2025

The Problem

The Algorithm

Defining our Terms

Getting the Final Answer from our Terms

Solving for p1

Rust Implementation

Anatomy of a Binary Search

Getting the Final Answer

Haskell Implementation

Conclusion

July 03, 2025

June 30, 2025

The Problem

The Algorithm

Haskell Solution

Rust Solution

Why an Index Loop?

Conclusion

June 28, 2025

Interlude on the Redis serialisation protocol (RESP3)

Rewriting sendCmd

What's next?

Footnotes:

June 27, 2025

Competitive programming in Haskell: prefix sums

Precomputing prefix sums

Practice problems

June 25, 2025

About the Haskell Unfolder

June 24, 2025

June 23, 2025

The Problem

The Algorithm

Haskell Solution

Rust Solution

Conclusion

Competitive programming in Haskell: range queries, classified

Static range queries

Dynamic range queries

Table

June 22, 2025

June 20, 2025

The protocol

A command is a function in need of a sender

A command for free delivery

Converging and simplifying

What's next?

Footnotes:

Sponsorship

Team

GHC

Highlights

Explicit level imports

GHCi support for primops

Improvements to the debugger

Expressions in SPECIALISE pragmas

Multiple Home Units support in GHCi

GHC Releases

Frontend

Backend

Compiler performance

Recompilation checking

Runtime system

GHCi & bytecode interpreter

Libraries

Build system & packaging

Testsuite

Cabal

HLS

Haskell.org infrastructure

June 17, 2025

Rewriting `sendCmd`

Expressions in `SPECIALISE` pragmas