Sure, it’s gaudy, but it’s got a few things going for it, too.
Let’s put aside for the moment that you can already send my website back into “90s mode” and dive into this take on how I could
present myself in a particularly old-school way. There’s a few things I particularly love:
It’s actually quite lightweight: ignore all the animated GIFs (which are small anyway) and you’ll see that, compared to my current homepage, there are very few
images. I’ve been thinking about going in a direction of fewer images on the homepage anyway, so it’s interesting to see how it comes together in this unusual context.
The page sections are solidly distinct: they’re a mishmash of different widths, some of which exhibit a horrendous lack of responsivity, but it’s pretty clear where
the “recent articles” ends and the “other recent stuff” begins.
The post kinds are very visible: putting the “kind” of a post in its own column makes it really clear whether you’re looking at an article, note, checkin, etc., much
more-so than my current blocks do.
Maybe there’s something we can learn from old-style web design? No, I’m serious. Stop laughing.
90s web design was very-much characterised by:
performance – nobody’s going to wait for your digital photos to download on narrowband connections, so you hide them behind descriptive links or tiny thumbnails, and
pushing the boundaries – the pre-CSS era of the Web had limited tools, but creators worked hard to experiment with the creativity that was possible within those
limits.
Those actually… aren’t bad values to have today. Sure, we’ve probably learned that animated backgrounds, tables for layout, and mystery meat navigation were horrible for
usability and accessibility, but that doesn’t mean that there isn’t still innovation to be done. What comes next for the usable Web, I wonder?
As soon as you run a second or third website through the tool, its mechanisms for action become somewhat clear and sites start to look “samey”, which is the opposite of what
made 90s Geocities great.
The only thing I can fault it on is that it assumes that I’d favour Netscape Navigator: in fact, I was a die-hard Opera-head for most of the
nineties and much of the early naughties, finally switching my daily driver to Firefox in 2005.
I certainly used plenty of Netscape and IE at various points, but I wasn’t a fan of the divisions resulting from the browser wars. Back in the day, I always backed
the ideals of the “Viewable With Any Browser” movement.
As I mentioned in my recent Blog Questions Challenge, I recently switched my blog from WordPress, which it had been running on for over 20 years of its 26 year history, to ClassicPress.1
I’m aware that I’m not the only person for whom ClassicPress might be a better fit than WordPress2,
so I figured I should share the process by which I undertook the change.
Switching from WordPress to ClassicPress
Switching from WordPress to ClassicPress should be a non-destructive, 100% reversible process, but (even though I’ve got solid backups) I wasn’t ready to
trust that, so I decided to operate on a copy of my site. I’m glad I did, because there were a couple of teething issues I needed to tackle before I could launch.
1. Duplicating the site
I took a simple approach to duplicating the site: (1) I copied the site directory, and (2) I copied the database, and (3) I set up a new subdomain to use for testing. Here’s how I did
each step:
1.1. Copying the site directory
This should’ve been simple, but a du -sh revealed that my /wp-content/uploads directory is massive (I should look into that) and I didn’t want to
clone it. And I didn’t want or need to clone my /wp-content/cache directory either. So I ran:
rsync -av --exclude=wp-content ./old-site-directory/ ./new-site-directory/ to copy everything except wp-content, and then
rsync -av --exclude=uploads --exclude=cache ./old-site-directory/wp-content/ ./new-site-directory/wp-content/ to copy wp-content except the
uploads and cache subdirectories, and then finally
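ln -s "$(pwd)/old-site-directory/wp-content/uploads" ./new-site-directory/wp-content/uploads to symlink the uploads directory, sharing it between the two sites (the link target is given as an absolute path because a relative target is resolved from the link’s own directory, not from wherever you run the command)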
1.2. Copying the database
I just piped mysqldump into mysql to clone from one database to the other:
mysqldump -uUSERNAME -p --lock-tables=false old-site-database | mysql -uUSERNAME -p new-site-database
I edited DB_NAME in wp-config.php in the new site’s directory to point it at the new database.
If you’re going to clone your WordPress site before converting to ClassicPress, you’ll want to be comfortable editing your wp-config.php.
1.3. Setting up a new subdomain
My DNS is already configured with a wildcard to point (almost) all *.danq.me subdomains to this server already. I decided to use the name classicpress-testing.danq.me as my
temporary/test domain name. To keep any “changes” to my cloned site to a minimum, I overrode the domain name in my wp-config.php rather than in my database, by adding the
following lines:
Because I use Caddy/FrankenPHP as my webserver3,
configuration was really easy: I just copied the relevant part of my Caddyfile (actually an include), changed the domain name and the root, and it just worked,
even provisioning me out a LetsEncrypt SSL certificate. Magical4.
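For anyone wanting to script the wp-config.php override from step 1.3, here’s a minimal sketch. It assumes the override is done with WordPress’s standard WP_HOME and WP_SITEURL constants, and that the config file loads wp-settings.php near the end, as stock ones do. The function name override_site_url is made up for illustration.

```shell
# Hypothetical helper: insert WP_HOME / WP_SITEURL overrides into a
# wp-config.php, immediately before the first line mentioning wp-settings.php.
# Usage: override_site_url /path/to/wp-config.php https://example.test
override_site_url() {
  osu_config=$1
  osu_url=$2
  osu_tmp=$(mktemp) || return 1
  awk -v url="$osu_url" -v q="'" '
    /wp-settings\.php/ && !inserted {
      print "define( " q "WP_HOME" q ", " q url q " );"
      print "define( " q "WP_SITEURL" q ", " q url q " );"
      inserted = 1
    }
    { print }
  ' "$osu_config" > "$osu_tmp" || return 1
  # Write back in place so the original file keeps its permissions.
  cat "$osu_tmp" > "$osu_config"
  rm -f "$osu_tmp"
}
```

Because the constants live in wp-config.php rather than the database, deleting those two lines is all it takes to undo the override.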
2. Switching the duplicate to ClassicPress
Now that I had a duplicate copy of my blog running at https://classicpress-testing.danq.me/, it was time to switch it to ClassicPress. I started by switching my wp-admin
colour scheme to a different one in my cloned site, so it’d be immediately visually-obvious to me if I’d accidentally switched and was editing the “wrong” site (I also made sure I was
logged-out of my primary, live site, so I was confident I wouldn’t break anything while I was experimenting!).
ClassicPress provides a migration plugin which checks for common problems and then switches your site
from WordPress to ClassicPress, so I installed it and ran it. It said that everything was okay except for my (custom) theme and my self-built plugins, which it understandably couldn’t
check compatibility of. It recommended that I install Twenty Seventeen – the last WordPress default theme to not
require the block editor – but I didn’t do so: I was confident that my theme would work anyway… and if it didn’t, I’d want to fix it rather than switch theme!
I failed to take a screenshot of the actual process, but it looked broadly like this.
And then… it all broke.
3. Fixing what broke
After swiftly doing a safety-check that my live site was still intact, I started trying to work out why my site was broken. Debugging a ClassicPress PHP issue is functionally
identical to debugging a similar WordPress issue, for obvious reasons: check the logs, work out what’s broken, realise it’s a plugin, disable that plugin while you investigate further,
etc.
EWWW Image Optimizer: I use this plugin to pregenerate WebP variants of my images, which I then serve using webserver rules. It’s not a
complex job, and I should probably integrate the feature into my theme at some point, but for now I use this plugin. Version 8.0.0 of the plugin doesn’t work on ClassicPress 2.3.1, so
I used WP-CLI to downgrade to the last version that does (7.7.0), and then it worked fine.
Dan’s Geocaching Log Reposter: a self-made plugin that copies my logs from geocaching websites stopped working properly, which I think is because
ClassicPress is doing a more-aggressive job than WordPress at nonce validation on admin REST endpoints? I put a quick hack into my plugin to work around it, but I’ll need to look into
this properly at some point.
Some other bits of my stack, e.g. CapsulePress (my Gemini/Spartan/Nex server), have their own copies of my
database credentials, because I’ve been too lazy to centralise them into environment variables, and needed updating (but not until live switchover time).
I ran the two sites in-parallel for a couple of weeks, with the ClassicPress one as a “read only” version (so I didn’t pollute my uploads directory!), but it was pretty unnecessary
because it all worked pretty seamlessly, despite my complex stack of custom code. When I wanted to switch for-real, all I needed to do was swap the domain names over in my Caddyfile and
edit the wp-config.php of my ClassicPress installation: step 1.3, but in reverse!
If you hadn’t been told5, you probably wouldn’t have even known I’d made a change: I suppress basically all infrastructure-identifying
headers from my server output as a matter of course, and ClassicPress and WordPress are functionally-interchangeable from a front-end perspective6.
So what’s the difference?
From my experience, here are the differences I’ve discovered since switching from WordPress to ClassicPress:
The good stuff
😅 ClassicPress has no Gutenberg/block editor. This would absolutely be a showstopper for many people, and that’s fine: I have nothing against the block editor (I
use it basically every day elsewhere!), but I’ve never really used it on danq.me and don’t feel the need to change that! My theme, my workflow, and my custom plugins are all
geared around the perfectly-good “classic” editor, and so getting a more-lightweight CMS by removing a feature I wasn’t using anyway falls somewhere between neutral and a blessing.
⚡The backend is fast again! One of the changes the ClassicPress team have been working on applying to WordPress is to strip out jQuery and other redundancies from
the backend, and I love how much faster and lighter my editor interface is as a result. (With caveat; see below!)
🔌Virtually everything “just works”. With the few exceptions described above, everything works exactly as it does under WordPress. Which is what you’d hope for a fork
that’s mostly “WordPress, but without the block editor”, right, but it’s still reassuring (and, for me, an essential feature). There are a few “new” features to do with paging through
posts and the media library and they’re fine, I suppose, but not by themselves worth switching for (though it might be nice to backport them into WordPress!).
The bad stuff
🏷️ Adding tags to posts takes a step backwards. A side-effect of dropping jQuery is the partial loss of the autocomplete feature when selecting tags to add to a post.
You still get a partial autocomplete, but not after typing a comma: you need to press enter to submit the tag you were writing and then start typing the next, which
frankly sucks. This is because they’re relying on a <datalist>, which isn’t as full-featured as the JavaScript solution WordPress employs. This bugs
me almost enough to be a showstopper, but I gather it’s getting fixed in a near-future version.
🗺️ You’re in uncharted territory when things go wrong. One great benefit of WordPress is the side-effects of its ubiquity. If you have a query or a problem
you can throw a stone at your favourite search engine and get a million answers… and some of them will even be right! If you have a problem in ClassicPress and it’s not shared with (or
you’re not sure if it’s shared with) WordPress… you’re mostly on your own. The forums are good and friendly,
but if you want a quick answer to something, you’re likely to have to roll your sleeves up and open some source code. I don’t mind this at all – when I first started using WordPress,
this was the case, too! – but it might be a showstopper for some folks.
In summary: I’m enjoying using ClassicPress, even where there are rough edges. For me, 99% of my experience with it is identical to how I used WordPress anyway, it’s relatively
lightweight and fast, and it’s easy enough to switch back if I change my mind.
Footnotes
1 It saddens me that I have to keep clarifying this, but I feel like I do: my switch from
WordPress to ClassicPress is absolutely nothing to do with any drama in the WordPress space that’s going on right now: in fact, I’d been planning to try it out since before
any of the drama appeared. I appreciate that some people making a similar switch, including folks who use this blog post as a guide, might have different motivations to me, and that’s
fine too. Personally, I think that ditching an installation of open-source WordPress based on your interpretation of what’s going on in the ecosystem is… short-sighted? But
hey: the joy of open source is you can – and should! – do what you want. Anyway: the short of it is – the desire to change from WordPress to ClassicPress was, for me, 100% a
technical decision and 0% a political one. And I’ll thank you for leaving any of your drama at the door if you slide into my comments, ta!
2 Matt recently described ClassicPress as “the last decent fork
attempt for WordPress”, and I absolutely agree. There’s been a spate of forks and reimplementations recently. I’ve looked into many of them and been… very much underwhelmed. Want my
hot take? Sure, here you go: AspirePress is all lofty ideas and no deliverables. FreeWP seems to be the same, but somehow without the lofty ideas. ForkPress is a ghost. Speaking of
ghosts, Ghost isn’t a WordPress fork; they have got some cool ideas though. b2evolution is even less a WordPress fork but it’s pretty cool in its own right. I’m not sure what
clamPress is trying to achieve but I’ve not given it a serious look. So yeah: ClassicPress is, in my mind, the only WordPress fork even worth consideration at this point, and as I
describe in this blog post: it’s not for everybody.
3 I switched from Nginx over the winter and it’s been just magical: I really love
Caddy’s minimal approach to production configuration. The only thing I’ve been able to fault it on is that it’s not capable of setting up client-side SSL certificate authentication on
a path, only on an entire domain, which meant I needed to reimplement the authentication mechanism I use on a small part of my (non-blog) internal
infrastructure.
4 To be fair, it wouldn’t have been hard if I’d still been using Nginx, because I’d
set up Certbot to use DNS-based verification to issue me wildcard SSL certificates. But doing this in Caddy still felt magical.
6 Indeed, I wouldn’t have considered a switch to ClassicPress in the first place if it
wasn’t a closely-aligned-enough fork that I retained the ability to flip-flop between the two to my heart’s content! I’ve loved WordPress for over two decades; that’s not going to
change any time soon… and if e.g. ClassicPress ceased tracking WordPress releases and the fork diverged too far for my comfort, I’d probably switch back to regular old WordPress!
It felt like a natural evolution of my second vanity-site. It was 1998, and my site – Castle of the Four Winds – was home to a selection of the same kinds of random crap that
everybody put on their homepages at the time. I figured I’d start keeping an online diary: the word “blog” hadn’t been coined yet, and its predecessor “weblog” had only been around for
a year and I hadn’t come across it.
So I experimentally started posting a few times a week.
What platform are you using to manage your blog and why did you choose it? Have you blogged on other platforms before?
1998: Static HTML and a bit of Perl
When I started blogging my site was almost entirely plain HTML2.
So my original “platform” was probably Emacs.
2000: Static files indexed by PHP
In the Summer of 2000 I registered avangel.com and moved my diary there. I was still storing posts in static files, but used PHP wrappers to share the structure and menus across the
pages. It was a massive improvement.
Later, I moved everything to the (ill-advised?) domain name scatmania.org and reimplemented in pretty-much the same way. Until…
I’d have outgrown Flip eventually, but I got a nudge in that direction in July 2004. At the time, I was sharing a server
with some friends, operated by Gareth, and something went wrong and the server went completely offline. The co-located server disappeared back
to Gareth’s house, eventually, and while I’d recovered many of the posts from my own backups, 61 posts remain partially-incomplete to this
day (if you happen to have a copy of any of them I’d love to see it!).
I brought my blog back online using WordPress, whose then-new release version 1.2
included an RSS-powered importer: this allowed me to write a little code to convert my entire previous archive into a fat RSS file and then import it wholesale. WordPress was, and
remains, pretty magical – a universal blogging platform that evolved into a universal CMS – and back in the day I occasionally argued online with Matt
about technical aspects of the future direction of the project4.
Those drop-shadows! Those gradients! Those naked hyperlinks differentiated only by being a slightly different colour! That aggressive use of sans-serif fonts with expanded
line-heights! Those RSS links, front-and-centre! The only thing that could make this more-obviously “Web 2.0” would be the addition of a wonky “beta” star in the corner.
If you didn’t know better, you might well not know I’m running WordPress. My theme and custom plugins are… well, they’re an ecosystem all by themselves. And that’s before you even get
to things like CapsulePress, my WordPress-to-Gopher/Gemini/Spartan/Nex bridge, the
pile of scripts I use to sync-up with the Fediverse, the PWA I use to post notes while I’m on the move, and so on.
2025: ClassicPress
Earlier this year I experimentally switched to ClassicPress, a fork of WordPress. There’ll doubtless be lots more to say about that, down the
line5,
but here’s the skinny: I don’t use Gutenberg on my blog anyway6,
I appreciate having my backend be almost as high-performance as I’ve worked to make my frontend, and I enjoy most of
the feature differences7.
How do you write your posts? For example, in a local editing tool, or in a panel/dashboard that’s part of your blog?
With the exception of notes (most of which are written in a tool of my own creation and then pushed to one or both of my Mastodon and my blog
simultaneously), I mostly write right into the WordPress/ClassicPress post editor.
I often write ideas, concepts, and first drafts into my Obsidian notebook and then copy/paste out when the time comes.
When do you feel most inspired to write?
There’s no particular pattern, though it feels like I’m most-inspired to write exactly when I should be prioritising something else! That’s why it’s so helpful to be able to
write three sentences into Obsidian and then come back to it later!
I’ve been on a bit of a blogging kick these last few years, though. Last year I wrote a massive 436 posts, although that admittedly includes PESOS‘d checkins from geocaching and geohashing expeditions. I’m a fan of
Kev’s #100DaysToOffload challenge, and I’m on course to achieve it earlier than ever before, this year (my sixth consecutive year: I do the
challenge strictly by calendar years!), as this post is already my 48th… all within the first 38 days of this year8.
Do you publish immediately after writing, or do you let it simmer a bit as a draft?
A mixture of both. Probably most of my posts are written in a single sitting… or, at least, are written in a tab that stays open for the entire time during which it’s written.
But others spend a long time in-progress. You remember how almost a year ago I gave a talk about why Oxford’s area code is 01865? And I promised that there’d be a blog/vlog/maybe-podcast version of that talk later?
Yeah: that’s been 90%-there and sitting in a draft pretty-much since then, just waiting for me to make the finishing touches (and record the vlog/podcast variants, if that’s the
direction I decide to go in).
And I’ve dusted off drafts that’ve been much older than that, before, too. So it really is a mixture.
What’s your favourite post on your blog?
I couldn’t pick out a favourite that I wouldn’t change my mind about five minutes later. But a recent favourite might have been last Spring’s “Let Your Players Lead The Way”, which aimed to impart some of the things I’ve learned about gamemastering (especially) while being the
dungeon master for The Levellers these last few years9.
Not only was it a post that had been a long time coming, and based on months of drafts and re-drafts, but also I really enjoyed writing some post-specific CSS to give it just a slightly
more-magical feel.
The downside is that I’ve now got one more thing to try not to break the next time I re-write my blog’s stylesheet.
Any future plans for your blog? Maybe a redesign, a move to another platform, or adding a new feature?
I want to redesign the homepage to be simpler, less-graphical, and more-informational. I’m not sure how that’s going to look, yet.
And as I mentioned: I’m experimenting with ClassicPress. It’s working out mostly-okay so far, but that’s a story for another post.
Next?
I feel like I’m the last person in the universe to do this quiz. But if you haven’t – and you have anything approximating a blog – then you should go next.
Footnotes
1 I wouldn’t recommend actually reading my older posts, though. I was a teenager,
and it shows.
2 I had a slightly-fancier kind of hosting, by this point, that gave me a
cgi-bin directory into which I could compile binaries (in C) or write scripts (in Perl). My hit counter? That was a Perl script I adapted from Matt Wright’s counter.pl and “enhanced” with some flaming text using Corel
Photo-Paint.
5 Right off the bat, though, let me stress that trying ClassicPress is absolutely nothing
to do with the drama in the WordPress space right now: in fact I’ve been planning to give it a try ever since the project got its shit together, re-forked WordPress, and released ClassicPress 2.0 a year ago.
6 I don’t have anything against Gutenberg – I use it on other blogs, and every day at
work! – and Block Themes are magical… but I’ve never found any benefit to them here: I’ve no need for it, and I’ve got plugins I’ve written for my own use that I’ve never bothered to
make Gutenberg-compatible.
7 My biggest gripe with ClassicPress so far is that in removing the jQuery dependency on
the post editor’s tag selector they’ve only replaced it with a <datalist>, which is neat and all but kills the ability to autocomplete multiple
comma-separated tags at once. But it looks like that’s getting fixed, so I’m going to hang in there for a bit
before I decide whether I’m sticking with ClassicPress or not.
8 I’ll save you from doing the maths: if I complete 48 posts in 38 days, I’d expect to
complete 100 posts on my 80th day: as it’s not a leap year, that would be Friday 21 March 2025. Let’s see how I get on!
9 Although I’ve been horribly neglecting them for the last couple of months, for various
reasons.
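The projection in footnote 8 can be checked with a little shell arithmetic. This is only a sketch: the function names are hypothetical, and the month lengths assume a non-leap year such as 2025.

```shell
# Days needed to reach a target post count at the current pace,
# rounded up to a whole day (ceiling division).
# Usage: days_needed POSTS_SO_FAR DAYS_SO_FAR TARGET
days_needed() {
  dn_posts=$1 dn_days=$2 dn_target=$3
  echo $(( (dn_target * dn_days + dn_posts - 1) / dn_posts ))
}

# Convert a 1-based day of a NON-LEAP year into "day Month".
day_of_year_to_date() {
  doy_day=$1
  for doy_entry in January:31 February:28 March:31 April:30 May:31 June:30 \
                   July:31 August:31 September:30 October:31 November:30 December:31; do
    doy_name=${doy_entry%%:*}
    doy_length=${doy_entry##*:}
    if [ "$doy_day" -le "$doy_length" ]; then
      echo "$doy_day $doy_name"
      return 0
    fi
    doy_day=$(( doy_day - doy_length ))
  done
  return 1  # day number was beyond the end of the year
}
```

Here days_needed 48 38 100 prints 80, and day_of_year_to_date 80 prints 21 March, which agrees with the footnote.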
I’ve a notion that during 2025 I might put some effort into tidying up the tagging taxonomy on my blog. There’s a few tags that are duplicates (e.g.
ai and artificial intelligence) or that exhibit significant overlap (e.g. dog and dogs), or that were clearly created when I
speculated I’d write more on the topic than I eventually did (e.g. homa night, escalators1,
or nintendo) or that are just confusing and weird (e.g. not that bacon sandwich picture).
One part of such an effort might be to go back and retroactively add tags where they ought to be. For about the first decade of my blog, i.e. prior to around 2008, I rarely used tags to
categorise posts. And as more tags have been added it’s apparent that many old posts even after that point might be lacking tags that perhaps they ought to have2.
I remain sceptical about many uses of (what we’re today calling) “AI”, but one thing at
which LLMs seem to do moderately well is summarisation3. And isn’t tagging and categorisation only a stone’s throw away from
summarisation? So maybe, I figured, AI could help me to tidy up my tagging. Here’s what I was thinking:
Tell an LLM what tags I use, along with an explanation of some of the quirkier ones.
Train the LLM with examples of recent posts and lists of the tags that were (correctly, one assumes) applied.
Give it the content of blog posts and ask what tags should be applied to it from that list.
Script the extraction of the content from old posts with few tags and run it through the above, presenting to me a report of what tags are recommended (which could then be coupled
with a basic UI that showed me the post and suggested tags, and “approve”/”reject” buttons or similar).
Extracting training data
First, I needed to extract and curate my tag list, for which I used the following SQL4:
SELECT COUNT(wp_term_relationships.object_id) num, wp_terms.slug
FROM wp_term_taxonomy
LEFT JOIN wp_terms ON wp_term_taxonomy.term_id = wp_terms.term_id
LEFT JOIN wp_term_relationships ON wp_term_taxonomy.term_taxonomy_id = wp_term_relationships.term_taxonomy_id
WHERE wp_term_taxonomy.taxonomy = 'post_tag' AND wp_terms.slug NOT IN (
  -- filter out e.g. 'rss-club', 'published-on-gemini', 'dancast' etc.
  -- these are tags that have internal meaning only or are already accurately applied
  'long', 'list', 'of', 'tags', 'the', 'ai', 'should', 'never', 'apply'
)
GROUP BY wp_terms.slug
HAVING num > 2 -- filter down to tags I actually routinely use
ORDER BY wp_terms.slug
Many of my tags are used for internal purposes; e.g. I tag posts published on gemini if they’re to appear on gemini://danq.me/ and
dancast if they embed an episode of my podcast. I filtered these out because I never want the AI to suggest applying them.
I took my output and dumped it into a list, and skimmed through to add some clarity to some tags whose purpose might be considered ambiguous, writing my explanation of each in
parentheses afterwards. Here’s a part of the list, for example:
I used that list as the basis for the system message of my initial prompt:
Suggest topical tags from a predefined list that appropriately apply to the content of a given blog post.
# Steps
1. **Read the Blog Post**: Carefully read through the provided content of the blog post to identify its main themes and topics.
2. **Analyse Key Aspects**: Identify key topics, themes, or subjects discussed in the blog post.
3. **Match with Tags**: Compare these identified topics against the list of available tags.
4. **Select Appropriate Tags**: Choose tags that best represent the main topics and themes of the blog post.
# Output Format
Provide a list of suggested tags. Each tag should be presented as a single string. Multiple tags should be separated by commas.
# Allowed Tags
Tags that can be suggested are as follows. Text in parentheses are not part of the tag but are a description of the kinds of content to which the tag ought to be applied:
- aberdyfi
- aberystwyth
- ...
- youtube
- zoos
# Examples
**Input:**
The rapid advancement of AI technology has had a significant impact on my industry, even on the ways in which I write my blog posts. This post, for example, used AI to help with tagging.
**Output:**
ai, technology, blogging, meta, work
...(other examples)...
# Notes
- Ensure that all suggested tags are relevant to the key themes of the blog post.
- Tags should be selected based on their contextual relevance and not just keyword matching.
This system prompt is somewhat truncated, but you get the idea.
That post already has the following tags (but this wasn’t disclosed to the AI in its training set; it had to work from scratch): children, language, languages (a bit of a redundancy there!), spain, and unicode.
Testing it out
Let’s see what the AI suggests:
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_TOKEN" \
-d '{ "model": "gpt-4o-mini", "messages": [ { "role": "system", "content": [ { "type": "text", "text": "[PROMPT AS DESCRIBED ABOVE]" } ] }, { "role": "user", "content": [ { "type": "text", "text": "My 8-year-old asked me \"In Spanish, I need to use an upside-down interrobang at the start of the sentence‽\" I assume the answer is yes A little while later, I thought to check whether Unicode defines a codepoint for an inverted interrobang. Yup: ‽ = U+203D, ⸘ = U+2E18. Nice. And yet we dont have codepoints to differentiate between single-bar and double-bar \"cifrão\" dollar signs..." } ] } ], "response_format": { "type": "text" }, "temperature": 1, "max_completion_tokens": 2048, "top_p": 1, "frequency_penalty": 0, "presence_penalty": 0}'
Running this via command-line curl meant I quickly ran up against some Bash escaping issues, but set +H and a little massaging of the blog post content
seemed to fix it.
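One way to sidestep that kind of escaping pain is to JSON-escape the post text before splicing it into the request body. Here’s a rough sketch, assuming only backslashes, double quotes, and newlines need handling. Tabs and other control characters aren’t covered, and a dedicated JSON tool would be more robust. json_escape and build_user_message are hypothetical helper names.

```shell
# Read text on stdin and write it escaped for use inside a JSON string.
# Handles backslashes, double quotes, and newlines ONLY.
json_escape() {
  sed 's/\\/\\\\/g; s/"/\\"/g' |
    awk 'BEGIN { ORS = "" } NR > 1 { print "\\n" } { print }'
}

# Wrap stdin as a single chat "user" message object.
build_user_message() {
  printf '{ "role": "user", "content": "%s" }\n' "$(json_escape)"
}
```

Feeding the HTML-stripped post through build_user_message gives a fragment that can be dropped into the messages array. The finished body can then be piped to curl on stdin with -d @-, so the text never passes through the shell’s own quoting.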
GPT-4o-mini
When I ran this query against the gpt-4o-mini model, I got back: unicode, language, education, children, symbols.
That’s… not ideal. I agree with the tags unicode, language, and children, but this isn’t really about education. If I tagged
everything vaguely educational on my blog with education, it’d be an even-more-predominant tag than geocaching is! I reserve that tag for things that relate
specifically to formal education: but that’s possibly something I could correct for with a parenthetical in my approved tags list.
symbols, though, is way out. Sure, the post could be argued to be something to do with symbols… but symbols isn’t on the approved tag list in
the first place! This is a clear hallucination, and that’s pretty suboptimal!
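Hallucinated tags like this are easy to catch mechanically by checking each suggestion against the approved list before a human ever sees it. Here’s a minimal sketch, assuming the approved tags sit one per line in a plain-text file. Both the file and the function name filter_tags are made up for illustration.

```shell
# Usage: filter_tags ALLOWED_FILE "tag one, tag two, ..."
# Prints one line per suggestion: "ok<TAB>tag" if the tag appears in
# ALLOWED_FILE (exact whole-line match), otherwise "unknown<TAB>tag".
filter_tags() {
  ft_allowed=$1
  printf '%s\n' "$2" |
    tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    while IFS= read -r ft_tag; do
      [ -n "$ft_tag" ] || continue
      if grep -Fxq -- "$ft_tag" "$ft_allowed"; then
        printf 'ok\t%s\n' "$ft_tag"
      else
        printf 'unknown\t%s\n' "$ft_tag"
      fi
    done
}
```

Suggestions are split on commas only, so multi-word tags survive intact. This only catches tags that don’t exist at all: a tag that’s on the list but a poor fit, like education here, still needs a human eye.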
Maybe a beefier model will fare better…
GPT-4o
I switched gpt-4o-mini for gpt-4o in the command above and ran it again. It didn’t take noticeably longer to run, which was pleasing.
The model returned: children, language, unicode, typography. That’s a big improvement. It no longer suggests education,
which was off-base, nor symbols, which was a hallucination. But it did suggest typography, which is a… not-unreasonable suggestion.
Neither model suggested spain, and strictly-speaking they were probably right not to. My post isn’t about Spain so much as it’s about Spanish. I don’t
have a specific tag for the latter, but I’ve subbed in the former to “connect” the post to ones which are about Spain, but that might not be ideal. Either way: if this is how
I’m using the tag then I probably ought to clarify as such in my tag list, or else add a note to the system prompt to explain that I use place names as the tags for posts about
the language of those places. (Or else maybe I need to be more-consistent in my tagging).
I experimented with a handful of other well-tagged posts and was moderately-satisfied with the results. Time for a more-challenging trial.
This time, with feeling…
Next, I decided to run the code against a few blog posts that are in need of tags. At this point, I wasn’t quite ready to implement a UI, so I just adapted my little hacky Bash
script and copy-pasted HTML-stripped post contents directly into it.
If it worked, I decided, I could make a UI. Until then, the command line was plenty sufficient.
In this post, I shared that my grandmother and my coworker had (independently) been taken into hospital. It had no tags whatsoever.
The AI suggested the tags hospital, family, injury, work, weddings, pub, humour. Which at
a glance, is probably a superset of the tags that I’d have considered, but there’s a clear logic to them all.
It clearly picked out weddings based on a throwaway comment I made about a cousin’s wedding, so I disagree with that one: the post isn’t strictly about weddings
just because it mentions one.
pub could go either way. It turns out my coworker’s injury occurred at or after a trip to the pub the previous night, and so its relevance is somewhat unknowable from this
post in isolation. I think that’s a reasonable suggestion, and a great example of why I’d want any such auto-tagging system to be a human assistant (suggesting
candidate tags) and not a fully-automated system. Interesting!
Finally, you might think of humour as being a little bit sarcastic, or maybe overly-laden with schadenfreude. But the blog post explicitly states that my coworker
“carefully avoided saying how he’d managed to hurt himself, which implies that it’s something particularly stupid or embarrassing”, before encouraging my friends to speculate on it.
However, it turns out that humour isn’t one of my existing tags at all! Boo, hallucinating AI!
I ended up applying all of the AI’s suggestions except weddings and humour. I also applied smartdata, because that’s where I worked (the AI couldn’t have been expected to guess that without context, though!).
This post talked about Ash’s and my travels around the UK to see REM and Green Day in concert5 and to the National Science Museum in London where I discovered that Ash was prejudiced towards…
carrot cake.
The AI suggested: concerts, travel, music, preston, london, science museum, blogging.
Those all seemed pretty good at a first glance. Personally, I’d forgotten that we swung by Preston during that particular grand tour until the AI suggested the tag, and then I had to
look back at the post more-carefully to double-check! blogging initially seemed like a stretch given that I was only blogging about not having blogged much, but on
reflection I think I agree with the robot on this one, because I did explicitly link to a 2002 page that fell off the Internet only a few years ago about the pointlessness of blogging. So I think it counts.
I was able to verify that I’d been in Preston thanks to this contemporaneous photo. I have no further explanation for the content of the photo, though.
science museum is a big fail though. I don’t use that tag, but I do use the tag museum. So close, but not quite there, AI!
I applied all of its suggestions, after substituting museum for science museum.
I wrote this blog post in celebration of having managed to hack together some stuff to help me remote-control my PC from my phone via Bluetooth, which back then used to be a challenge,
in the hope that this would streamline pausing, playing, etc. at pizza-distribution-time at Troma Night, a weekly film night I hosted back then.
If you were sat on that sofa, fighting your way past other people and a mango-chutney-barrel-cum-table to get to a keyboard was genuinely challenging!
It already had the tag technology, which it inherited from a pre-tagging evolution of my blog which used something akin to categories (of which only one
could be assigned to a post). In addition to suggesting this, the AI also picked out the following options: bluetooth, geeky, mobile, troma
night, dvd, technology, and software.
The big failure here was dvd, which isn’t remotely one of my tags (and probably wouldn’t apply here if it were: this post isn’t about DVDs; it barely even mentions
them). Possibly some prompt engineering is required to help ensure that the AI doesn’t make a habit of this “include one tag not from the approved list, every time” trend.
Apart from that it’s a pretty solid list. Annoyingly the AI suggested mobile, which isn’t an approved tag, instead of mobiles, which is. That’s probably a
tokenisation fault, but it’s still annoying and a reminder of why even a semi-automated “human-checked” system would need a safety-check to ensure that no absent tags are
allowed through to the final stage of approval.
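A safety-check like that could be as simple as comparing each suggestion against the approved list before a human ever sees it. Here's a minimal sketch, in Python rather than Bash, and not the script described above: it assumes the approved tags are available as a plain set of strings, and the function name `check_suggestions`, the sample approved list, and the 0.8 similarity cutoff are all hypothetical. It leans on the standard library's `difflib.get_close_matches` to offer a near-miss like mobiles when the AI says mobile.

```python
from difflib import get_close_matches


def check_suggestions(suggested, approved):
    """Split suggested tags into approved ones and rejects.

    Returns (accepted, rejected), where rejected maps each unknown tag to
    the closest approved tag, or None if nothing is similar enough.
    """
    approved_lookup = {tag.lower(): tag for tag in approved}
    accepted, rejected = [], {}
    for raw in suggested:
        tag = raw.strip().lower()
        if tag in approved_lookup:
            if approved_lookup[tag] not in accepted:  # drop duplicates
                accepted.append(approved_lookup[tag])
        else:
            # Assumption: a 0.8 cutoff is strict enough to catch
            # singular/plural slips without pairing unrelated tags.
            near = get_close_matches(tag, list(approved_lookup), n=1, cutoff=0.8)
            rejected[raw] = approved_lookup[near[0]] if near else None
    return accepted, rejected


# Hypothetical approved list; a real one would come from the blog itself.
APPROVED = {"bluetooth", "geeky", "mobiles", "troma night", "technology", "software"}

accepted, rejected = check_suggestions(
    ["bluetooth", "geeky", "mobile", "troma night", "dvd", "technology", "software"],
    APPROVED,
)
print(accepted)
print(rejected)
```

With that made-up approved list, mobile gets flagged with mobiles offered as a replacement, and dvd gets flagged with no replacement at all. Nothing unapproved reaches the accepted list, which is the whole point.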
This post!
As a bonus experiment, I tried running my code against a version of this post, but with the information about the AI’s own prompt and the examples removed (to reduce the risk
of confusion). It came up with: ai, wordpress, blogging, tags, technology, automation.
All reasonable-sounding choices, and among those I’d made myself… except for tags and automation which, yet again, aren’t among the tags that I use. Unless this
tendency to hallucinate can be reined in, I’m guessing that this tool’s going to continue to have some challenges when used on longer posts like this one.
Conclusion and next steps
The bottom line is: yes, this is a job that an AI can assist with, but no, it’s not one that it can do without supervision. The laser-focus with which gpt-4o was able to
pick out taggable concepts, faster than I’d have been able to do for the same quantity of text, shows that there’s potential here, but it’s not yet proven itself enough of a time-saver
to justify me writing a fluffy UI for it.
However, I might expand on the command-line tools I’ve been using in order to produce a non-interactive list of tagging suggestions, and use that to help inform my work as I tidy up the
tags throughout my blog.
You still won’t see any “AI-authored” content on this site (except where it’s for the purpose of talking about AI-generated content, and it’ll always be clearly labelled), and
I can’t see that changing any time soon. But I’ll admit that there might be some value in AI-assisted curation and administration, so long as there’s an informed human in the loop at
all times.
Footnotes
1 Based on my tagging, I’ve apparently only written about escalators once, while playing Pub Jenga at Robin’s 21st birthday party. I can’t imagine why I thought it deserved a tag.
2 There are, of course, various other people trying similar approaches to this and similar
problems. I might have tried one of them, were it not for the fact that I’m not quite as interested in solving the problem as I am in understanding how one might use an AI to
solve the problem. It’s similar to how I don’t enjoy doing puzzles like sudoku as much as I enjoy writing software that optimises for solving such puzzles. See also, for
example, how I beat my children at Mastermind or what the hardest word in Hangman is
or my various attempts to avoid doing online jigsaws.
Unsurprisingly my checkins, which represent #geocaching/#geohashing activity,
grow in the spring and peak in the summer when the weather’s better!
At first I assumed the notes peak in November might have been thrown off by a single conference, e.g. musetech, but it turns out I’ve
just done more note-friendly things in Novembers, like Challenge Robin II and my Cape Town
meetup, which are enough to throw the numbers off.
At a little over 590 thousand words and spanning 1,349 pages, Vikram Seth’s A Suitable Boy is almost-certainly among
the top ten longest single-volume English-language novels. It’s pretty fucking huge.
I’ll stick with the Kindle edition: I fear that merely holding the paperback would be exhausting.
I only discovered A Suitable Boy this week (and haven’t read it – although there are some good reviews that give me an inclination to) when, on a whim, I decided to try to get
a scale of how much I’d ever written on this blog and then decided I needed something tangible to use as a comparison. Because – give or take – that’s how much I’ve written here, too:
At 593,457 words, this blog wouldn’t fit into that book unless we printed it on the covers as well.
Of course, there are some caveats that might make you feel that the total count should be lower:
It might include a few pieces of non-content code, here and there. I tried to strip them out for the calculation, but I wasn’t entirely successful.
It includes some things which might be considered metadata, like image alt-text (on the other hand, sometimes I like to hide fun messages in my image alt-text, so perhaps they
should be considered content).
On the other hand, there are a few reasons that it perhaps ought to be higher:
Post titles (which sometimes contain part of the content) and pages outside of blog posts are not included in the word count.
I’ve removed all pictures for the purpose of the word count. Tempting though it was to make each worth a thousand words, that’d amount to about another one and a half million words,
which seemed a little excessive.
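For the curious, the counting caveats above can be sketched in code. This is a rough Python illustration and not the script actually used to get the figure: it assumes each post is available as an HTML string, treats the contents of `pre`, `code`, `script` and `style` elements as "non-content code" to skip, and makes counting image alt-text optional. The names `WordCounter` and `count_words` are hypothetical.

```python
import re
from html.parser import HTMLParser


class WordCounter(HTMLParser):
    """Collects visible text from HTML, skipping code-like elements."""

    SKIPPED = {"pre", "code", "script", "style"}

    def __init__(self, include_alt_text=True):
        super().__init__()
        self.include_alt_text = include_alt_text
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag == "img" and self.include_alt_text and self.skip_depth == 0:
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def count_words(html, include_alt_text=True):
    """Count words in the visible text of an HTML fragment."""
    parser = WordCounter(include_alt_text)
    parser.feed(html)
    parser.close()
    # Joining with spaces means a word split by inline markup counts as
    # several words; good enough for a ballpark figure.
    text = " ".join(parser.chunks)
    # A word is a run of letters/digits, allowing inner apostrophes and hyphens.
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))


sample = (
    "<p>Hello, world! It’s <em>nice</em>.</p>"
    "<pre>ls -la /tmp</pre>"
    '<img src="x.png" alt="A hidden joke">'
)
print(count_words(sample))                          # 7
print(count_words(sample, include_alt_text=False))  # 4
```

Whether alt-text counts as content is exactly the sort of judgement call that moves the total, which is why it's a switch here and not a fixed rule.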
Another reason for not counting images was that it was harder than you’d think to detect repeat use of images that I’ve used too many times. Like this one.
Of course, my blog doesn’t really have a plot like A Suitable Boy (might compare well to the even wordier Atlas Shrugged, though…): it’s a mixture of mostly
autobiographical wittering interspersed with musings on technology and geekery and board games and magic and VR and stuff. I’m pretty sure that if I’d known, 18
years ago (which is approximately when I first started blogging), where my life would be now, I’d have, y’know, tried to tie it all together with an overarching theme and some character development or something.
Or perhaps throw in the odd plot twist or surprise: something with some drama to keep the reader occupied, rather than just using the web as a stream-of-consciousness diary of whatever
it is I’m thinking about that week. I could mention, for example, that there’ll be another addition to our house later this year. You heard it here first (unless you already heard it
from somewhere else first, in which case you heard it there first.)
Brought up in a world of tiny, bright, UHD colour touchscreens, Annabel seemed slightly underwhelmed by the magic of a sonograph picture of her future baby brother.
Still: by the end of this post I’ll have hit a nice, easy-to-remember 594,000 words.