Setup default data for new instance by Nutomic · Pull Request #6077 · LemmyNet/lemmy

Nutomic · 2025-10-14T14:33:15Z

- Fetch a list of communities from lemmy.ml and embed it in the binary.
- On initial site setup, fetch all of these communities along with their most recent posts.
- In debug mode, only load https://lemmy.ml/c/lemmy to reduce CPU and network usage while still making sure it works.
- Also create a default community and a sticky post with getting-started info (links to docs, Matrix, support, etc.).
- On the first day after a new instance is created, fetch less data over federation (to reduce server load from fetching so many communities).

dessalines · 2025-10-14T17:33:32Z

crates/api/api_crud/build.rs

+  let community_ids = if env::var("OUT_DIR").unwrap() == "release" {
+    // fetch list of communities from lemmyverse.net
+    let mut communities: Vec<CommunityInfo> =
+      reqwest::blocking::get("https://data.lemmyverse.net/data/community.full.json")?.json()?;


Let's please use your crawler, and not this service we have no control over. It only needs to be extended slightly to add communities.

Let's also not fetch this data within lemmy, but mount it as a git submodule. We could even transform the output JSON into its proper lemmy rust types with serde, spitting out an .rs file, as a pre-compile task. But it'd also probably be fine to transform the JSON in code, as long as it's only done on startup.

The way build.rs works is that it runs during every Lemmy build. In release mode it fetches communities from lemmyverse.net; in debug mode it uses the hardcoded https://lemmy.ml/c/lemmy. It then writes the list to a file in OUT_DIR, which gets included in the final binary. While Lemmy is running, it loads the same file back from inside the binary. So there is no need to generate a .rs file; that would be more complicated for no reason. Also, if the request to lemmyverse.net fails during a release build, the entire build fails, so we can investigate and fix the problem.
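
The build-time flow described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual build.rs: the helper names (`community_urls`, `to_json_array`, `write_community_list`) are hypothetical, the release branch returns a fixed stand-in list instead of making an HTTP request, and the JSON is assembled by hand on the assumption that the URLs need no escaping. `OUT_DIR` and `PROFILE` are the environment variables Cargo sets for build scripts.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: picks the community URLs to embed for a build profile.
fn community_urls(profile: &str) -> Vec<String> {
    if profile == "release" {
        // Stand-in for the real HTTP fetch done during release builds.
        vec![
            "https://lemmy.ml/c/lemmy".to_string(),
            "https://lemmy.ml/c/announcements".to_string(),
        ]
    } else {
        // Debug builds use a single hardcoded community.
        vec!["https://lemmy.ml/c/lemmy".to_string()]
    }
}

/// Serializes the URLs as a JSON array of strings.
/// Assumes the URLs contain no characters that need JSON escaping.
fn to_json_array(urls: &[String]) -> String {
    let quoted: Vec<String> = urls.iter().map(|u| format!("\"{}\"", u)).collect();
    format!("[{}]", quoted.join(","))
}

/// Writes communities.json into the given output directory (OUT_DIR in a build script).
fn write_community_list(out_dir: &Path, profile: &str) -> io::Result<PathBuf> {
    let path = out_dir.join("communities.json");
    fs::write(&path, to_json_array(&community_urls(profile)))?;
    Ok(path)
}
```

A build script's `main` would call `write_community_list` with the values of `OUT_DIR` and `PROFILE`; the crate can then read the file back at compile time with `include_str!(concat!(env!("OUT_DIR"), "/communities.json"))`.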

Adding support for community crawling to the existing crawler would require a lot of work, and it would make the crawler use a lot more server resources. I don't have time to implement that, and it doesn't make sense if we can simply take the data from an existing crawler made for this purpose, which is also open source. If anything, I would consider mirroring the data from lemmyverse.net to join-lemmy.org or including it in the repo. But that would be more complicated and would require some kind of update task. So as long as there are no actual problems, we should keep it simple like this.

Actually we don't need lemmyverse.net or any crawler for this; instead we can simply fetch the community list directly from lemmy.ml. Will change it like that shortly. In theory we could even do the same thing for the instance list on join-lemmy.org...

Edit: Done

dessalines · 2025-10-20T20:12:24Z

config/defaults.hjson

+    # Set this to true to start with an empty instance instead.
+    no_default_data: true


I think it'd be better to name this more explicitly: fill_default_federated_communities, and probably to separate it from create_welcome_post.

The welcome post directly references the auto-fetched communities. So with separate config variables it would also need a different text in that case. Not worth the effort, but I can change the variable name or expand the comment if needed.

dessalines · 2025-10-20T20:24:52Z

crates/api/api_crud/build.rs

IMO this should be:

- Extracted into its own git repo, that uses lemmy-client-rs to fetch and write this communities.json file.
- We can run this whenever we want, like before a release.
- Add this repo as a submodule, and put its git submodule path somewhere inside the crate that needs it. Alternatively you could define the communities.json location as an env var, and include it as a mounted file in the docker-compose.yml, so it can be read.
- Read that file when creating the rows.
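
As a rough sketch of the env-var alternative mentioned above: the variable name `LEMMY_COMMUNITIES_JSON` and the default path are assumptions made up for illustration, and nothing in this discussion defines them. The lookup is split so that the resolution logic can be exercised without touching the process environment.

```rust
use std::env;
use std::path::PathBuf;

// Hypothetical names: neither the variable nor the default path comes from Lemmy.
const ENV_VAR: &str = "LEMMY_COMMUNITIES_JSON";
const DEFAULT_PATH: &str = "communities/communities.json";

/// Resolves the communities.json location from an optional override value.
/// An unset or blank value falls back to the default (e.g. the submodule path).
fn communities_json_path(env_value: Option<String>) -> PathBuf {
    match env_value {
        Some(v) if !v.trim().is_empty() => PathBuf::from(v),
        _ => PathBuf::from(DEFAULT_PATH),
    }
}

/// Reads the override from the process environment.
fn communities_json_path_from_env() -> PathBuf {
    communities_json_path(env::var(ENV_VAR).ok())
}
```

With docker-compose, the file would be mounted into the container and the variable set to the mounted path; without it, the default path applies.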

I don't see what the benefit of all this extra complexity would be. If it requires another task to update, we will forget about it over time and the list will get outdated. Right now it simply works, and if there is a problem we will notice it and have time to fix it.

Alright... the main thing that scares me is that it's relying on lemmy.ml being available during the build.

That should usually be the case; if not, we can simply restart the build. Or develop a different solution if it really turns out to be necessary.

dessalines · 2025-10-20T20:29:04Z

crates/api/api_crud/src/user/create.rs

+    // Fetch communities themselves
+    let tasks = communities.iter().map(|c| async {
+      let context = context.reset_request_count();
+      c.dereference(&context).await.ok();
+    });


Is this actually HTTP fetching data from maybe hundreds of communities?

Really we should only be creating instance and community rows locally, as this issue is about community discovery. I.e., instead of hammering these communities with fetches, we should be running Instance::read_or_create and Community::create with the supplied data.
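
For illustration, the local-insert approach suggested here could look something like the sketch below. It does not use Lemmy's real `Instance::read_or_create` or `Community::create` (their signatures are not shown in this discussion); an in-memory `HashMap` stands in for the database, and the `Db` and `CommunityRow` types are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified community row.
#[derive(Debug, Clone, PartialEq)]
struct CommunityRow {
    instance_id: u32,
    name: String,
    ap_id: String,
}

/// In-memory stand-in for the instance and community tables.
#[derive(Default)]
struct Db {
    instances: HashMap<String, u32>,            // domain -> instance id
    communities: HashMap<String, CommunityRow>, // ap_id -> row
}

impl Db {
    /// Returns the id for a domain, creating the instance row if it is missing.
    fn read_or_create_instance(&mut self, domain: &str) -> u32 {
        let next_id = self.instances.len() as u32 + 1;
        *self.instances.entry(domain.to_string()).or_insert(next_id)
    }

    /// Inserts a community row from already-known data, with no network access.
    /// Inserting the same community twice leaves the existing row in place.
    fn create_community(&mut self, domain: &str, name: &str) -> &CommunityRow {
        let instance_id = self.read_or_create_instance(domain);
        let ap_id = format!("https://{}/c/{}", domain, name);
        self.communities
            .entry(ap_id.clone())
            .or_insert(CommunityRow { instance_id, name: name.to_string(), ap_id })
    }
}
```

The trade-off raised later in the thread still applies: rows created this way make communities searchable, but they carry no recent posts.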

50 communities in total, along with recent posts so that the All tab gets populated on a new instance. I added a check, is_new_instance(), which reduces the amount of data fetched, so it doesn't take so long.
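
A check like the one mentioned above might be sketched as follows. Only the name `is_new_instance` and the "first day" window come from this PR; the signature, the helper `max_posts_per_community`, and the limits 5 and 50 are hypothetical values chosen for illustration.

```rust
use std::time::{Duration, SystemTime};

// The PR description speaks of the first day after the instance is created.
const NEW_INSTANCE_WINDOW: Duration = Duration::from_secs(24 * 60 * 60);

/// True while the instance is younger than the window.
fn is_new_instance(created_at: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(created_at) {
        Ok(age) => age < NEW_INSTANCE_WINDOW,
        // Clock skew: creation time appears to be in the future, so treat as new.
        Err(_) => true,
    }
}

/// Hypothetical limits: fetch fewer posts per community while the instance is new.
fn max_posts_per_community(created_at: SystemTime, now: SystemTime) -> usize {
    if is_new_instance(created_at, now) {
        5
    } else {
        50
    }
}
```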

What's the purpose of those HTTP fetches? Don't we only need to fill the community table rows so that the communities are searchable? In that case we only need to do DB inserts.

Not sure what you mean; we still need to get the data to insert from somewhere. Are you talking about embedding all that in the binary? That would be a lot more complicated to implement. And this doesn't only insert communities, but also the recent posts, to have some initial content.

let communities_json = include_str!(concat!(env!("OUT_DIR"), "/communities.json"));
let communities: Vec<ObjectId<ApubCommunity>> = serde_json::from_str(communities_json)?;

You're reading a communities.json file, which was already produced by calling ListCommunities and has all the info necessary to fill community DB rows. There's no reason to then fetch data; you already have it.

Fetching initial content for 1000s of communities could be burdensome on a lot of servers. It's probably not a good idea, especially since this issue should only be about getting communities.

It only fetches 50 communities (adjusted the text), and only when a new Lemmy server is created which is not that often. Compared to the normal data fetches done by any active Lemmy instance this is not much.

I tried to change it to embed a Vec<CommunityView> in the binary instead. But this would require two separate API requests in build.rs to fetch /c/announcements and /c/lemmy, as well as an API request in debug builds to fetch /c/lemmy for testing. It would also be more complicated to fetch recent posts this way, as CommunityView doesn't store the outbox_url. Anyway, the fetching is quite fast and works well.

Alright then, I suppose it's okay.

dessalines · 2025-10-24T17:57:06Z

Merge whenever you like.

thethunderwolf · 2026-01-17T20:32:40Z

I know that this is old but hardcoding the instance doesn't seem very decentralized

especially concerning given the reputation that lemmy.ml has

dessalines · 2026-01-18T02:19:28Z

Already superseded and made configurable by #6276

Nutomic added 3 commits October 14, 2025 16:09

Setup default data for new instance

82e25d9

embed community list in binary (fixes #2951)

67e7fac

wip

0cd143c

dessalines reviewed Oct 14, 2025

Nutomic added 3 commits October 15, 2025 10:46

filter nsfw, suspicious

4d0ad2e

fetch less after first start

1d00290

finish welcome post

0a7df45

Nutomic force-pushed the instance-setup-data branch from 0603782 to 0a7df45 Compare October 15, 2025 10:07

Nutomic added 3 commits October 15, 2025 12:11

fix

f5a2810

prefetch more comms

5d608e4

run setup earlier, during initial user signup

5cd00ca

Nutomic marked this pull request as ready for review October 15, 2025 10:55

Nutomic requested review from SleeplessOne1917, dullbananas and phiresky as code owners October 15, 2025 10:55

fix test

7405cc9

Nutomic force-pushed the instance-setup-data branch from c88ad77 to 7405cc9 Compare October 15, 2025 12:10

clippy

92e2390

Nutomic force-pushed the instance-setup-data branch from b904421 to 92e2390 Compare October 15, 2025 13:02

Nutomic added 2 commits October 15, 2025 16:33

clippy

5205e72

fix api test

fb7dd70

Nutomic mentioned this pull request Oct 16, 2025

Default multi-community for new instances #5771

Closed

move text to translations

ad92d95

Nutomic mentioned this pull request Oct 16, 2025

Add text for welcome post LemmyNet/lemmy-translations#211

Merged

fetch data from lemmy.ml instead of lemmyverse.net

be944bc

dessalines reviewed Oct 20, 2025

Nutomic added 3 commits October 23, 2025 11:59

Merge branch 'main' into instance-setup-data

dde9ade

Merge branch 'main' into instance-setup-data

fb79f28

adjust comment

52884e7

dessalines approved these changes Oct 24, 2025

Nutomic merged commit 8c2303a into main Oct 27, 2025
2 checks passed

Nutomic deleted the instance-setup-data branch October 27, 2025 09:28

Nutomic mentioned this pull request Jan 9, 2026

Make bootstrap instances configurable (fixes #6260) #6276

Merged
