Add initial ActivityPub support to starter packs #177

jfietkau · 2025-07-30T12:23:16Z

Building on #175, this pull request adds ActivityPub support to starter packs. This is not intended to be ready to merge right away, there are several spots in here that I want to discuss. Plus it'll only make sense to actually run in practice once the ActivityPub backfilling from the post-merge discussion in #175 is in place. But we can already start talking about it. And if you want to merge it early, it does gracefully handle missing ActivityPub IDs, so the only hard depencency is the stuff from #175 that's already merged.

About the design of what's here:

I took a "keep it simple" approach under the assumption that there is currently nothing else in Fedidevs that we want to add ActivityPub support to. So the changes here are all contained in starter_packs/views.py even though the wants_activitypub function, which detects HTTP requests from ActivityPub platforms, is fit for the general case. If we ever do more ActivityPub in various places, we should find a better spot for it. Until then I reckon it makes sense to keep it all close by.

In this implementation, we use content negotiation to serve ActivityPub under the same URL as the HTML version. I've read some Django docs saying that the view is the correct place to do this. There are other options besides content negotiation, such as Link HTTP headers or HTML elements. There is a draft document by the SWICG that gives an overview. But again, I believe this is the simplest approach, and also coincidentally the one with the best compatibility in the ecosystem.

Because we serve only a single ActivityPub schema, it makes more sense to construct it directly from a dict than to pull in dependencies related to JSON-LD. One last time: keeping it simple.

The schema itself is the one from #pixelfed/starter-kits/issues/1 with some of my own suggested additions. But we should be compatible with it as well as with the ActivityStreams base vocabulary.

I'm going to try to add a couple of line comments to highlight spots where I wasn't 100% sure.

for more information, see https://pre-commit.ci

jfietkau · 2025-07-30T12:51:19Z

Oh yeah, and I forgot to mention: I'm kinda 50/50 on whether this can serve as a general API (for people who don't care about ActivityPub) or if a separate JSON API should exist to list starter packs, like you worked on in the separate branch.

This version is simple to fetch and read, assuming the client can set a custom Accept header. I could still add an optional format=activitystreams GET parameter to this to override the content negotiation and make it easier to grab for clients that can't do their own Accept header.

One downside of this schema is that the ActivityPub IDs are all you get. If the client wants to convert those back into account handles, that's another HTTP round trip each. Unfortunately there really isn't a good way to add the handles to this payload without massively increasing the schema's complexity. So maybe that means you ultimately want a separate non-ActivityPub API after all? Maybe we should wait and see what API consumers say when this is live.

anze3db

Looks good so far but I left a few comments. It would also be great if we had some tests for the wats_activitypub branch!

anze3db · 2025-07-31T08:05:27Z

starter_packs/views.py

+            # This should be resolved somewhere else by a background task, so we
+            # return an HTTP 500 error and leave it to the client to try again later.
+            return HttpResponseServerError()
+        items = [account.activitypub_id for account in accounts]


Shouldn't items be dicts like this:

{ "type": "Person", "id": "https://example.org/users/username", "name": "Display Name" }

Having them as dict has the benefit that we can add new properties without making a backwards incompatible change. If we go with list of strings it won't be easy to change in the future.

ActivityPub uses JSON-LD, so serving the URI https://example.org/users/username or (some subset including id of the properties of) the object hosted at https://example.org/users/username is equivalent. Referencing the ID is just more convenient for us for two reasons:

Since we're not the authoritative source for https://example.org/users/username (that would be the server at example.org), we can't really add or change properties here. They'd have to be synced to whatever exists at https://example.org/users/username. Outdated cached values are not rare, but intentional deviations would be invalid, and a JSON-LD parser would likely strip them out.

If we rehost https://example.org/users/username's info here, we need to keep it stored and updated. For example, we don't currently store the type. We could add it to the database like we did with the ID, but unlike the ID it can change at any time, for example if a Mastodon user checks or unchecks the "this account is a bot" toggle. That sounds like effort.

If we say "screw the rules" and add our own info to our local version of https://example.org/users/username, we'd still need to stay within the ActivityStreams vocabulary, or add our own resolvable JSON-LD @context which sounds like even more of a pain.

I see only downsides.

I'm not very familiar with ActivityPub or JSON-LD but in my experience creating an API endpoint with a list of strings is a bad idea since it's impossible to extend without creating a breaking change.

Also, I'm not sure how useful this endpoint will be if every consumer has to then also refetch all the related objects. Feels wasteful.

I'm already refreshing account data somewhat regularly and stale data hasn't been a big issue so far.

My preference would be to use a list of dicts, even if we start with:

{ "id": "https://example.org/users/username" }

Just so that we keep the door open for future extension if it will ever be needed.

I don't feel very strongly about this though so if you believe that having a list of strings is better then let's go with that.

I'm gonna try my best to set my implementation biases aside and answer from a perspective that mainly takes ActivityPub peers and API clients into account. I think it'll be useful for us to get a bit more shared clarity on this.

I'm not very familiar with ActivityPub or JSON-LD but in my experience creating an API endpoint with a list of strings is a bad idea since it's impossible to extend without creating a breaking change.

I don't know if "a list of strings" is a useful way to discuss this approach. In JSON-LD, the ID string and the object are quite literally interchangeable, so any consumer of this data needs to accomodate both versions anyway, including switching back and forth between IDs and inlined objects within the same list. It's almost pointless to discuss which one is more convenient for consumers because they need to be able to handle both, regardless of whether they're using a full JSON-LD parser (Fedify, Iceschrimp) or not (Mastodon, Pixelfed). That is to say, it would not be a breaking change to switch between IDs and inlined objects at a later date if we so choose. In fact, it would be a huge mistake for us to guarantee a particular shape of the data structure here.

I'm a tiny bit tempted to randomly replace one in ten IDs with its inlined object counterpart, just to keep ActivityPub consumers on their toes. But I'm not enough of a troll to do that in production software. 🙃

And regarding extensibility, we have very little room to add anything to the contained objects anyway. Because they're owned by other servers, all that we could theoretically add to them would be their properties as they exist on the source server. It's not that I can't see that being useful (e.g. to efficiently compile a list of avatar URLs directly from a starter pack), but I get the impression that you have a perspective on API extensibility here that doesn't map to ActivityPub.

Also, I'm not sure how useful this endpoint will be if every consumer has to then also refetch all the related objects. Feels wasteful.

If they want to do anything beyond trivial (e.g. check how many entries are in the list) with the starter pack, they need to fetch every listed object from the source servers anyway, since – setting unintentional inaccuracies caused by caching aside – they can't necessarily trust our list to be non-malicious. Imagine if a starter pack hosted on fedidevs.com were allowed to provide an object representation of https://example.org/users/abc and the consuming server (say, Pixelfed) just took that at face value. Any starter pack out there would be able to set your name, bio, avatar, etc. to whatever it wants. Since I started using Mastodon, it has had a couple of emergency security updates related to spots where they forgot to validate inlined objects from federated ActivityStreams data against the source instance.

The origin-based data ownership model is documented in FEP-fe34. Not a must-read, just some further info I'm linking in case you're curious.

I'm already refreshing account data somewhat regularly and stale data hasn't been a big issue so far.

That's good to know! 👍 I'll scratch that concern from my mental to do list.

With all this in mind, I'm leaning more and more towards the idea that a separate JSON API for consumers who don't care about ActivityPub may very well make sense. But that's outside the scope of this PR.

If the two formats are interchangeable then let's go with the object one. If we end up adding extra metadata into the object for non ActivityPub clients then that metadata will be ignored by ActivityPub clients since they'll be refetching from source so no harm done, no?

Thank you! Your stance matches my gut feeling.

Having a wrapper type for entries would break compatibility with Dan's schema (pixelfed/starter-kits/issues/1), which is AFAIK implemented by Pixelfed and WordPress ActivityPub. So I don't think that's something we want to do here.

We could plop our metadata into a separate property of the root Collection, which we do own. Although at that point it becomes a little dubious again whether consumers of AS and those wanting plain JSON should really share a payload or whether we should just split them entirely.

Fun times 😆

What's the type value supposed to contain and where can we get this data?

We could plop our metadata into a separate property of the root Collection, which we do own.

@jfietkau Another option is to offer two different collections: Pixelfed / legacy collection and your own collection that can contain additional metadata (sounds like an important feature!).

The value of type can in theory be pretty much whatever, although in the wild I've only seen these five. It differs per account and can change over time.

Unlike the ID, the type is sadly not in the Mastodon API account lookup that we already do. The process to acquire it would be to fetch the ActivityPub ID as a URL with the correct Accept header to get the ActivityPub representation. The type field is in there and can be stored as a string.

Or we split the ActivityPub and JSON APIs. In that case we could put Mastodon's bot boolean that we already have in the database into the JSON result. I think I'd prefer to move this PR in that direction rather than abuse the ActivityPub objects too much. I'll whip something up for you to look at tomorrow.

@silverpill True, that would be possible too. The new collection would need a separate ID/URI though, which is a bigger change in this implementation context than I wanted to introduce. Will mull it over.

("Pixelfed / legacy collection" – shots fired 😆)

anze3db · 2025-07-31T08:15:43Z

starter_packs/views.py

+            "id": request.build_absolute_uri("/s/" + starter_pack.slug + "/"),
+            "name": re.sub(r":\w+:", "", starter_pack.title).strip(),
+            "summary": starter_pack.description,
+            "attributedTo": author.activitypub_id,


Same comment as above and ties into your proposal. Should attributedTo also be a dict?

Same reply as above. I don't think we'd gain anything from it.

anze3db · 2025-07-31T09:39:03Z

starter_packs/views.py

+        if author.activitypub_id is None:
+            # Owner of this starter pack does not have an ActivityPub ID stored yet.
+            # This should be resolved somewhere else by a background task, so we
+            # return an HTTP 500 error and leave it to the client to try again later.
+            return HttpResponseServerError()


I've created the script to populate activitypub_id but and it's now running (will probably take more than a day to go over all accounts).

What I'm seeing is that there are still going to be a lot of accounts where activitypub_id won't be set because a lot of webfinger requests are returning 410, 404 or 403 responses.

Not returning anything when fingerprint id isn't set might not be the best. Instead maybe we can still return results, especially if items and attributedTo are dicts and we can populate the name field...

Accounts without working WebFinger resolution should be a niche case. They are valid ActivityPub, but they can't interoperate with Mastodon. Occasionally someone misconfigures their server in some way (and I will say that my WebFinger resolution function from #175 doesn't take host-meta into account, which probably affects a few servers), but the more likely explanation especially for the 410 errors is that the accounts don't exist anymore. Could you send me a list of a few non-working accounts? I'll see what I need to do to update the script.

We absolutely, 100%, cannot return anything about the account if we don't have the ID. We could at best leave out the attributedTo field altogether for that case, but that seems like a data integrity issue. Same for a partial list in items.

I have updated my WebFinger backfilling script in #175 to also handle alternative WebFinger domains from host-meta information. Should be a fringe configuration, but I'm aware that a few of them exist in the wild. Differences are only in the body of the get_activitypub_id_from_webfinger function, but it changes how the WebFinger URL is constructed, so I had to adjust more than just the first few lines.

After running the original script 5% of all accounts don't have activitypub_id.

Since the vast majority of these are 410 responses I think I can assume that these accounts are no longer active and I can remove them from the database.

There is a handful that received a 403 response (mostly from awscommunity.social).

Here is a small sample with different error types:

[email protected] Error: status code 403 for WebFinger data [email protected] Error: status code 403 for WebFinger data [email protected] Error: status code 403 for WebFinger data [email protected] Error: Failed to fetch WebFinger data [email protected] Error: Failed to fetch WebFinger data [email protected] Error: status code 410 for WebFinger data [email protected] Error: status code 410 for WebFinger data [email protected] Error: status code 410 for WebFinger data [email protected] Error: status code 410 for WebFinger data

Yeah, awscommunity.social is gone. It used to be a Mastodon instance I think, but I remember reading about them shutting down.
emacs.ch is also down, but I don't know if that one's temporary.
The other four, with the 410 errors, are deleted accounts from working instances. It's worth noting that their profile URLs all return 410 as well, so they're possible to detect.

I can't find the spot where we do scheduled account updates. Do routine updates use accounts/management/commands/crawler.py? If so, I guess we might not be detecting deleted accounts at all yet. A 410 error is a very safe bet that the account can be removed from the database. A 404 should probably only be taken as such if it lasts for a long time – accidental 404s could pop up during certain kinds of server maintenance.

It could be worthwhile to check https://{instance}/health from time to time for each server and remove (or hide?) these instances and their accounts after prolonged downtime. I believe Mastodon gives up on activity delivery after two weeks.

I'd like to not make all that a prerequisite for this PR though. It's worth investigating separately, but for the purpose of ActivityPub delivery, I think the current way the code handles missing IDs is the correct one.

Do routine updates use accounts/management/commands/crawler.py? If so, I guess we might not be detecting deleted accounts at all yet.

That's it, yes! We aren't detecting deletions at all. So yeas, all those 410 can safely be deleted.

It could be worthwhile to check https://{instance}/health from time to time for each server and remove (or hide?) these instances and their accounts after prolonged downtime. I believe Mastodon gives up on activity delivery after two weeks.

Good idea!

None of these are prerequisite for this PR. All good 👍

jfietkau · 2025-07-31T10:19:35Z

Thanks! I'll take a look at the tests later today.

starter_packs/views.py

jfietkau · 2025-07-31T10:20:43Z

starter_packs/views.py

+            # This should be resolved somewhere else by a background task, so we
+            # return an HTTP 500 error and leave it to the client to try again later.
+            return HttpResponseServerError()
+        items = [account.activitypub_id for account in accounts]


ActivityPub uses JSON-LD, so serving the URI https://example.org/users/username or (some subset including id of the properties of) the object hosted at https://example.org/users/username is equivalent. Referencing the ID is just more convenient for us for two reasons:

Since we're not the authoritative source for https://example.org/users/username (that would be the server at example.org), we can't really add or change properties here. They'd have to be synced to whatever exists at https://example.org/users/username. Outdated cached values are not rare, but intentional deviations would be invalid, and a JSON-LD parser would likely strip them out.

If we rehost https://example.org/users/username's info here, we need to keep it stored and updated. For example, we don't currently store the type. We could add it to the database like we did with the ID, but unlike the ID it can change at any time, for example if a Mastodon user checks or unchecks the "this account is a bot" toggle. That sounds like effort.

If we say "screw the rules" and add our own info to our local version of https://example.org/users/username, we'd still need to stay within the ActivityStreams vocabulary, or add our own resolvable JSON-LD @context which sounds like even more of a pain.

I see only downsides.

jfietkau · 2025-07-31T10:21:15Z

starter_packs/views.py

+            "id": request.build_absolute_uri("/s/" + starter_pack.slug + "/"),
+            "name": re.sub(r":\w+:", "", starter_pack.title).strip(),
+            "summary": starter_pack.description,
+            "attributedTo": author.activitypub_id,


Same reply as above. I don't think we'd gain anything from it.

jfietkau · 2025-07-31T10:26:00Z

starter_packs/views.py

+        if author.activitypub_id is None:
+            # Owner of this starter pack does not have an ActivityPub ID stored yet.
+            # This should be resolved somewhere else by a background task, so we
+            # return an HTTP 500 error and leave it to the client to try again later.
+            return HttpResponseServerError()


Accounts without working WebFinger resolution should be a niche case. They are valid ActivityPub, but they can't interoperate with Mastodon. Occasionally someone misconfigures their server in some way (and I will say that my WebFinger resolution function from #175 doesn't take host-meta into account, which probably affects a few servers), but the more likely explanation especially for the 410 errors is that the accounts don't exist anymore. Could you send me a list of a few non-working accounts? I'll see what I need to do to update the script.

We absolutely, 100%, cannot return anything about the account if we don't have the ID. We could at best leave out the attributedTo field altogether for that case, but that seems like a data integrity issue. Same for a partial list in items.

jfietkau · 2025-07-31T14:58:55Z

starter_packs/views.py

+        if author.activitypub_id is None:
+            # Owner of this starter pack does not have an ActivityPub ID stored yet.
+            # This should be resolved somewhere else by a background task, so we
+            # return an HTTP 500 error and leave it to the client to try again later.
+            return HttpResponseServerError()


I have updated my WebFinger backfilling script in #175 to also handle alternative WebFinger domains from host-meta information. Should be a fringe configuration, but I'm aware that a few of them exist in the wild. Differences are only in the body of the get_activitypub_id_from_webfinger function, but it changes how the WebFinger URL is constructed, so I had to adjust more than just the first few lines.

Co-authored-by: Anže Pečar <[email protected]>

for more information, see https://pre-commit.ci

… string representation Co-authored-by: Anže Pečar <[email protected]>

for more information, see https://pre-commit.ci

jfietkau · 2025-07-31T22:29:11Z

I have added a few tests, for the HTTP content negotation and for the JSON-LD schema. And what do you know, they uncovered a few improvements to how I'm doing the content negotation and have made it more resilient in edge cases.

The only problem is with the test setup requirements that the user creating the starter pack must have a full account, and that all accounts (the creator and the five "inhabitants") must all have ActivityPub IDs. I keep getting this error:

accounts.models.Account.DoesNotExist: Account matching query does not exist.

So the setup for the ActivityPub test is still flat-out missing right now. I put some time into trying to learn Model Bakery to make it happen, but didn't feel like I was making much progress. Is this something you could knock out in a few minutes or should I try again tomorrow?

But hey, if I patch the requirements for a full author account and for ActivityPub IDs out of the view code, the tests succeed. So that's something.

anze3db · 2025-08-01T23:20:22Z

But hey, if I patch the requirements for a full author account and for ActivityPub IDs out of the view code, the tests succeed. So that's something.

Great! We can definitively make the test have all the necessary data! I'll take a look at this PR tomorrow, it's getting late here so I need to go to bed now 😅

for more information, see https://pre-commit.ci

jfietkau · 2025-08-02T02:44:19Z

Okay, a new revision of this PR is in. I have taken a shot at separating the JSON and ActivityPub APIs. There's arguably a little bit of duplication, but while putting this together I got the feeling that the results flow much more naturally than trying to cram a general-use JSON API into the ActivityPub format. Let me know what you think.

I'm not fussy about what exactly should be in the JSON result. The way I did it now was to pick a few handfuls of model properties that seemed to make sense to expose, but I'm fairly dispassionate about the JSON API. I'm probably not going to be among its users. At this stage, this PR is more about the general structure of the code and the API separation.

While I was at it, I added the ability to request a specific format via a GET parameter, so you can do /s/slug/?format=json (or html or activitypub) to look at the differences. It's a bit handier for me too than having to fiddle with the Accept header each time, so I wanted to get this done anyway.

I've also updated the tests. They're still missing their setup, but the JSON API has preliminary validation.

anze3db · 2025-08-02T07:36:39Z

Whoops. I wanted to push my fixes to your branch but I ended up pushing to the main branch instead 😅

Well the good news is that the changes are now live and you can test them out on fedidevs.com (example: https://fedidevs.com/s/NjA3/?format=activitypub)! If you still need to make changes to the endpoint then feel free to open a new PR.

jfietkau · 2025-08-02T10:55:29Z

Thanks for merging (and cleaning up) @anze3db! 🙂 Looks all good from my vantage.

I'll take partial responsibility for the main branch thing. For #178 I was working on a separate branch (by necessity, since this one was ongoing at the same time), so that may have inadvertently set some expectations. Glad you got everything sorted out quickly anyway!

Before I go around making noise about the new ActivityPub support, I'd like to add per-pack splash images. I'll open an issue for that.

Julian Fietkau and others added 2 commits July 30, 2025 14:01

Add initial ActivityPub support to starter packs

2a84db5

[pre-commit.ci] auto fixes from pre-commit.com hooks

c6808ac

for more information, see https://pre-commit.ci

anze3db reviewed Jul 31, 2025

View reviewed changes

jfietkau mentioned this pull request Jul 31, 2025

Feature Request: Support for importing non-inlined starter kits Automattic/wordpress-activitypub#2037

Closed

jfietkau commented Jul 31, 2025

View reviewed changes

jfietkau and others added 7 commits July 31, 2025 20:37

ActivityPub starter packs: Cleaner construction of IDs

cf41161

Co-authored-by: Anže Pečar <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

9878695

for more information, see https://pre-commit.ci

ActivityPub starter packs: Reference author by username instead of by…

a6ea8d7

… string representation Co-authored-by: Anže Pečar <[email protected]>

Merge branch 'anze3db:main' into main

896469f

Improve ActivityPub content negotiation for starter packs

d6dba58

Add starter packs ActivityPub tests (still missing setup)

7af6e3a

[pre-commit.ci] auto fixes from pre-commit.com hooks

84f6718

for more information, see https://pre-commit.ci

Julian Fietkau and others added 2 commits August 2, 2025 04:33

Separate JSON and ActivityPub APIs for starter packs

c94891e

[pre-commit.ci] auto fixes from pre-commit.com hooks

8763948

for more information, see https://pre-commit.ci

anze3db added 5 commits August 2, 2025 07:24

Add activitypub_id to tests

09f1f24

Improve fetching created_by users account

1c8aeec

Add ddt

489ef37

Make tests more explicit

0c121cc

Cleanup view function

11c7b4c

anze3db merged commit 7b20a80 into anze3db:main Aug 2, 2025
1 check failed

jfietkau mentioned this pull request Aug 11, 2025

Expose the list of accounts for a given starter pack (API?) #160

Open

Add initial ActivityPub support to starter packs #177

Add initial ActivityPub support to starter packs #177

Uh oh!

Conversation

jfietkau commented Jul 30, 2025

Uh oh!

jfietkau commented Jul 30, 2025

Uh oh!

anze3db left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jfietkau commented Jul 31, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jfietkau commented Jul 31, 2025

Uh oh!

anze3db commented Aug 1, 2025

Uh oh!

jfietkau commented Aug 2, 2025

Uh oh!

Uh oh!

anze3db commented Aug 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jfietkau commented Aug 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

anze3db commented Aug 2, 2025 •

edited

Loading