Consistency between Actor icon handling and post image federation #2382
Replies: 4 comments 20 replies
-
Mastodon allows |
-
@pfefferle For example, when displaying the thumbnail column in the WordPress dashboard (

Additionally, I believe the

Here’s an example using my Mastodon profile’s avatar and header image.

    "icon": {
      "type": "Image",
      "mediaType": "image/jpeg",
      "url": "https://files.mastodon.social/accounts/avatars/111/256/504/964/424/929/original/bb8761e4f41b929f.jpg"
    },
    "image": {
      "type": "Image",
      "mediaType": "image/jpeg",
      "url": "https://files.mastodon.social/accounts/headers/111/256/504/964/424/929/original/f1ebc9e851ed5b25.jpg"
    }

Relevant References from ActivityStreams Vocabulary
https://www.w3.org/TR/activitystreams-vocabulary/#dfn-icon
icon image
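To make the `icon` / `image` shapes concrete, here is a minimal sketch of reading image URLs from an actor document. It assumes the JSON was decoded with `json_decode( $json, true )` and that the value may be a bare URL string, a single object, or a list of those. The function name `example_collect_image_urls` is hypothetical and not part of the ActivityPub plugin.

```php
<?php
/**
 * Collect candidate image URLs from an actor's `icon` or `image` value.
 * Hypothetical helper for illustration only.
 *
 * @param mixed $value Decoded JSON value (string, object as array, or list).
 * @return string[] URLs in document order.
 */
function example_collect_image_urls( $value ): array {
    if ( is_string( $value ) ) {
        // A bare string is treated as a URL reference.
        return array( $value );
    }
    if ( ! is_array( $value ) || array() === $value ) {
        return array();
    }
    // A JSON list decodes to an array with keys 0..n-1.
    $is_list = array_keys( $value ) === range( 0, count( $value ) - 1 );
    if ( $is_list ) {
        $urls = array();
        foreach ( $value as $item ) {
            $urls = array_merge( $urls, example_collect_image_urls( $item ) );
        }
        return $urls;
    }
    // An Image object carries `url`, which may itself be a string, object, or list.
    if ( isset( $value['url'] ) ) {
        return example_collect_image_urls( $value['url'] );
    }
    // A Link object carries `href`.
    if ( isset( $value['href'] ) && is_string( $value['href'] ) ) {
        return array( $value['href'] );
    }
    return array();
}
```

Called on the `icon` object above, this returns a one-element array holding the avatar URL. A caller that only needs one avatar could take the first element and ignore the rest.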
-
ActivityPub Remote Actor Data Model and Avatar Caching Architecture Report: WordPress CPT and Asynchronous Synchronization Strategy

I. Architectural Overview of Remote Actor Management

1.1. Need for a Unified Actor Model in WordPress

The core of the ActivityPub protocol is distributed actors. These actors are not limited to WordPress’s default user model in the wp_users table, but encompass various types such as Person, Group, Organization, Application, and Service [1]. WordPress’s WP_User structure mainly assumes a "user" who logs in to a local instance and creates posts, so forcibly assigning remote actors without local write permissions to this table introduces structural inefficiencies and security overhead. Remote actors should be treated as data objects that simply store profile information and interaction endpoints (Inbox, Outbox URI, etc.) in the local database.

The official ActivityPub plugin developers have recognized this structural limitation and, in recent updates, adopted a Unified Actor Model using Custom Post Type (CPT) IDs instead of remote user URIs, optimizing Follower tables and other actor-related tables [3]. This approach reflects a clear architectural intent to manage all actors (local users and remote actors) internally using a single CPT ID, which is essential for reducing complexity in follow and interaction logic. Implementers should design actor lookup and interaction logic by wrapping local WP_Users as ActivityPub actors and treating remote actors as CPT instances, unifying all objects under a single CPT Post ID.

1.2. Criteria for Storage Choice: Justification for Using Custom Post Types (CPTs)

Deciding whether to use CPTs or dedicated custom database tables (DBTs) for storing remote actors is critical in WordPress development. Considering the characteristics of the remote actor model, leveraging the CPT infrastructure provides significant advantages.
CPTs allow safe data retrieval through the WP_Query class without writing complex, error-prone SQL queries [5]. Additionally, the built-in caching infrastructure of WordPress, including object caching, can be automatically leveraged to improve repeated access performance for remote actor data [5]. Simply registering the CPT provides a basic UI in the admin dashboard to filter and view actors, enhancing operational efficiency [3].

On the other hand, CPTs are bound to the “post” metaphor, including unnecessary fields like post_title and post_date, while complex actor properties (e.g., inbox URL, publicKey) must be stored in the postmeta table [6]. The postmeta table indexes post_id and meta_key but not meta_value, so looking up an actor by an ID stored as a meta value can degrade at scale [6].

To circumvent these limitations, architectural optimizations are necessary. Normalizing or hashing the unique URI of the ActivityPub actor [1] and storing it in the CPT’s indexed post_name (slug) field allows fast lookups without traversing postmeta, a strategy essential for maintaining performance when hundreds of thousands of remote actors accumulate.

II. CPT-Based Remote Actor Data Structure and Schema Definition

A Custom Post Type for storing remote actors should be clearly defined, e.g., ap_actor [3]. The core fields and extended metadata (post meta) of this CPT should faithfully reflect the ActivityStreams 2.0 object model.

2.1. CPT Registration and Core Field Mapping Strategy

The key WordPress fields of the ap_actor CPT are used as follows:

- post_title stores the actor’s display name.
- post_content stores the actor’s summary, supporting administrative convenience.
- post_name stores the hashed or normalized Actor URI. This is the most important mapping, because it enables fast actor lookups by unique identifier.
- post_status stores the actor’s current status (e.g., active, suspended) to support admin table filtering [3].

2.2. Detailed Actor Metadata (Post Meta) Schema Design

Complex ActivityPub properties of actors are stored in the wp_postmeta table, allowing flexible schema expansion [7]. Key meta keys and purposes for storing remote actors are:
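For the post_name lookup described in section 2.1, the following is a minimal sketch of turning an actor URI into a fixed-length slug. The function name `example_actor_slug`, the choice of SHA-256, and the normalization rules (lowercasing only the scheme and host) are assumptions for illustration; the report does not specify them.

```php
<?php
/**
 * Derive a stable slug from an actor URI (hypothetical helper).
 *
 * Scheme and host are case-insensitive, so they are lowercased.
 * Path, query and fragment are kept verbatim because they may be case-sensitive.
 *
 * @throws InvalidArgumentException When the URI is not absolute.
 */
function example_actor_slug( string $actor_uri ): string {
    $parts = parse_url( trim( $actor_uri ) );
    if ( false === $parts || empty( $parts['scheme'] ) || empty( $parts['host'] ) ) {
        throw new InvalidArgumentException( 'Actor URI must be absolute.' );
    }
    $normalized = strtolower( $parts['scheme'] ) . '://' . strtolower( $parts['host'] );
    if ( isset( $parts['port'] ) ) {
        $normalized .= ':' . $parts['port'];
    }
    $normalized .= $parts['path'] ?? '/';
    if ( isset( $parts['query'] ) ) {
        $normalized .= '?' . $parts['query'];
    }
    if ( isset( $parts['fragment'] ) ) {
        $normalized .= '#' . $parts['fragment'];
    }
    // 64 lowercase hex characters, which fits the post_name column.
    return hash( 'sha256', $normalized );
}
```

Inside WordPress, the resulting slug could then be passed to `get_page_by_path( $slug, OBJECT, 'ap_actor' )` to fetch the matching post, assuming the CPT is registered under the `ap_actor` name used in this report.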
Notably, the ap_raw_json field allows storing the full original JSON-LD from the remote actor [3]. ActivityPub can include non-standard extensions beyond ActivityStreams across Fediverse platforms. Saving the entire JSON ensures important data (e.g., new profile endpoints) are retained without schema modification, providing data resilience for future protocol changes.

III. Local Avatar Image Caching Implementation

3.1. Motivation: Performance, Privacy, and Reliability

Caching remote actor avatars in the local WordPress Media Library is not just for performance; it is an essential measure for security and privacy in decentralized networks. Directly loading avatars from remote URLs (hotlinking) exposes the IP addresses of local users (followers) to the remote server, violating anonymity principles [10]. Local caching downloads the images to the WordPress server, acting as a proxy (server IP only exposed), and serves the local media URL to users, protecting privacy [10]. Additionally, caching ensures fast image load times regardless of remote server latency [11], and bypasses hotlink protection measures used by some remote instances [12].

3.2. Media Library Sideloading Implementation

WordPress provides media_sideload_image() for downloading external images to the local Media Library, handling file download, validation, and registration in a single call [13, 14].

Caching Workflow:
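// media_sideload_image() lives in the admin includes, which are not loaded on front-end or cron requests.
require_once ABSPATH . 'wp-admin/includes/media.php';
require_once ABSPATH . 'wp-admin/includes/file.php';
require_once ABSPATH . 'wp-admin/includes/image.php';
// With 'id' as the return type, this yields the attachment ID on success or a WP_Error on failure.
$att_id = media_sideload_image( $remote_url, $actor_cpt_id, 'Remote Actor Avatar', 'id' );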
Frontend display uses wp_get_attachment_image() on the stored Attachment ID, leveraging WordPress’s image processing and caching pipeline [16].

Local Avatar Caching Workflow and Functions
IV. Synchronization and Refresh Strategy using WP-Cron

Remote actors may update profile info (especially avatars) at any time. Local instances must detect changes and refresh caches. Synchronous processing on user requests is resource-intensive and should be avoided.

4.1. Asynchronous Refresh via WP-Cron

Profile refresh is a background HTTP request task; WP-Cron is standard for scheduling periodic events [17]. Use wp_schedule_event with 'hourly' or 'daily' intervals [19]. Because WP-Cron requires page visits, low-traffic environments may delay execution. For high-reliability environments, use system cron to periodically execute wp-cron.php.

4.2. Hybrid Synchronization Strategy (Push vs. Polling)

Actor profile refresh should combine ActivityPub Push with Polling:
WP-Cron serves as the fall-back safety net for maintaining data integrity when real-time updates fail.

Synchronization Strategy Matrix
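The polling half of the hybrid strategy in section 4.2 can be sketched as a staleness check that a scheduled WP-Cron callback would run. This is one reading of the strategy under stated assumptions: the one-day threshold, the batch size, and the names `EXAMPLE_MAX_AGE` and `example_select_stale_actors` are all hypothetical and do not come from the report.

```php
<?php
// Assumed refresh threshold: one day, in seconds.
const EXAMPLE_MAX_AGE = 86400;

/**
 * Pick the actors whose cached profile is older than the threshold.
 *
 * @param array<int,int> $last_synced Map of actor post ID => Unix timestamp of last refresh.
 * @param int            $now         Current Unix timestamp.
 * @param int            $max_age     Age in seconds after which a profile counts as stale.
 * @param int            $batch       Maximum number of actors to refresh per cron run.
 * @return int[] Actor post IDs, oldest first.
 */
function example_select_stale_actors( array $last_synced, int $now, int $max_age = EXAMPLE_MAX_AGE, int $batch = 20 ): array {
    $stale = array();
    foreach ( $last_synced as $actor_id => $timestamp ) {
        if ( $now - $timestamp >= $max_age ) {
            $stale[ $actor_id ] = $timestamp;
        }
    }
    // Sort by timestamp ascending while keeping the actor IDs as keys.
    asort( $stale );
    return array_slice( array_keys( $stale ), 0, $batch );
}
```

Under this sketch, an actor refreshed by an incoming push would have its timestamp reset, so it drops out of the next polling batch. Capping the batch keeps the number of outbound HTTP requests per cron run bounded.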
V. Development and Operational Considerations

5.1. Performance Optimization: Remote Requests

Fetching remote actor profiles can generate multiple HTTP requests:
5.2. Security and Scalability

5.2.1. Signed Requests and Authentication

Secure instances may require outbound GET requests signed with the local actor’s private key [21]. Implementations must support signing to ensure successful interaction.

5.2.2. Managing ActivityStreams Object Type Extensions

Current focus is on actor objects. Extended objects (comments, likes, reposts) may require separate CPTs (e.g., ap_activity, ap_note) or DB tables for long-term scalability [22].

VII. Conclusion and Recommendations

Using Custom Post Types (CPT) is the optimal approach for reliably storing and managing remote ActivityPub actors in WordPress. CPTs leverage standard APIs, caching infrastructure, and admin UI. Key recommendations:
-
Conditional Media Handling Architecture: Differentiating Upload Paths and Managing Non-Attached ActivityStreams Objects

This report outlines an architecture for managing remote ActivityPub media (such as actor avatars and icons) within WordPress, specifically addressing the requirements for: 1) separating cached media storage from standard user uploads, and 2) conditionally bypassing the creation of a WordPress Attachment ID for transient media objects that lack an independent ActivityStreams 2.0 (AS2) identity.

I. Semantic Requirements for Media Object Identity

I.A. The Conflict between WP Attachments and AS2 Semantics

In standard WordPress operation, all files uploaded to the server are registered in the Media Library as an attachment Custom Post Type (CPT), assigned a unique database ID, and placed in the default /wp-content/uploads/YYYY/MM/ directory structure. However, the ActivityPub protocol relies on the ActivityStreams 2.0 data model, where objects can be categorized into two groups based on their identity [1]:
Design Conclusion: Imposing a persistent WordPress Attachment ID on an AS2 object that is semantically transient (lacking an id) is inefficient, increases database bloat, and violates the object's original design intent [4]. Therefore, the system must implement a conditional persistence strategy based on the AS2 id field.

I.B. Conditional Persistence Strategy Mapping

The presence of the AS2 id property in the incoming JSON payload should determine the storage fate of the media object:
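The id-based decision in section I.B can be sketched as a small pure function. The function name `example_media_storage_mode` and the two return labels are hypothetical; only the rule itself (an object with an `id` is persisted, one without is treated as transient) comes from the design conclusion above.

```php
<?php
/**
 * Decide how to store an incoming AS2 media object (hypothetical helper).
 *
 * @param array $object Media object decoded with json_decode( $json, true ).
 * @return string 'attachment' when the object has its own AS2 id,
 *                'file_only' when it is transient and should not get an Attachment ID.
 */
function example_media_storage_mode( array $object ): string {
    $id = $object['id'] ?? null;
    if ( is_string( $id ) && '' !== trim( $id ) ) {
        return 'attachment';
    }
    return 'file_only';
}
```

A Mastodon-style `icon` object that carries only `type`, `mediaType` and `url` would come back as `file_only` under this rule.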
II. Implementing Differentiated File Storage Paths

To physically separate remote actor caches from standard user-uploaded media (which follow the standard /YYYY/MM/ structure), a custom upload directory is required.

II.A. Leveraging the upload_dir Filter

WordPress uses the upload_dir filter to determine the final path for all file uploads [6]. To achieve path differentiation (e.g., using a structure like wp-content/uploads/federation-media/), this filter must be temporarily hijacked.

Critical Implementation Requirement (Isolation): The upload_dir filter must not be applied globally, as this would break standard media uploads. It must be conditionally registered only during the execution of ActivityPub-related media fetching routines and immediately removed to prevent contamination of the global upload settings [7]. The recommended method for safe, contextual filtering is to use the wp_handle_upload_prefilter hook to add the custom upload_dir filter, perform the file operation, and then remove the filter before the function returns [7]:

PHP
// 1. Function to apply the custom path
}
// 2. Execution Wrapper to ensure filter is temporary
}

III. Bypassing Attachment ID Creation (Database Persistence)

For transient AS2 objects (those without an id), the standard high-level WordPress sideloading function, media_handle_sideload(), is unsuitable because it is programmed to automatically call wp_insert_attachment() and return a database ID [9]. The solution requires using low-level WordPress file handling functions that perform the file system work without the database insertion step [10].

III.A. Low-Level Caching Workflow
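The temporary-filter requirement from section II.A can be sketched as a callback plus a wrapper that always removes the filter again. The function names are hypothetical, and `federation-media` is simply the example directory mentioned above. This sketch registers the filter directly around the operation rather than through a prefilter hook, which is a substitution made for brevity. `add_filter()` and `remove_filter()` are WordPress functions, so the wrapper only runs inside WordPress.

```php
<?php
/**
 * upload_dir filter callback: redirect uploads into a dedicated sub-directory.
 *
 * @param array $dirs Upload directory data as passed by the upload_dir filter.
 * @return array Same data with path, url and subdir pointing at the cache folder.
 */
function example_federation_upload_dir( array $dirs ): array {
    $dirs['subdir'] = '/federation-media';
    $dirs['path']   = $dirs['basedir'] . $dirs['subdir'];
    $dirs['url']    = $dirs['baseurl'] . $dirs['subdir'];
    return $dirs;
}

/**
 * Run a file operation with the custom upload path, then restore the default.
 *
 * The finally block removes the filter even when the callback throws,
 * so regular media uploads are never affected.
 *
 * @param callable $callback File operation to run while the filter is active.
 * @return mixed Whatever the callback returns.
 */
function example_with_federation_upload_dir( callable $callback ) {
    add_filter( 'upload_dir', 'example_federation_upload_dir' );
    try {
        return $callback();
    } finally {
        remove_filter( 'upload_dir', 'example_federation_upload_dir' );
    }
}
```

For the attachment-free write described in section III, the callback could be something like `function () use ( $bytes ) { return wp_upload_bits( 'avatar.jpg', null, $bytes ); }`, since `wp_upload_bits()` writes the file and returns its path and URL without creating an attachment post. The file name and the `$bytes` variable are placeholders.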
IV. Garbage Collection and Lifecycle Management

A major consequence of bypassing the WordPress Media Library is that the standard deletion mechanisms for media are disabled. When the parent CPT (the remote actor record) is deleted, the cached files are left as "orphaned files" on the server, causing disk bloat [8].

IV.A. Custom Cleanup Handler

To ensure operational integrity, a custom garbage collection routine must be implemented, linked to the lifecycle of the parent CPT post.
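A cleanup routine of this kind might look like the sketch below. It is written under several assumptions: the meta key `example_cached_avatar_path`, the `federation-media` folder, the `ap_actor` post type name, and both function names are illustrative only. The path check is there so that a corrupted meta value cannot cause a file outside the cache folder to be deleted.

```php
<?php
/**
 * Delete a cached file, but only if it really lives inside the cache directory.
 *
 * @return bool True when a file was removed.
 */
function example_delete_cached_file( string $file, string $cache_dir ): bool {
    $real_file = realpath( $file );
    $real_dir  = realpath( $cache_dir );
    if ( false === $real_file || false === $real_dir ) {
        return false; // Missing file or missing cache directory.
    }
    // Refuse anything that does not resolve to a path under the cache directory.
    if ( 0 !== strpos( $real_file, $real_dir . DIRECTORY_SEPARATOR ) ) {
        return false;
    }
    return is_file( $real_file ) && unlink( $real_file );
}

/**
 * Hook callback: remove the cached avatar when its actor post is deleted.
 * The meta key and directory name are assumptions, not plugin API.
 */
function example_cleanup_on_actor_delete( $post_id, $post ) {
    if ( 'ap_actor' !== $post->post_type ) {
        return;
    }
    $file    = get_post_meta( $post_id, 'example_cached_avatar_path', true );
    $uploads = wp_get_upload_dir();
    if ( is_string( $file ) && '' !== $file ) {
        example_delete_cached_file( $file, $uploads['basedir'] . '/federation-media' );
    }
}

// Register only when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'before_delete_post', 'example_cleanup_on_actor_delete', 10, 2 );
}
```

The `before_delete_post` action is used here because post meta is still readable at that point; it passes the post object as a second argument in current WordPress versions.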
This strategy ensures that the local cache file is removed only when its controlling data object (the CPT actor) is destroyed, maintaining synchronization and resource efficiency.

PHP
// Example: Garbage Collection Hook
function ap_cleanup_cached_media_on_actor_delete( $post_id, $post ) {
}

V. Summary of Recommended Implementation

The most robust and efficient approach for managing remote ActivityPub media in WordPress is through a selective, low-level file management pipeline that respects AS2 semantics:
-
If the `icon` property of an [Actor Object](https://www.w3.org/TR/activitypub/#actor-objects) supports ImageObject and arrays, then it could work similarly to Facebook profiles, where each user can leave comments with their profile image attached.

Now that I think about it, it’s quite strange that the avatar (`icon`) source for a WordPress Actor is Gravatar. That’s an external source, and as far as I know, external sources aren’t supposed to be used in federation for trust and reliability reasons.

If that’s acceptable, then technically there’s no reason why hotlinked images included in a post couldn’t also be federated, right?
Also, I just got curious — when deleting a user in WordPress, there’s an option to reassign all content to another user.
I haven’t tested this, but what happens to the outbox items or Activity objects in that case?
I just wanted to leave a comment here since the recent PRs have been focused on remote actor avatar handling and ActivityPub media processing, and I’ve also been dealing with some account-related database issues on WordPress.com lately.
By the way, I just saw this PR: [mastodon/mastodon#36322](mastodon/mastodon#36322).
It seems Mastodon now allows converting a Note into an Article via an Update activity, doesn’t it?