Remove HTML code from ActivityPub federated posts

Ok, this is a potential corner case but maybe still interesting.

Posts imported with rss-polling contain HTML code. If these posts are federated, most of that HTML breaks and shows up as literal plain text.

In an ideal world, at least links would be translated. But if this is too much hassle, at the very least it would be good to strip the HTML code, leaving just the text.
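To make the request concrete, here is a minimal sketch of the kind of stripping I mean, written in Python with only the standard library. It is not how Discourse or the ActivityPub plugin works; the names `LinkPreservingStripper` and `strip_html` are made up for illustration, and the choice to render links as "text (url)" is just one possible way to keep them.

```python
from html.parser import HTMLParser


class LinkPreservingStripper(HTMLParser):
    """Hypothetical helper: keep text, rewrite <a href="..."> as 'text (url)'."""

    # Tags treated as line breaks (an assumption, not an exhaustive list).
    BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "blockquote",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and friends
        self.parts = []
        self.href_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href_stack.append(dict(attrs).get("href"))
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.href_stack:
            href = self.href_stack.pop()
            if href:
                self.parts.append(f" ({href})")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the text of an HTML fragment, one line per block element."""
    parser = LinkPreservingStripper()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = ('<p>New episode: <a href="https://example.org/ep36">'
              'listen here</a>!</p><p>Enjoy &amp; share.</p>')
    print(strip_html(sample))
```

The sketch ignores things a real filter would need to handle, such as `<script>` or `<style>` contents, images, and embedded players.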

A couple of screenshots to illustrate the problem:

Discourse topic imported via RSS:

This is how it looks on Mastodon:

Why are we federating posts imported via RSS? The use case is: we are a community of podcasts, we import new episodes via RSS so listeners can like and comment on them in one place, and we want to offer these podcasts a window to the Fediverse, where they might get more listeners, comments and likes, without adding more work to their already full plates.


Just for your context, when ActivityPub content is federated (from any platform) it is typically federated as HTML. We will always federate Discourse posts as HTML by default.

What you probably want is a custom filter for ActivityPub content. We may add that at some point soonish, however this is a relatively specific use case, and it’s not at the top of the priority list.


I understand. Do you think this is something we could try to push via Marketplace? It is a specific use case, but it affects us directly.

It can’t hurt to try posting a request in Marketplace!

Looking at the screenshot they posted it still seems like there's almost certainly some kind of bug here, although I can't tell if it's on the Mastodon side or the Discourse side.

Even the most complex HTML should just turn into plain text when Mastodon parses it, not whatever broken HTML markup is going on here.

Also, when I view this thread from https://socialhub.activitypub.rocks/t/remove-html-code-from-activitypub-federated-posts/5293, why are the images missing?

Unfortunately, I can't view the topic itself in ActivityPub to determine who's at fault here, because the server returns 406 Not Acceptable:

curl -H 'Accept: application/activity+json' https://red.podkasts.org/t/el-canto-de-la-tripulacion-n-36-nuevas-voces/23408/1

And looking up the object in Mastodon doesn't return a result either.
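
For anyone who wants to repeat the check, a rough sketch in Python follows. It tries the two media types named in the ActivityPub spec, `application/activity+json` and `application/ld+json` with the ActivityStreams profile. The helper names `build_request` and `fetch_activity` are made up here, and I have no idea whether the second media type gets a different answer from this server; it is only something to rule out.

```python
import json
import urllib.error
import urllib.request

# The two media types the ActivityPub spec names for object retrieval.
ACCEPT_TYPES = [
    "application/activity+json",
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
]


def build_request(url, accept):
    """Build a GET request carrying the given Accept header (no network I/O)."""
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_activity(url, timeout=10):
    """Try each Accept value in turn.

    Returns (accept, parsed_json) for the first successful response,
    or re-raises the last HTTP error (for example a 406).
    """
    last_error = None
    for accept in ACCEPT_TYPES:
        try:
            request = build_request(url, accept)
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return accept, json.load(response)
        except urllib.error.HTTPError as err:
            last_error = err
    raise last_error
```

Calling `fetch_activity(...)` with the topic URL from the curl command above would show whether either header gets past the 406.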