Synchronising/Crossposting topics across different Discourse sites

I currently have a topic on one site, which I would like to also have on another site. Whilst I could hyperlink it, what I would really love is the ability to edit it on either site and have the changes reflected on both. This saves me from having one version of the topic that is vastly out of date, and instead keeps both topics constantly up to date. It also provides a way for me to decentralise the information from my site.

Some thoughts on functionality

  • As a starting point, one site would remain as host/owner of the topic, and the other(s) would essentially mirror it. Going further, I even wonder if a topic could somehow be inherited by a mirror topic if the original is deleted.
  • The topic should retain the ability to be hidden, closed, etc. on the mirror sites
  • The replies should not be synchronised – each site has a different userbase, so I can’t see how synchronising replies would work

I understand the implementation of such a feature is very far from trivial, but I’m wondering if this has ever been looked into, and what results/experiments already exist?

Having duplicate content on multiple sites is a big SEO no-no. This is not likely to be supported.

What is the use case or purpose? You basically want the other sites to be a backup of the primary one?

Could you give a bit more information on this? From my perspective, this would reduce the duplicate content.

Backup is not the objective, rather to create a single point of truth for topics which would make sense on more than one site.

To give a concrete example, I’m looking at changing my visa. I have a topic in my private Discourse which is basically a checklist of what needs to be done. However, my friends may also find this useful, so I create the same topic on our shared Discourse. The problem is that I need to update both topics separately instead of just updating a single topic. This means one of the topics is often missing key information.

I guess this might even be possible with just an API key for the other site? Perhaps something like a button/section in the editor where a list of API keys and URLs for the target topics can be created. When you make changes to a source topic, you could click something like “push changes to clones of this topic”, which would simply push the update to the topics on the other instances.
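A minimal sketch of the “push changes to clones” step, written in Python (the language of the script mentioned later in the thread). It assumes Discourse’s `PUT /posts/{id}.json` endpoint with `Api-Key`/`Api-Username` headers; the `clones` list shape and the helper names are hypothetical. Note that the target is a *post* id, not a topic id.

```python
import json
import urllib.request


def build_push_request(target_base_url: str, post_id: int, raw: str,
                       api_key: str, api_username: str,
                       edit_reason: str = "Synced from source topic") -> urllib.request.Request:
    """Build (but do not send) a PUT request that replaces a post's raw text."""
    body = json.dumps({"post": {"raw": raw, "edit_reason": edit_reason}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{target_base_url.rstrip('/')}/posts/{post_id}.json",
        data=body,
        method="PUT",
        headers={
            "Api-Key": api_key,            # key created on the *target* site
            "Api-Username": api_username,  # user the edit is attributed to
            "Content-Type": "application/json",
        },
    )


def push_to_clones(raw: str, clones: list) -> None:
    """Send the same raw text to every clone (hypothetical dict shape per clone)."""
    for clone in clones:
        req = build_push_request(clone["base_url"], clone["post_id"], raw,
                                 clone["api_key"], clone["api_username"])
        with urllib.request.urlopen(req, timeout=30) as resp:
            resp.read()  # a real tool would check the status and retry on failure
```

Separating request construction from sending keeps the HTTP-free part easy to test; `push_to_clones` is where retries and error handling would go.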

Put the information in exactly one place where everyone can see it. A link is how to do that.

But you have secret information that you want to make available somewhere else. That’s something else. It’s quite possible with a plugin. It’s the kind of problem where the contrived solution requires 10X the work that the actual problem would require. (I often spend hours automating a task that needs to happen only once, for example.)

But you’d need to store the API keys somewhere they would be available only to the user who posted them. Or make them site-wide?

Again, they’d need to be per-user and included in the serializer only for the current user (or maybe stored in the user profile?). And you’d need some data structure to map API keys to the different sites. Seems like something I’d think I could do in 2-5 hours.

So you need to store, somewhere, the URLs of the other sites that are supposed to have this topic. How to create the post on those sites could be complicated too; the easiest way is to create it by hand and include the URL of that topic in the source post. You could probably store that in the raw post in some kind of BBCode or similar. That would let you create a component that adds a button and link for each of them, and then you’d have Rails code that queues a job to post the update to the other site(s). The receiving sites wouldn’t need any code, since you could use the API to push an edit to the post.
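Illustrating the “store the target URL in the raw post in some kind of BBCode” idea: a sketch that pulls target topics out of a hypothetical `[mirror]...[/mirror]` tag. The tag name is made up for illustration, and a real plugin would presumably also hide or render the tag. It is written in Python rather than the Rails code described above.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

# Hypothetical tag syntax: [mirror]https://other.example.com/t/some-slug/123[/mirror]
MIRROR_TAG = re.compile(r"\[mirror\]\s*(\S+?)\s*\[/mirror\]", re.IGNORECASE)


def mirror_targets(raw: str) -> List[Tuple[str, int]]:
    """Return (site base URL, topic id) for each [mirror] tag in a post's raw text."""
    targets = []
    for url in MIRROR_TAG.findall(raw):
        parsed = urlparse(url)
        parts = [p for p in parsed.path.split("/") if p]
        # Discourse topic URLs look like /t/<slug>/<topic_id>[/<post_number>]
        if len(parts) >= 3 and parts[0] == "t" and parts[2].isdigit():
            targets.append((f"{parsed.scheme}://{parsed.netloc}", int(parts[2])))
    return targets
```

Because editing goes through posts rather than topics, the queued job would still need each target topic’s first post id, for example from `GET /t/{topic_id}.json` (`post_stream.posts[0].id`).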

Seems like the kind of thing that I’d think I could do in 5-10 hours but would likely take twice that. If that’s fun for you, then it could be a cool project.

This could also be done through a small external app that listens for a webhook sent on edits on your source site, and then uses the API to post on the mirror site.
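A sketch of the receiving side of that external app, assuming Discourse’s webhook headers `X-Discourse-Event` and `X-Discourse-Event-Signature` (an HMAC-SHA256 of the request body, keyed with the webhook secret, formatted as `sha256=<hex>`). Whether the `post` payload includes `raw` is an assumption here; if it does not, the app can fetch the post via the API before pushing it.

```python
import hashlib
import hmac
import json


def verify_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check the X-Discourse-Event-Signature header ("sha256=<hex HMAC of body>")."""
    expected = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # Constant-time comparison, on bytes so any header content is accepted
    return hmac.compare_digest(expected.encode("utf-8"), (signature_header or "").encode("utf-8"))


def extract_edit(headers: dict, body: bytes, secret: str):
    """Return (post_id, raw) for a verified post_edited event, else None."""
    if not verify_signature(secret, body, headers.get("X-Discourse-Event-Signature", "")):
        return None
    if headers.get("X-Discourse-Event") != "post_edited":
        return None
    post = json.loads(body).get("post", {})
    # Assumption: the payload carries the post's raw text
    return post.get("id"), post.get("raw")
```

Header names may arrive in different casing depending on the web framework, so a real handler should look them up case-insensitively.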

I’ve been thinking again about this.

A plugin could add a custom topic field for the URL of the primary document’s source. (I guess it would also need fields for a remote username and API key if the main document is to be hidden, as I think is your use case, but that piece could wait. Or perhaps they could live in a user custom field. It would be up to whoever generated the key to ensure that it has read-only privileges.)

When creating a topic you would enter something like "remote: https://meta.discourse.org/t/synchronising-crossposting-topics-across-different-discourse-sites/263269" and when the topic was created, Discourse would pull the remote topic’s raw text, insert it into raw as an edit and instantiate the topic_custom_field with the remote URL, perhaps adding a “copied from url” at the top.
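The “remote:” flow could start with something like this sketch. It finds the directive, derives the URL of the remote post’s markdown, and prepends a “copied from” line. It assumes Discourse’s `/raw/<topic_id>` route, which returns the first post’s markdown, and a one-line `remote: <url>` syntax. All helper names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical directive: a line of the form "remote: <topic url>"
REMOTE_LINE = re.compile(r"^remote:\s*(https?://\S+)\s*$", re.MULTILINE)


def parse_remote(text: str) -> Optional[str]:
    """Return the URL from a 'remote: <url>' line, or None if there is none."""
    match = REMOTE_LINE.search(text)
    return match.group(1) if match else None


def raw_url_for(topic_url: str) -> str:
    """Map https://site/t/<slug>/<id> (or /t/<id>) to https://site/raw/<id>."""
    parsed = urlparse(topic_url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] != "t":
        raise ValueError(f"not a topic URL: {topic_url}")
    topic_id = parts[2] if len(parts) > 2 and parts[2].isdigit() else parts[1]
    return f"{parsed.scheme}://{parsed.netloc}/raw/{topic_id}"


def with_copied_from(remote_raw: str, topic_url: str) -> str:
    """Prepend a 'copied from' note to the pulled markdown."""
    return f"*Copied from {topic_url}*\n\n{remote_raw}"
```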

At this point you’ve copied the remote topic locally and have a record of it.

There could then be a “check source” button that would pull the remote topic and save the remote topic’s updated_at, and maybe even the raw, in other custom fields (a job could also do this periodically, saving a bit of UX). You could then have an update button that would replace the existing raw with the remote one as an edit.
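The “check source” decision could reduce to a timestamp comparison. This sketch assumes the remote first post’s `updated_at` is what gets saved in the custom field. That value is an ISO 8601 string such as `2023-05-04T10:15:30.123Z`. The function names are illustrative.

```python
from datetime import datetime
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def needs_update(stored_updated_at: Optional[str], remote_updated_at: str) -> bool:
    """True if the remote post changed since the last time it was copied."""
    if stored_updated_at is None:
        return True  # never copied, so pull it
    return parse_ts(remote_updated_at) > parse_ts(stored_updated_at)
```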

If the primary site is public, then this part is really easy. Adding an API key to pull from a private site complicates things, and managing a set of API keys across multiple sites complicates it further. If the original source needed to be replaced, you could maybe do that with the remap rake task, or add the ability to edit the custom field with the remote URL when needed.

Not synchronising replies comes for free, since this solution has the secondary sites pulling the data from the primary one.

Right. And there can be a link back to the source site, so people could go to the source to see those comments, or perhaps even embed them via Embed comments from Discourse in your single page app.

If you’ve got any budget at all for this, feel free to contact me.

Hey Jay,

I’ve had this on my mind for a while, though it hasn’t progressed much until recently. Unfortunately, the Discourse-to-Discourse syncing dropped in value for me (it was only a handful of topics), but what grew in value was the desire to synchronise markdown files from other platforms in general.

Our dominant use case at the company was to have READMEs and wikis from GitLab projects available within Discourse (for feedback and searching), but with the GitLab file remaining the single source of truth. My lack of Ruby knowledge resulted in a Python script which is definitely overkill in its implementation, but satisfactory in its functionality. The first decent version does some of what you outlined as well. Some of its features:

  • contains a link to the original source (the file in GitLab)
  • contains a link to the specific revision (the GitLab commit #)
  • handles images and image URLs
    • downloads images from the GitLab repo
    • uploads them to Discourse
    • replaces the original image URL with the short URL of the upload
  • adds a tag “synced_with_gitlab” so it’s easy to find them all
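The image handling in the list above could look roughly like this sketch. It collects markdown image URLs, then swaps each one for the `short_url` (`upload://...`) that Discourse returns after an upload. Only the plain `![alt](url)` form is handled, so titles and HTML `<img>` tags are out of scope. The upload step itself is left out. Relative paths from a repo would need resolving against the repository’s raw-file URL first.

```python
import re
from typing import Dict, List

# Markdown image syntax ![alt](url): group 1 is the alt text, group 2 the URL
IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def image_urls(markdown: str) -> List[str]:
    """List image URLs referenced in the markdown, in order of appearance."""
    return [url for _alt, url in IMAGE.findall(markdown)]


def rewrite_images(markdown: str, short_urls: Dict[str, str]) -> str:
    """Replace each uploaded image URL with its Discourse short URL; leave others as-is."""
    def swap(match):
        alt, url = match.group(1), match.group(2)
        return f"![{alt}]({short_urls.get(url, url)})"
    return IMAGE.sub(swap, markdown)
```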

It worked more or less the same with GitHub as well. Both use more or less the same flavour of markdown, which makes it pretty slick.

I would love to make this open source, but I’ll need to see what legal says. Additionally, it’s still a bit of a hacky Python mess. The intention is to convert this into a Ruby plugin at some point, but I’ll need to see if I can find the time.

True. I’m having another go at this project, and currently looking at making a DB with:

  1. One table per platform (Discourse table, GitLab table, etc.) to account for possible nuances
  2. Each platform table supporting webhooks and polling via API key
  3. DB or API key encryption – currently I’m thinking it’s best to encrypt the whole DB, and interface with it via the script and a passphrase

Discourse Table would look like this:

| Type | Interval (mins) | Last run (mins) | src_domain | src_post_id | src_usr | src_key | tgt_domain | tgt_post_id | tgt_usr | tgt_key |
|---|---|---|---|---|---|---|---|---|---|---|
| Webhook | - | 120 | meta.discourse.com | 1280952 | - | - | discourse.mysite.com | 120 | Tris | xyz12345 |
| Polling | 60 | 40 | meta.discourse.com | 1280953 | Tris20 | 12345xyz | discourse.mysite.com | 121 | Tris | xyz12345 |
| Polling | 60 | 35 | meta.discourse.com | 1750968 | Tris20 | 12345xyz | discourse.mysite.com | 221 | Tris | xyz12345 |
| Polling | 60 | 40 | meta.discourse.com | 1123292 | Tris20 | 12345xyz | discourse.mysite.com | 131 | Tris | xyz12345 |
| Webhook | - | 4800 | meta.discourse.com | 1283678 | - | - | discourse.mysite.com | 129 | Tris | xyz12345 |
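One way to hold the table above in SQLite, as a sketch. Column names follow the table, and `NULL` stands in for `-`. `last_run_mins` is assumed to mean minutes since the last run. The query picks polling rows that are due. Webhook rows never are, since incoming events drive them. Keys are stored in plain text here; encryption is a separate concern.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS discourse (
    type          TEXT NOT NULL CHECK (type IN ('webhook', 'polling')),
    interval_mins INTEGER,           -- NULL for webhook rows
    last_run_mins INTEGER NOT NULL,  -- assumed: minutes since the last sync
    src_domain    TEXT NOT NULL,
    src_post_id   INTEGER NOT NULL,
    src_usr       TEXT,              -- NULL for webhook rows
    src_key       TEXT,
    tgt_domain    TEXT NOT NULL,
    tgt_post_id   INTEGER NOT NULL,
    tgt_usr       TEXT NOT NULL,
    tgt_key       TEXT NOT NULL
)
"""


def due_polling_rows(conn: sqlite3.Connection) -> list:
    """Polling rows whose last run is at least one interval ago."""
    return conn.execute(
        "SELECT src_domain, src_post_id, tgt_domain, tgt_post_id FROM discourse "
        "WHERE type = 'polling' AND last_run_mins >= interval_mins"
    ).fetchall()
```

A stored “minutes since last run” goes stale as soon as it is written. Storing an absolute timestamp (e.g. `last_run_at`) and computing the difference at query time is likely more robust.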

Does this sound crazy and over-engineered? Part of me wants to build a proper solution for this, which would mean accounting for both possibilities - webhook and polling.

Also, I would be very grateful for suggestions on keeping the contents of the DB secure. My current thinking is to encrypt the DB with a passphrase which must be given as an argument when starting the script, e.g.

discourse-sync run password123
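On the passphrase itself: anything passed as a command-line argument can end up in shell history and in process listings such as `ps`. This sketch shows an alternative: read the passphrase from an environment variable, or prompt for it without echo. The variable name `DISCOURSE_SYNC_PASSPHRASE` is hypothetical. SQLCipher derives its own key from a passphrase. `derive_key` (PBKDF2 from Python’s `hashlib`) therefore only matters if you encrypt individual fields yourself.

```python
import getpass
import hashlib
import os


def read_passphrase(env=os.environ) -> str:
    """Prefer an environment variable (hypothetical name), else prompt without echoing."""
    value = env.get("DISCOURSE_SYNC_PASSPHRASE")
    return value if value else getpass.getpass("Database passphrase: ")


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Stretch the passphrase into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations)
```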

Just as a bit of inspiration, the webhook can be super snappy:


These are two topics on two different instances: the topic on the left is the source, and the topic on the right is the target.

There is a Rails way to encrypt a single field. That’s what I did in my dashboard.

See Active Record Encryption — Ruby on Rails Guides

Having both polling and webhooks seems redundant. I think I’d choose one approach.

I think the ActivityPub plugin is getting some Discourse-to-Discourse federation in the near future; might that also be an option?

For what it’s worth, this idea has occurred to me a number of times as well. I’m not sure we have a dedicated topic about this here yet, but if not, we should!

I agree in theory; unfortunately, the corporate world complicates things:

  1. Webhooks aren’t possible at the company due to IT policies (we’re hosted outside the company by CDCK and can’t allow port forwarding to the outside world without a heavy process), so the API (polling) version is a must
  2. Webhooks are snappy, lovely, and perfectly reasonable for everyone else, so it makes sense to accommodate them too :slight_smile:

So I achieved a rough version of this a while ago; it works a charm but isn’t extensible. Bad design on my part: I was exploring the idea of using a topic with a markdown table as input. That’s great until you have 30+ entries, then it’s a mess.

Part of the Discourse-to-Discourse use case I see is a single point of truth for documentation (Meta) that synchronises to the respective posts in other instances. This means that if the team changes core and updates the user documentation on Meta, I have an up-to-date version of that documentation natively in my instance for all users to find.

For V2 I’m planning an SQLite database as input, as above, and will probably write it in Rust this time instead of Python.

Below is a rough sketch.

```mermaid
graph TB
    A[terminal] -->|"discourse-sync run $PASSWORD"| B[Rust Script]
    B -->|"SQLCipher: decrypt DB using password"| C[("sqlite: sources-and-targets")]
    C -->|"'Discourse' table data"| B
    B -.-> D{Decision on Type}
    D -->|Webhook| E[Listen for Webhook Info]
    D -->|Polling| F[Polling API]
    E --> G[Receive New Information]
    F --> G
    G --> H[Parse and Process Data]
    H --> I["POST<br/>tgt_domain, tgt_usr, tgt_key, post_id"]
    I -->|"raw and images"| J[Target Post]

    subgraph ops [Rust Script Operations]
        B
        D
        E
        F
        G
        H
        I
    end
```
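The polling branch of the diagram (fetch the source, compare, POST to the target) could be sketched as follows. It is in Python rather than the planned Rust. `fetch_raw` and `push_raw` are hypothetical callables, injected so the HTTP layer can be swapped or faked. `last_seen` stands in for whatever the database would store.

```python
from typing import Callable, Dict, Iterable, Tuple

# One polling row: (src_domain, src_post_id, tgt_domain, tgt_post_id)
Row = Tuple[str, int, str, int]


def sync_once(rows: Iterable[Row],
              fetch_raw: Callable[[str, int], str],
              push_raw: Callable[[str, int, str], None],
              last_seen: Dict[Row, str]) -> int:
    """Fetch each source post and push it to its target if the text changed.

    Returns the number of pushes made in this pass.
    """
    pushed = 0
    for row in rows:
        src_domain, src_post_id, tgt_domain, tgt_post_id = row
        raw = fetch_raw(src_domain, src_post_id)
        if last_seen.get(row) == raw:
            continue  # unchanged since the last run, nothing to POST
        push_raw(tgt_domain, tgt_post_id, raw)
        last_seen[row] = raw
        pushed += 1
    return pushed
```

Skipping unchanged text keeps the target posts’ edit history clean. Image handling would sit between fetching and pushing.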

Would be very happy to receive feedback and suggestions on this :slightly_smiling_face:

@angus, probably of interest to you; I think we already solve a lot of this with the ActivityPub plugin.

Yes, this is now indeed supported by the ActivityPub plugin. We’re very close to using it internally to sync documentation between meta and an internal instance, it’s on my todo list for next week in fact.

Does this apply to private instances too? For example, if the source is public but the target is private, would this still work?

As good as the ActivityPub plugin is looking, I fear it may not cater for private instances.

It does apply to private instances, in the current version of the AP plugin, private instances can follow categories in public Discourse instances and therefore receive published activities from those instances. (But content in the private instance doesn’t get published, so it’s a one-way sync, from public to private only.)
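For context on the “follow” mechanism: in ActivityPub, a follower sends a `Follow` activity to the followed actor’s inbox. Once the follow is accepted, the followed actor delivers its later activities (such as `Create` and `Update`) to the follower’s inbox. This sketch only builds the JSON, and the actor URLs are hypothetical. It is not the plugin’s code. Real servers typically also require the request to be signed with HTTP Signatures.

```python
import uuid


def follow_activity(follower_actor: str, followed_actor: str) -> dict:
    """Build an ActivityStreams 2.0 Follow activity (illustrative sketch)."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{follower_actor}#follows/{uuid.uuid4()}",  # must be unique per activity
        "type": "Follow",
        "actor": follower_actor,   # e.g. the private instance's category actor
        "object": followed_actor,  # e.g. the public instance's category actor
    }
```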

So it sounds like ActivityPub can perform in the following way:

Public → Public :white_check_mark:
Public → Private :white_check_mark:

Private → Public :x:
Private → Private :x:

This is certainly useful in many cases, such as broadcasting documentation from Meta. Unfortunately this doesn’t yet meet one of my use cases which is to post from a private instance to another private instance. From digging around, am I correct in thinking the ActivityPub Plugin is unlikely to support those use cases in the future? It looks to me like ActivityPub was designed with Public to Public in mind.

@Tris20 Interesting! Thanks for sharing your thoughts, and some detail, on this.

What you’ve described is (one of) the problem(s) ActivityPub was built to solve. I don’t want to dampen your spirits too much, but to be honest, you’re going to be facing a wide array of challenges trying to accomplish this in the way you’re describing. I won’t give you an exhaustive accounting of every challenge you’ll need to overcome, but to give you a sense, the ActivityPub plugin already has almost 700 rspec tests, and after a year of development it has only recently gained support for full topic-to-topic sync.

There’s no inherent limitation that I can see on supporting Private to Public and Private to Private publishing via ActivityPub. The question is one of ensuring that access and security concerns are met when working with private instances.

If I could make a suggestion. Perhaps think about how you can build on the work the ActivityPub plugin (and ActivityPub as a standard) has already done in this respect. There is in fact a solution to the problem of Discourse to Discourse content synchronisation which works currently. It doesn’t cover your use case yet, but it solves the majority of the issues you’ll need to solve in order to meet your needs.

Perhaps you could have a think about how a private to private synchronisation might work in the plugin, i.e. how the access and security questions might be addressed? Then perhaps you and I could even work on a PR together to add it as a feature. Maybe you’ll reach a point where you feel it really isn’t possible to achieve what you want to achieve in the context of the plugin (or ActivityPub as a standard), but the work you would have done to reach that point would be effectively the same work you’d need to do for an independent solution, so it wouldn’t be in vain.

There are a lot of smart folks in the ActivityPub world, and I wouldn’t be surprised if this kind of problem (i.e. private publishing) has been considered in some depth before. One place you might find some prior art on it is SocialHub, the main ActivityPub community forum (Discourse of course) which is now using the Discourse ActivityPub plugin :slight_smile: