Discourse AI - Related topics

:bookmark: This guide explains how to enable and configure the Related topics feature of the Discourse AI plugin.

:person_raising_hand: Required user level: Administrator

Related topics helps users discover relevant content by suggesting semantically similar topics based on the one they’re currently reading. This enhances content exploration and increases user engagement.

Features

  • Semantic textual similarity: Goes beyond keyword matching to find truly related content
  • Toggle between “Suggested” and “Related” topics
  • Available for both anonymous and logged-in users

Enabling Related topics

:information_source: Related topics is turned on by default for all Discourse hosted customers with the Discourse AI plugin enabled

Prerequisites

Related topics requires Embeddings to function.

If you are on our hosting, Embeddings is provided using an open-source model. No additional setup is required.

Self-hosted instances will need to configure an embedding model through a supported provider.

Configuration

  1. Go to Admin → Plugins → Discourse AI → AI Features
  2. Find the Embeddings module and configure it:
    • Set ai_embeddings_selected_model to an embedding definition you have configured
    • Enable ai_embeddings_enabled to activate Embeddings
  3. Enable ai_embeddings_semantic_related_topics_enabled to activate the Related Topics feature

Setting up an embedding model

Before enabling embeddings, you need to configure an embedding model. Go to Admin → Plugins → Discourse AI → Embeddings to create a new embedding definition. You can choose from several presets:

  • Open AI: text-embedding-3-small or text-embedding-3-large (recommended for most sites)
  • Google: gemini-embedding-001
  • Hugging Face (self-hosted inference): multilingual-e5-large (recommended for non-English or multilingual sites), bge-large-en, or bge-m3

You will need to provide an API key (or link an AI Secret) and endpoint URL for your chosen provider.
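As an illustration, here is a minimal sketch of the kind of request an embedding provider expects, assuming OpenAI's `/v1/embeddings` endpoint (the API key and input text are placeholders; other providers use their own URLs and payload shapes):

```python
import json

def build_embedding_request(text, model="text-embedding-3-small"):
    # Assemble the pieces of an OpenAI-style embeddings request.
    # The key is a placeholder; substitute your provider's API key.
    return {
        "url": "https://api.openai.com/v1/embeddings",
        "headers": {
            "Authorization": "Bearer sk-YOUR-KEY-HERE",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "input": text}),
    }

req = build_embedding_request("How do I enable Related topics?")
print(json.loads(req["body"])["model"])  # text-embedding-3-small
```

The endpoint URL and API key are exactly what the embedding definition form asks for in the admin UI.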

Additional settings

The following settings let you fine-tune the Related Topics feature:

  • ai_embeddings_semantic_related_topics: Maximum number of topics to show in the related topics section (default: 5)
  • ai_embeddings_semantic_related_include_closed_topics: Whether to include closed topics in related results (default: true)
  • ai_embeddings_semantic_related_age_penalty: Apply a penalty to older topics so newer content is preferred (default: 0.0, range: 0.0–2.0)
  • ai_embeddings_semantic_related_age_time_scale: Time scale in days for the age penalty (default: 365)
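The exact scoring is internal to the plugin, but conceptually the age penalty down-weights older topics relative to their similarity score. A purely hypothetical sketch of such a weighting (the function name and formula are illustrative, not the plugin's actual code):

```python
def age_weighted_score(similarity, age_days, penalty=0.0, time_scale=365):
    # Hypothetical illustration: with penalty 0.0 the similarity is unchanged;
    # larger penalties increasingly discount topics older than the time scale.
    return similarity / (1.0 + penalty * (age_days / time_scale))

print(age_weighted_score(0.9, 365, penalty=0.0))  # no penalty: 0.9
print(age_weighted_score(0.9, 365, penalty=1.0))  # one time-scale old: 0.45
```

The shape of the trade-off is the point: a zero penalty ranks purely by similarity, while a higher penalty lets fresher topics outrank slightly more similar but much older ones.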

Technical FAQ

Expand to view a diagram of the Related topics architecture

In overview, when a topic is created or updated, the following happens:

```mermaid
sequenceDiagram
    User->>Discourse: Creates topic
    Discourse-->>Embedding Microservice: Generates embeddings
    Embedding Microservice-->>Discourse: Returns embeddings
    Discourse-->>PostgreSQL: Stores embeddings
```

And during a topic visit:

```mermaid
sequenceDiagram
    User->>Discourse: Visits topic
    Discourse-->>PostgreSQL: Queries closest topics
    PostgreSQL-->>Discourse: Returns closest topics
    Discourse->>User: Presents related topics
```

How does Related topics work?

  • When a user visits a topic, Discourse queries the database for the most semantically similar topics based on their embedded representations. These related topics are then presented to the user, encouraging further exploration of the community’s content.
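The lookup above can be sketched as a nearest-neighbor search over embedding vectors using cosine similarity (a toy example with hand-made 3-dimensional vectors; real models produce hundreds or thousands of dimensions, and the topic names here are invented):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy topic embeddings; in practice these come from the embedding model.
topics = {
    "Enabling the AI plugin": [0.9, 0.1, 0.0],
    "Configuring embeddings": [0.8, 0.2, 0.1],
    "Gardening tips": [0.0, 0.1, 0.9],
}

current = [0.8, 0.2, 0.1]  # embedding of the topic being read
related = sorted(topics, key=lambda t: cosine_similarity(current, topics[t]),
                 reverse=True)
print(related[0])  # the most semantically similar topic
```

In production Discourse performs this search inside PostgreSQL rather than in application code, which is why embeddings live in the database alongside the topics themselves.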

How is topic/post data processed?

  • For Discourse-hosted sites, data is processed within our secure virtual private datacenter. For self-hosted sites, data processing depends on your chosen third-party provider.

Where is the embeddings data stored?

  • Embeddings data is stored in your Discourse database, alongside other forum data like topics, posts, and users.

What embedding models are available?

  • Discourse AI supports models from OpenAI (text-embedding-3-small, text-embedding-3-large), Google (gemini-embedding-001), Hugging Face-compatible endpoints (bge-large-en, bge-m3, multilingual-e5-large), and Cloudflare Workers AI. You can also configure custom embedding models through the admin UI.

Last edited by @Saif 2024-11-04T18:08:05Z

Last checked by @hugh 2024-08-06T04:30:59Z


Something worth keeping an eye on.

In reviewing many posts in Related Topics for an English site (OpenAI), I'm starting to notice that topics in Spanish tend to be grouped together. I suspect that if they were first translated to English, each post would have a different vector and thus be clustered with other posts. :slightly_smiling_face:



A side benefit of this feature for moderators is to check that the categories of the topics listed in Related Topics are correct.

As I review each new post I also check the Related Topics. This is becoming an effective way to identify topics created with the wrong category.

FYI - A related idea was noted in this feature request.



I often find this topic when needing the following link, which is not so easy to find, so I'm noting it here.


That behavior is governed by the model, and it appears to be a known problem:

I think the OSS model we recommend for multilingual sites does a better job at this, but we still need to roll it out to more customers to validate this.


It won’t let me enable this option:

Am I missing something here or is Gemini alone not enough?

UPDATE: The instructions and error description may want to be updated to note that the AI embeddings model should also be updated to match the provider; otherwise ai_embeddings_enabled can’t be enabled. The parameter description is also missing Gemini as an option.


7 posts were split to a new topic: “Net::HTTPBadResponse” errors on Gemini Embeddings

What do I fill here pls:

I want to fill the above, because I want to enable the first option among the 4 shown below:

If you use OpenAI, nothing.


Then this first option (the Embeddings module) troubles me; it doesn’t let me enable it:

Most of those are empty. But ai embeddings discourse service api key is your OpenAI API key, and ai embeddings discourse service api endpoint is https://api.openai.com/v1/embeddings. The model should be text-embedding-3-large (sure, it can be small too, but that one has some issues).


3 posts were split to a new topic: How to get both Suggested and Related topics to display

What were your results from comparing small and large? I know there is a difference in dimensions that affects the model’s precision. The small version is 5x cheaper. Is it really unusable in the real world for topic similarity? Our forum is 99% English.

I’d be very interested in hearing more. Can you please elaborate on where all-mpnet-base-v2 sits in comparison to OpenAI models for a purely English site?

Embeddings are so cheap that price doesn’t matter, unless there are myriad posts where even 0.01 cents adds up in total costs.

But honestly… I didn’t see any differences. And for me, because there is a chance I can’t use RAG and embeddings properly, both are equally useless. I know that goes badly against public opinion, but on my site that system just doesn’t find and use anything useful.

Probably it comes from the OpenAI models, but I don’t have enough money to use those more professional solutions.


I’ve been using the text-embedding-3-small model before I read this. Is the text-embedding-ada-002 a lot better?

Ada is the previous generation.


A post was split to a new topic: Related Topics not translated