Thanks. Do I understand correctly that backfill is when the vectorization happens? When switching between models, do the vectors need to be recalculated (are they “proprietary” to each model)? I assume yes.
It’d be useful to know how the cost of using the OpenAI API stacks up against investing in a GPU-powered server running an open-source solution. Is there a formula or some other way to estimate the number of tokens used? We’re only using the API to vectorize posts, not to calculate vector distances, right? So the number of tokens used depends on how much content we have, correct?
I assume that for both related topics and AI-powered search, all posts need to be vectorized only once, so I could calculate the total number of words in the posts table and derive the number of tokens needed from that. The same process would apply to the posts added daily. I’m neglecting the search phrases for now.
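For my own back-of-the-envelope math I was thinking of something along these lines. This is only a sketch under my assumptions: it uses tiktoken’s cl100k_base encoding (the one used by text-embedding-ada-002 / text-embedding-3), and the price-per-token value is a placeholder to be filled in from OpenAI’s pricing page, not a figure from this thread:

```python
# Rough estimate of embedding tokens/cost for a set of post bodies.
# Assumptions: cl100k_base tokenizer; PRICE_PER_1K_TOKENS is a placeholder.
import tiktoken

PRICE_PER_1K_TOKENS = 0.0  # USD; fill in from OpenAI's current pricing page

enc = tiktoken.get_encoding("cl100k_base")

def estimate_cost(posts: list[str]) -> tuple[int, float]:
    """Count tokens across all post bodies and convert to an API cost."""
    total_tokens = sum(len(enc.encode(raw)) for raw in posts)
    return total_tokens, total_tokens / 1000 * PRICE_PER_1K_TOKENS

# Without running the tokenizer, the usual rule of thumb is
# ~100 tokens per 75 English words, i.e. tokens ≈ words × 4/3.
def estimate_tokens_from_words(word_count: int) -> int:
    return round(word_count * 4 / 3)
```

The daily cost would then just be the same calculation applied to the posts created that day. Does that sound about right?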