Hey @sam yes indeed we are happy Discourse customers, and one of the most common pairings with GPT-4 for the exact use case you mentioned; see the logos + quotes on our homepage. Can we help you with a POC?
Absolutely, I am going to connect you with @Falco and you can discuss.
I think it would be delightful for discourse-ai to ship with a Pinecone adapter as well; it would remove an enormous amount of friction for self-hosters of our platform.
It seems like you've done your research on the costs of training, but I wanted to share my understanding based on the OpenAI fine-tuning guide. If I understand https://platform.openai.com/docs/guides/fine-tuning correctly, they recommend using Ada for classification tasks and providing 100 examples of each class. In that case, we would have a total of 200 examples (spam and not spam). Assuming an average example consists of 500 tokens, the total would be 500 * 200 = 100,000 tokens on Ada, which would cost US$ 0.04 to train. If you were to use Davinci instead, the cost would be US$ 3.00.
I guess that the pricing might be for a single step or a single epoch of training, but I couldn't find any more detailed information on their website. Please let me know if you have any insights or if I've misunderstood something.
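To make the back-of-envelope math easy to tweak, here's a quick sketch of that calculation. The per-1K-token training prices are inferred from the numbers above (US$ 0.04 for 100K tokens on Ada, US$ 3.00 on Davinci), and the `n_epochs` parameter is there only to model my open question about whether the price is per epoch, so treat both as assumptions rather than official figures.

```python
# Rough fine-tuning cost estimate (assumed rates, not official figures).
# Prices are per 1K training tokens, back-calculated from the totals above;
# n_epochs covers the "is the price per epoch?" question.

TRAIN_PRICE_PER_1K = {
    "ada": 0.0004,      # USD per 1K training tokens (assumed)
    "davinci": 0.0300,  # USD per 1K training tokens (assumed)
}

def training_cost(model: str, examples: int, tokens_per_example: int,
                  n_epochs: int = 1) -> float:
    """Estimate fine-tuning cost in USD for a training file."""
    total_tokens = examples * tokens_per_example
    return total_tokens / 1000 * TRAIN_PRICE_PER_1K[model] * n_epochs

# 200 examples (100 spam + 100 not spam), ~500 tokens each:
print(f"{training_cost('ada', 200, 500):.2f}")                # 0.04
print(f"{training_cost('davinci', 200, 500):.2f}")            # 3.00
print(f"{training_cost('ada', 200, 500, n_epochs=4):.2f}")    # 0.16 if billed per epoch
```

If the price does turn out to be per epoch, whatever default epoch count they use would multiply these totals accordingly.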
As I mentioned, those costs were for my own business's use case. My training and usage are on Davinci, not Ada, so roughly 75x more expensive per token, and we also practically max out the tokens per request.
I don't know exactly what Sam/Falco would have in mind for their use case; I'm just mentioning generally that fine-tuning can be expensive at scale!