Introducing Discourse AI

For those wondering what a vector database is

Note: This is from a commercial vector database vendor but it is still the best introduction I know and is the same vector database used by OpenAI.

To understand what a vector means with regard to a transformer model, see this technical YouTube video

4 Likes

We use pgvector (GitHub: pgvector/pgvector, open-source vector similarity search for Postgres) in discourse-ai, but are toying with other ideas longer term, like weaviate / elastic / milvus
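For a concrete feel of what "vector similarity search" means, here is a minimal brute-force sketch in plain Python. It is not pgvector's API and not how discourse-ai is implemented; the toy three-dimensional embeddings and the names `cosine_similarity`, `nearest`, and `TOY_EMBEDDINGS` are made up for illustration. A real vector database stores much longer vectors and uses an index instead of scanning every row.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, items, k=2):
    """Return the k item names whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings; real ones come from an embedding model.
TOY_EMBEDDINGS = {
    "how do I reset my password": [0.9, 0.1, 0.0],
    "forgot login credentials": [0.8, 0.2, 0.1],
    "best pizza toppings": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    query = [0.85, 0.15, 0.05]  # pretend embedding of "can't log in"
    print(nearest(query, TOY_EMBEDDINGS))
```

The two account-related topics come back ahead of the pizza one because their vectors point in nearly the same direction as the query.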

Pinecone are a Discourse user :hugs: https://community.pinecone.io/

11 Likes

Hey @sam yes indeed we are happy Discourse customers, and one of the most common pairings with GPT-4 for the exact use case you mentioned; see the logos + quotes on our homepage. Can we help you with a POC?

8 Likes

Absolutely, I am going to connect you with @Falco and you can discuss.

I think it would be delightful for discourse-ai to ship with a pinecone adapter as well, it reduces enormous amounts of friction for self hosters of our platform.

7 Likes

It seems like you've done your research on the costs of training, but I wanted to share my understanding based on the OpenAI fine-tuning guide. If I understand https://platform.openai.com/docs/guides/fine-tuning correctly, they recommend using Ada for classification tasks and providing 100 examples of each class. In that case, we would have a total of 200 examples (spam and not spam). Assuming an average example consists of 500 tokens, the total would be 500 * 200 = 100,000 tokens on Ada, which would cost US$ 0.04 to train. If you were to use Davinci instead, the cost would be US$ 3.00.

I guess that the pricing might be for a single step or a single epoch of training, but I couldn't find any more detailed information on their website. Please let me know if you have any insights or if I've misunderstood something.
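To make the arithmetic above easy to re-run with other numbers, here is a small sketch. The per-1K-token rates are simply the ones implied by the figures in this post (US$ 0.04 and US$ 3.00 for 100,000 tokens), not necessarily current pricing, and `n_epochs` is a hypothetical knob for the unconfirmed case where billing is per epoch.

```python
def fine_tune_cost(examples, tokens_per_example, usd_per_1k_tokens, n_epochs=1):
    """Estimated training cost in US$, assuming billing scales with tokens seen."""
    total_tokens = examples * tokens_per_example * n_epochs
    return total_tokens / 1000 * usd_per_1k_tokens


# Rates implied by the figures in this post (assumption, not a price list).
ADA_USD_PER_1K = 0.0004
DAVINCI_USD_PER_1K = 0.03

if __name__ == "__main__":
    # 100 spam + 100 not-spam examples, about 500 tokens each.
    for name, rate in [("Ada", ADA_USD_PER_1K), ("Davinci", DAVINCI_USD_PER_1K)]:
        one_pass = fine_tune_cost(200, 500, rate)
        four_passes = fine_tune_cost(200, 500, rate, n_epochs=4)
        print(f"{name}: US$ {one_pass:.2f} for one pass, US$ {four_passes:.2f} for four")
```

If the price does turn out to be per epoch, the totals scale linearly with the number of epochs, so the estimate stays small for a dataset of this size either way.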

2 Likes

As I mentioned, those costs were for my use case for my business. My training and usage is Davinci, not Ada, so 75x more expensive there. We also practically max out tokens per request.

I don't know exactly what Sam/Falco would have in mind for their use case, just mentioning generally that fine-tuning can be expensive at scale!

2 Likes

Congratulations on the release @sam & @Falco!

Would be happy to support Discourse with the evaluation of Weaviate! :clap:

7 Likes