Training a model on a site's posts?

Maybe not within the scope, but it would be interesting to train a model on all the posts in my forum and use them to create an expert user AI bot that users could interact with, or that could answer questions from users on its own in threads, and link to/quote relevant posts from the past.


I hear you, but there are massive scalability issues here. Training (fine-tuning) is hellishly expensive, and it isn't even available for GPT-3.5 / GPT-4.

The industry is instead pushing really hard in two directions:

  1. Growing context windows (e.g. Anthropic's 100k-token context)
  2. Vector databases for embeddings, leaning on retrieved embeddings to supply relevant context to the model