Discourse AI - Self-Hosted Guide

:warning: You need over 16GB of free RAM, and plenty of CPU / GPU / disk to spare, to run these services.
Also keep in mind that running these services is mildly complicated, and we are in a preview period where everything is changing quickly.

This guide is aimed at running your own instances of the services that power the Discourse AI modules.


If you want to use Discourse AI on your self-hosted instance, you may need to also run the companion services for the modules that you want to enable.

Each module has one or more required companion services, and those services use more CPU / GPU / disk space than Discourse itself, so keep in mind that this is not recommended for people unfamiliar with Linux server administration and Docker.


Toxicity

To run a copy of the toxicity classification service use:

docker run -it --rm --name detoxify -e BIND_HOST= -p6666:80 ghcr.io/discourse/detoxify:latest
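Once the container is up, you can send it a test request. The endpoint path and payload below are assumptions modeled on the other service examples in this topic, not confirmed API details:

```shell
# Hypothetical test request to the local Detoxify container started
# above (host port 6666 maps to the container's port 80). The
# /api/v1/classify path and payload shape are assumptions; check the
# service's own documentation for the real API.
jo -p content="some post text to score for toxicity" | \
  curl --json @- -XPOST http://localhost:6666/api/v1/classify
```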


NSFW

To run a copy of the NSFW classification service use:

docker run -it --rm --name nsfw -e BIND_HOST= -p6666:80 ghcr.io/discourse/nsfw-service:latest


Sentiment

To run a copy of the sentiment classification service use:

docker run -it --rm --name sentiment -e BIND_HOST= -p6666:80 ghcr.io/discourse/sentiment-service:latest

Summarization / AI Helper / AI Bot

These modules depend on an LLM to work. You can deploy an open-source LLM using the :hugs: TGI container, for example:

docker run -d --rm --gpus all --shm-size 1g \
  -p 80:80 \
  -v /mnt:/data \
  -e GPTQ_BITS=4 \
  -e GPTQ_GROUPSIZE=32 \
  -e REVISION=gptq-4bit-32g-actorder_True \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TheBloke/Upstage-Llama-2-70B-instruct-v2-GPTQ \
  --max-batch-prefill-tokens=12000 \
  --max-total-tokens=12000 \
  --max-input-length=10000 \
  --quantize=gptq \
  --sharded=true \
  --num-shard=$(lspci | grep NVIDIA | wc -l | tr -d '\n') \
  --rope-factor=2

The command above will give reasonable inference performance to power those modules on a g5.24xlarge. Alternatively, you can get a compatible API endpoint using the https://ui.endpoints.huggingface.co/ service.
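To sanity-check that the TGI container is serving, you can hit its `/generate` endpoint, which is part of TGI's standard API:

```shell
# Simple smoke test against the TGI container started above,
# which publishes port 80 on the host.
curl -s http://localhost:80/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Write one sentence about forums.", "parameters": {"max_new_tokens": 64}}'
```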


Embeddings

To run a copy of the embeddings service use:

docker run -it --rm --name embedding -e BIND_HOST= -p6666:80 ghcr.io/discourse/embedding-service:latest
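As with the other services, a test request can be sent once the container is up. The path, payload, and model name here are assumptions patterned on the other examples in this topic:

```shell
# Hypothetical request to the local embeddings container started above.
# The /api/v1/classify path and the all-mpnet-base-v2 model name are
# assumptions; adjust to whatever the service actually expects.
jo -p model=all-mpnet-base-v2 content="Text to embed goes here" | \
  curl --json @- -XPOST http://localhost:6666/api/v1/classify
```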

Running in production

You may want to put this service behind a reverse proxy to enable features like load balancing, TLS, health checks, rate limits, etc., when running a live site.
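As a sketch of that idea, assuming Nginx and placeholder hostnames and certificate paths (none of these values come from the guide):

```nginx
# Minimal sketch: TLS termination plus proxying for one companion
# service bound to 127.0.0.1:6666. Every name and path here is a
# placeholder.
server {
    listen 443 ssl;
    server_name ai-service.example.com;

    ssl_certificate     /etc/letsencrypt/live/ai-service.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai-service.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:6666;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```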

After the service is up and running, configure the module to connect to the domain where the service is running using the appropriate site setting and then enable the module.


All services made by us respect the following environment knobs:

  • BIND_HOST: address the webserver will bind to
  • BIND_PORT: port the webserver will bind to
  • API_KEYS: a pipe-separated list of valid API keys for this service
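For example, a sketch of launching one of the services with all three knobs set explicitly (the key values are placeholders you should replace):

```shell
# Placeholder API keys; generate your own. Multiple keys are
# separated by a pipe, matching the API_KEYS format above.
docker run -it --rm --name detoxify \
  -e BIND_HOST=0.0.0.0 \
  -e BIND_PORT=80 \
  -e API_KEYS="first-placeholder-key|second-placeholder-key" \
  -p 6666:80 \
  ghcr.io/discourse/detoxify:latest
```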

Is it not possible to self-host the Composer Helper for now?

Kudos to the team for this development and implementation :fire::raised_hands:


Composer Helper only works with OpenAI or Anthropic APIs for now, so it will work just fine in self-hosted situations provided you have one of those APIs.


I have Composer Helper up and running, thanks!

Does Summarization require a local classification service? Or will it run with just an OpenAI API key if using the ChatGPT3.5 model? I turned it on but I'm not seeing it on topics.


Per Discourse AI - Summarization you can use it with OpenAI by configuring the OpenAI key (which you already did), selecting one of the GPT models as the summarization model and enabling the summarization module.

The summary button is only showing for topics with >50 replies at the moment, but we will enable it for all topics soon.


Can you please share some sample requests? I am currently trying to set this up in an AWS ASG on an EC2 instance and I can’t get it to work; I only see 400 bad request in the Discourse logs.

Furthermore, a healthcheck URL would be great; / returns a 404 error.

/srv/ok and /health are the health check endpoints.
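For example, against a service started with one of the commands in this topic:

```shell
# Expect HTTP 200 from a healthy service; port 6666 matches the
# -p6666:80 mapping used in the examples above.
curl -sf http://localhost:6666/health && echo healthy
```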

Off the top of my head, something like:

jo -p model=bart-large-cnn-samsum content="Long sentence to summarize goes here" | \
  curl --json @- -XPOST http://service/api/v1/classify

should work for the summarization service.


Would you suggest running the summarization service on localhost, with health checks from the Nginx module, if we are OK with the limits and load?

I just want to try open-source models; we already have it working with OpenAI API keys.

Are there plans to enable multilingual summarization with compatible models like ChatGPT3.5?

If that’s what you want it should work, yes.

Summarization already works with the OpenAI and Anthropic APIs, so that will give you multilingual capabilities. You may need to hack a bit to translate the prompt for it to stay more grounded in the topic language, though.


Good news by AWS: Amazon RDS for PostgreSQL now supports pgvector for simplified ML model integration


@Falco Would you be kind enough to give an example of a server configuration that has 'plenty of CPU / GPU / Disk' and can run the self-hosted AI alongside an average Discourse forum?

I’d like to see that as well, please. Also, given the resource requirements, would it be better (possible, more cost-effective?) to offload the companion AI services to a separate VPS?

example of a server configuration

Depends on the exact models and modules of Discourse AI you want to run. For example, the toxicity module uses 5GB of RAM and the NSFW module uses 1GB. Disk space usage is similar, and CPU/GPU is used for inference, so your needs depend on the number of requests per second you expect.

Yes, that is probably the best way.


Alright, I've taken a crack at this:

Napkin estimates:


  • $0.0008 per 100 words
  • 1 user averages about 100 words (or tokens) per day on each AI module
  • Running all 6 AI modules
    $0.0008 * 6 = $0.0048

Total monthly cost per user: $0.0048 * 30 = $0.144

The minimum server requirements for self hosting are around:

  • 16GB of free RAM, 32 preferred
  • 3.5 GHz or higher CPU and 8 cores or more
  • 100GB SSD

The lowest cost server which meets those requirements on Digital Ocean is:

  • 16 GB Ram
  • 8 Premium Intel vCPUs (over 3.5 GHz)
  • Bandwidth: 6,000 GiB
  • SSD: 2x 200 GiB
  • Monthly cost: $244.00

So self-hosting ChatGPT4 will be more cost effective than using its API service when Discourse has around 2,000 active users per month.

With some pretty wobbly and generous rounding involved. Does that sound about right, @Falco?

GPT-4 or 3.5 cannot be self-hosted.

Some open-source LLMs, such as Falcon, or various LLaMA-based models (which come with licensing questions), can be self-hosted, but to date they all underperform GPT-4 or even 3.5.

Your back-of-the-napkin calculation there is wildly off. If you are going to be self-hosting an LLM you probably want an A100 or H100, maybe a few of them… try googling for prices.


I guess that’s what you get when using ChatGPT to help you work out self-hosting ChatGPT costs.

Well anyway, I'll try to contribute something and come back to update it when I have some user data to compare.

Here are the calculations I ran for using ChatGPT3.5's API with the modules above, based on the very vague assumption that an average active user will generate about 100 words per execution:

ChatGPT3.5 API Costs

  • $0.0003 per 100 words in one execution
  • 1 active user averages about 100 words per day on each AI module

Average monthly cost per AI plugin/component: $0.0003 * 30 = $0.009

Total monthly cost per user for all 6 plugins: $0.009 * 6 = $0.054, if they run on ChatGPT3.5.
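For what it's worth, the arithmetic itself checks out under those assumptions; a one-liner reproduces it:

```shell
# $0.0003 per ~100-word execution, one execution per day per module,
# 6 modules, 30 days.
awk 'BEGIN { printf "%.3f\n", 0.0003 * 30 * 6 }'
# → 0.054
```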

We just started running the AI services here for Meta in a g4dn.xlarge so I can now recommend that as a baseline.

Thanks. Current pricing is given here for anyone wondering what a g4dn.xlarge is. Hopefully you will be able to post utilization data at some point so we can get a handle on real world costs.


The machine is basically idle with just Meta traffic. It could handle a few Metas worth of traffic just fine.
