You need over 16GB of free RAM, and plenty of spare CPU / GPU / disk to run these services.
Also keep in mind that running these services is mildly complicated, and we are in a preview period where everything is changing quickly.
This is a guide aimed at running your own instances of the services that power Discourse AI modules.
Introduction
If you want to use Discourse AI on your self-hosted instance, you may need to also run the companion services for the modules that you want to enable.
Each module requires one or more companion services, and those services use more CPU / GPU / disk space than Discourse itself, so this is not recommended for people unfamiliar with Linux server administration and Docker.
Toxicity
To run a copy of the toxicity classification service, use:
docker run -it --rm --name detoxify -e BIND_HOST=0.0.0.0 -p6666:80 ghcr.io/discourse/detoxify:latest
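Once the container is up, a quick request from the host confirms it is reachable (the actual classification endpoint is documented in the detoxify service's README):

```
# Any HTTP response here means the service is up and listening on port 6666
curl -i http://localhost:6666/
```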
NSFW
To run a copy of the NSFW classification service, use:
docker run -it --rm --name nsfw -e BIND_HOST=0.0.0.0 -p6666:80 ghcr.io/discourse/nsfw-service:latest
Sentiment
To run a copy of the sentiment classification service, use:
docker run -it --rm --name sentiment -e BIND_HOST=0.0.0.0 -p6666:80 ghcr.io/discourse/sentiment-service:latest
Summarization / AI Helper / AI Bot
These modules depend on an LLM to work. You can deploy an open source LLM using HuggingFace's text-generation-inference (TGI) container, for example:
docker run -d --rm --gpus all --shm-size 1g -p 80:80 -v /mnt:/data \
  -e GPTQ_BITS=4 -e GPTQ_GROUPSIZE=32 -e REVISION=gptq-4bit-32g-actorder_True \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TheBloke/Upstage-Llama-2-70B-instruct-v2-GPTQ \
  --max-batch-prefill-tokens=12000 --max-total-tokens=12000 --max-input-length=10000 \
  --quantize=gptq --sharded=true \
  --num-shard=$(lspci | grep NVIDIA | wc -l | tr -d '\n') \
  --rope-factor=2
The command above gives reasonable inference performance for those modules on a g5.24xlarge instance. Alternatively, you can get a compatible API endpoint using the https://ui.endpoints.huggingface.co/ service.
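To sanity-check the deployment, you can hit TGI's standard /generate endpoint (port 80 here matches the -p 80:80 mapping above; the best prompt format depends on the model you chose):

```
# Ask the model for a short completion; max_new_tokens caps the response length
curl -s http://localhost:80/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Discourse?", "parameters": {"max_new_tokens": 64}}'
```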
Embeddings
To run a copy of the embeddings service, use:
docker run -it --rm --name embedding -e BIND_HOST=0.0.0.0 -p6666:80 ghcr.io/discourse/embedding-service:latest
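Note that every example above publishes host port 6666, so if you run several of these services on the same machine you need to pick a distinct host port for each one, for example:

```
# Host ports are arbitrary; each container still listens on port 80 internally
docker run -d --rm --name detoxify  -e BIND_HOST=0.0.0.0 -p6666:80 ghcr.io/discourse/detoxify:latest
docker run -d --rm --name sentiment -e BIND_HOST=0.0.0.0 -p6667:80 ghcr.io/discourse/sentiment-service:latest
docker run -d --rm --name embedding -e BIND_HOST=0.0.0.0 -p6668:80 ghcr.io/discourse/embedding-service:latest
```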
Running in production
When running on a live site, you may want to put these services behind a reverse proxy to enable features like load balancing, TLS, health checks, rate limits, etc.
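As a minimal sketch, assuming you use Caddy and that ai.example.com (a placeholder) points at this host, something like the following would terminate TLS and proxy traffic to the detoxify container from above:

```
# Caddy obtains a TLS certificate for ai.example.com automatically
docker run -d --name proxy --network host caddy:latest \
  caddy reverse-proxy --from ai.example.com --to localhost:6666
```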
After the service is up and running, configure the module to connect to the domain where the service is running using the appropriate site setting and then enable the module.
Configuration
All the services we ship respect the following environment variables (see the example after this list):
- BIND_HOST: the address the web server will bind to
- BIND_PORT: the port the web server will bind to
- API_KEYS: a pipe-delimited list of valid API keys for the service
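For example, to run the sentiment service on a non-default port with API key authentication enabled (assuming the container then listens on BIND_PORT rather than the default port 80 used in the examples above):

```
# Clients must present one of the listed keys; check the service's README
# for the exact header the key is expected in.
docker run -it --rm --name sentiment \
  -e BIND_HOST=0.0.0.0 -e BIND_PORT=9090 -e API_KEYS="key-one|key-two" \
  -p9090:9090 ghcr.io/discourse/sentiment-service:latest
```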