The Discourse AI plugin has many features that require an enabled LLM, such as Summarization, AI Helper, AI Search, and AI Bot. While you can use a third-party API (see Configure API Keys for OpenAI or Configure API Keys for Anthropic), we built Discourse AI from day one not to be locked into those.
Running with HuggingFace TGI
HuggingFace provides an awesome container image that can get you running quickly.
For example:
mkdir -p /opt/tgi-cache
docker run --rm --gpus all --shm-size 1g -p 8080:80 \
-v /opt/tgi-cache:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id mistralai/Mistral-7B-Instruct-v0.2
This should get you up and running with a local instance of Mistral 7B Instruct listening on localhost port 8080, which you can test with:
curl http://localhost:8080/ \
-X POST \
-H 'Content-Type: application/json' \
-d '{"inputs":"<s>[INST] What is your favourite condiment? [/INST] Well, I'\''m quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'\''m cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]","parameters":{"max_new_tokens":500, "temperature":0.5, "top_p":0.9}}'
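The prompt in the request above follows Mistral's instruct template: each user turn is wrapped in `[INST] … [/INST]`, and completed assistant turns are closed with `</s>`. A minimal Python sketch of building such a payload (the helper names here are my own, not part of any library):

```python
import json

def mistral_prompt(history, user_message):
    """Build a Mistral-instruct prompt: prior (user, assistant) turns
    followed by the new user message awaiting completion."""
    prompt = "<s>"
    for user, assistant in history:
        prompt += f"[INST] {user} [/INST] {assistant}</s> "
    prompt += f"[INST] {user_message} [/INST]"
    return prompt

def tgi_payload(prompt, max_new_tokens=500, temperature=0.5, top_p=0.9):
    """JSON body for TGI's generate endpoint, mirroring the curl call above."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_p": top_p,
        },
    })
```

POSTing the result of `tgi_payload(...)` to `http://localhost:8080/` reproduces the curl call above.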
Running with vLLM
Another option for self-hosting LLMs that Discourse AI supports is vLLM, a very popular project licensed under the Apache License.
Here's how to get started with a model:
mkdir -p /opt/vllm-cache
docker run --gpus all \
-v /opt/vllm-cache:/root/.cache/huggingface \
-p 8080:8000 --ipc=host vllm/vllm-openai:latest \
--model mistralai/Mistral-7B-Instruct-v0.2
You can test it with:
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mistralai/Mistral-7B-Instruct-v0.2",
"prompt": "<s> [INST] What was the latest released hero for Dota 2? [/INST] The latest released hero for Dota 2 was", "max_tokens": 200}'
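Since vLLM exposes the standard OpenAI completions schema, any OpenAI-compatible client works. Here's a stdlib-only Python sketch of the same request (the URL and model name are assumptions matching the docker command above):

```python
import json
from urllib import request

# Assumed endpoint, matching the port mapping in the docker run above.
VLLM_URL = "http://localhost:8080/v1/completions"

def completion_body(prompt, model="mistralai/Mistral-7B-Instruct-v0.2", max_tokens=200):
    """JSON body for the OpenAI-compatible /v1/completions endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})

def complete(prompt, **kwargs):
    """POST the prompt and return the generated text (requires a running server)."""
    req = request.Request(
        VLLM_URL,
        data=completion_body(prompt, **kwargs).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```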
Running with Ollama
Ollama is another popular option for running open source models locally. It simplifies model management and provides an OpenAI-compatible API.
# start the server (skip if Ollama is already running as a service)
ollama serve
# in another terminal, download the model
ollama pull mistral
This starts a local server at http://localhost:11434 that Discourse AI can connect to using the Ollama provider.
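Ollama's OpenAI-compatible endpoint lives under `/v1` and returns the usual chat-completions shape. A small sketch of pulling the reply text out of a response (the sample payload is illustrative, trimmed to the fields read, not captured from a real run):

```python
def chat_reply(response):
    """Extract the assistant message text from an OpenAI-style
    chat completions response dict."""
    return response["choices"][0]["message"]["content"]

# Illustrative response shape from POST /v1/chat/completions:
sample = {
    "model": "mistral",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
}
```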
Making it available for your Discourse instance
Most of the time you will be running this on a dedicated server because of the GPU requirement. When doing so, I recommend running a reverse proxy that handles TLS termination and secures the endpoint so it can only be reached by your Discourse instance.
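As a sketch, an nginx server block for that setup might look like the following (the hostname, certificate paths, and allowed IP are all placeholders you would replace with your own):

```nginx
server {
    listen 443 ssl;
    server_name llm.example.com;          # placeholder hostname

    ssl_certificate     /etc/letsencrypt/live/llm.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm.example.com/privkey.pem;

    # only allow your Discourse server to reach the endpoint
    allow 203.0.113.10;                   # placeholder: your Discourse server's IP
    deny all;

    location / {
        proxy_pass http://127.0.0.1:8080; # the inference server started above
        proxy_set_header Host $host;
    }
}
```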
Configuring Discourse AI
LLM connections are now configured through the admin UI rather than site settings. Navigate to /admin/plugins/discourse-ai/ai-llms and add a new LLM:
- Click New to add a model
- Select a Provider — choose vLLM, Hugging Face, or Ollama depending on your inference server
- Enter the URL of your inference endpoint (e.g. http://your-server:8080)
- Enter an API key if your endpoint requires one
- Fill in the model name, tokenizer, max prompt tokens, and other model details
Once your LLM is added, set it as the default via the ai_default_llm_model site setting, or assign it to specific features through their agent configuration in /admin/plugins/discourse-ai/ai-features.
Last edited by @JammyDodger 2024-05-25T11:01:36Z