Self-Hosting an Open Source LLM for Discourse AI

The Discourse AI plugin has many features that require an LLM to be enabled, such as Summarization, AI Helper, AI Search, and AI Bot. While you can use a third-party API (see Configure API Keys for OpenAI or Configure API Keys for Anthropic), we built Discourse AI from day one so you are not locked into those.

Running with HuggingFace TGI

HuggingFace provides an awesome container image that can get you running quickly.

For example:

mkdir -p /opt/tgi-cache
docker run --rm --gpus all --shm-size 1g -p 8080:80 \
  -v /opt/tgi-cache:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.2

This should get you up and running with a local instance of Mistral 7B Instruct on localhost port 8080, which can be tested with:

curl http://localhost:8080/ \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"<s>[INST] What is your favourite condiment? [/INST] Well, I'\''m quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'\''m cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]","parameters":{"max_new_tokens":500, "temperature":0.5, "top_p":0.9}}'

Running with vLLM

Another option for self-hosting LLMs that Discourse AI supports is vLLM, a very popular project licensed under the Apache License.

Here is how to get started with a model:

mkdir -p /opt/vllm-cache
docker run --gpus all \
  -v /opt/vllm-cache:/root/.cache/huggingface \
  -p 8080:8000 --ipc=host vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.2

Which you can test with

curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mistralai/Mistral-7B-Instruct-v0.2",
"prompt": "<s> [INST] What was the latest released hero for Dota 2? [/INST] The latest released hero for Dota 2 was", "max_tokens": 200}'

Making it available for your Discourse instance

Most of the time you will be running this on a dedicated server because of the GPU requirement. When doing so, I recommend running a reverse proxy that handles TLS termination and secures the endpoint so it can only be reached by your Discourse instance.
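
For example, here is a minimal sketch of fronting the inference server with Nginx, terminating TLS and only allowing your Discourse server to connect. The hostname llm.example.com, the certificate paths, and the 203.0.113.10 address are placeholders for your own values:

sudo tee /etc/nginx/sites-available/llm > /dev/null <<'EOF'
server {
  listen 443 ssl;
  server_name llm.example.com;

  ssl_certificate     /etc/letsencrypt/live/llm.example.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/llm.example.com/privkey.pem;

  # Only the Discourse server may reach the LLM endpoint
  allow 203.0.113.10;
  deny all;

  location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_read_timeout 300s;
  }
}
EOF
sudo ln -s /etc/nginx/sites-available/llm /etc/nginx/sites-enabled/llm
sudo nginx -t && sudo systemctl reload nginx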

Configuring Discourse AI

Discourse AI ships site settings to configure the inference server for open source models. Point it at your server using either ai_hugging_face_api_url or ai_vllm_endpoint, depending on which inference software you picked.

After that, change each module to use the model you are running via its model selection setting, for example:

  • ai_helper_model
  • ai_embeddings_semantic_search_hyde_model
  • summarization strategy
  • ai_bot_enabled_chat_bots
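
All of these can be changed through the admin settings UI. If you manage your site from the console instead, a minimal sketch of pointing the plugin at a vLLM server from a standard Discourse install follows; the URL is a placeholder for your own endpoint:

cd /var/discourse
./launcher enter app

# Inside the app container: point Discourse AI at your inference server
rails runner 'SiteSetting.ai_vllm_endpoint = "https://llm.example.com"'

The per-module model settings listed above can be changed the same way, or by searching for them in the admin settings.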

For anyone searching this topic with/for:
#Llava-Api-keys

I'm using vLLM too. I would also recommend the openchat v3.5 0106 model, a 7B parameter model that performs very well.

I am actually running it 4-bit quantized so that it runs faster.
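
For anyone wanting to try something similar, a sketch of running a 4-bit (AWQ) quantized build of openchat-3.5-0106 with vLLM could look like the following; the TheBloke/openchat-3.5-0106-AWQ model id is an assumption, so swap in whichever quantized weights you actually use:

mkdir -p /opt/vllm-cache
docker run --gpus all \
  -v /opt/vllm-cache:/root/.cache/huggingface \
  -p 8080:8000 --ipc=host vllm/vllm-openai:latest \
  --model TheBloke/openchat-3.5-0106-AWQ \
  --quantization awq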