Self-Hosting an Open Source LLM for Discourse AI

The Discourse AI plugin has many features that require an LLM to be enabled, such as Summarization, AI Helper, AI Search, and AI Bot. While you can use a third-party API, like Configure API Keys for OpenAI or Configure API Keys for Anthropic, we built Discourse AI from day one so you would not be locked into those.

Running with HuggingFace TGI

HuggingFace provides an awesome container image that can get you running quickly.

For example:

mkdir -p /opt/tgi-cache
docker run --rm --gpus all --shm-size 1g -p 8080:80 \
  -v /opt/tgi-cache:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.2

This should get you up and running with a local instance of Mistral 7B Instruct on localhost at port 8080, which you can test with:

curl http://localhost:8080/ \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"<s>[INST] What is your favourite condiment? [/INST] Well, I'\''m quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'\''m cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]","parameters":{"max_new_tokens":500, "temperature":0.5, "top_p":0.9}}'

Running with vLLM

Another option for self-hosting LLMs that Discourse AI supports is vLLM, a very popular project licensed under the Apache 2.0 License.

Here's how to get started with a model:

mkdir -p /opt/vllm-cache
docker run --gpus all \
  -v /opt/vllm-cache:/root/.cache/huggingface \
  -p 8080:8000 --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.2

You can test it with:

curl -X POST http://localhost:8080/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "mistralai/Mistral-7B-Instruct-v0.2",
      "prompt": "<s> [INST] What was the latest released hero for Dota 2? [/INST] The latest released hero for Dota 2 was",
      "max_tokens": 200
    }'

Making it available for your Discourse instance

Most of the time you will be running this on a dedicated server because of the GPU requirement. When doing so, I recommend running a reverse proxy in front of it, doing TLS termination, and securing the endpoint so it can only be reached by your Discourse instance; a minimal sketch follows.
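As an illustration, an Nginx server block for this could look roughly like the following; the hostname, certificate paths, and allowed IP are placeholders for your own setup:

server {
  listen 443 ssl;
  server_name llm.example.com;

  ssl_certificate     /etc/ssl/certs/llm.example.com.pem;
  ssl_certificate_key /etc/ssl/private/llm.example.com.key;

  location / {
    # only let the Discourse server reach the inference endpoint
    allow 203.0.113.10;  # your Discourse instance's IP
    deny all;

    # forward to the TGI/vLLM container listening locally
    proxy_pass http://127.0.0.1:8080;
  }
}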

Configuring Discourse AI

Discourse AI ships site settings to configure the inference server for open source models. Point it to your server using either ai_hugging_face_api_url or ai_vllm_endpoint, according to which inference software you picked.

After that, change each module to use the model you are running, in the model selection settings (a console example follows this list), such as:

  • ai_helper_model
  • ai_embeddings_semantic_search_hyde_model
  • summarization strategy
  • ai_bot_enabled_chat_bots
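If you prefer the Rails console to the admin UI, you can set these there as well. A sketch, assuming a hypothetical endpoint URL you would replace with your own:

./launcher enter app
rails c
SiteSetting.ai_hugging_face_api_url = "https://llm.example.com"  # placeholder URL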

For anyone searching for this topic:
#Llava-Api-keys

I'm using vLLM too. I would also recommend the openchat v3.5 0106 model, a 7B-parameter model that performs very well.

I'm actually running it 4-bit quantized so that it runs faster; a sketch of that setup is below.
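For reference, running a pre-quantized AWQ build with the vLLM container from above would look roughly like this; the model repo name is an assumption, so swap in whichever quantized variant you actually use:

mkdir -p /opt/vllm-cache
docker run --gpus all \
  -v /opt/vllm-cache:/root/.cache/huggingface \
  -p 8080:8000 --ipc=host \
  vllm/vllm-openai:latest \
  --model TheBloke/openchat-3.5-0106-AWQ \
  --quantization awq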

I am assigning this task to an intern. Does anyone have recommendations on which specific service to sign up for? This is for a test. The intern currently has a test configured with OpenAI, and it runs fine. They're interested in trying HuggingFace TGI, but it seems I need to give them a dedicated server with a GPU? What are the minimal specs for a test?

Are there links I can give the intern?

I haven't looked at this project in depth yet. I'm just anticipating that the intern will need some resources, and I'm trying to make some reasonable recommendations on services for the intern to research.

Hey there. While exposing a vLLM container with a self-signed certificate on an on-prem GPU box, I did not find a good way to add the root CA to the Discourse container so it can securely access this on-prem service over HTTPS.

e.g.:

./launcher enter app
curl -L  https://vllm.infra.example.com/v1/models
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Is there a good way to add a self-signed root CA certificate to the Discourse container that would survive container image updates?

As far as I know, adding it in the app.yml

run:
  - exec: wget ... && update-ca-certificates

would only work well while building/rebuilding the app.
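For reference, a fuller version of that run command would be something like the following, where the certificate URL and filename are placeholders:

run:
  - exec: wget -O /usr/local/share/ca-certificates/internal-root-ca.crt https://ca.infra.example.com/root.crt && update-ca-certificates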

Any hints welcome.