Self-Hosting an Open-Source LLM for DiscourseAI

Discourse · January 8, 2024, 20:39

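The Discourse AI plugin has many features that require an LLM to be enabled, such as summarization, AI helper, AI search, and AI bot. While you can use a third-party API, for example by configuring an API key for OpenAI or Anthropic, we built Discourse AI from day one so that it is not locked into them.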

Running with HuggingFace TGI

HuggingFace provides a great container image that can get you up and running quickly.

For example:

mkdir -p /opt/tgi-cache
docker run --rm --gpus all --shm-size 1g -p 8080:80 \
  -v /opt/tgi-cache:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.2

This should get a local instance of Mistral 7B Instruct running on localhost at port 8080, which you can test with:

curl http://localhost:8080/ \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"<s>[INST] What is your favourite condiment? [/INST] Well, I'\''m quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'\''m cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]","parameters":{"max_new_tokens":500, "temperature":0.5,"top_p": 0.9}}'
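
Apostrophes inside a single-quoted `-d` argument are easy to get wrong. One alternative, sketched here, is to write the JSON body to a file with a quoted heredoc and hand it to curl with `-d @file`. The `query_tgi` helper name and the temporary file are assumptions made for illustration; the URL and parameters are the ones from the test command above, and the prompt is shortened.

```shell
# Sketch: keep the request body in a file so the shell never touches its quotes.
payload_file="$(mktemp)"

# The quoted delimiter ('EOF') disables all shell expansion inside the heredoc.
cat > "$payload_file" <<'EOF'
{"inputs":"<s>[INST] What is your favourite condiment? [/INST] Well, I'm quite partial to a good squeeze of fresh lemon juice.</s> [INST] Do you have mayonnaise recipes? [/INST]","parameters":{"max_new_tokens":500,"temperature":0.5,"top_p":0.9}}
EOF

# Hypothetical helper: POST a payload file to the TGI instance started above.
query_tgi() {
  curl http://localhost:8080/ \
    -X POST \
    -H 'Content-Type: application/json' \
    -d @"$1"
}

# Usage (needs the container to be running):
#   query_tgi "$payload_file"
```

The helper is only defined, not called, so the snippet can be pasted into a shell without a running server.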

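Running with vLLM

Another option for self-hosting LLMs that Discourse AI supports is vLLM, a very popular project licensed under the Apache License.

Here is how to get started with a model: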

mkdir -p /opt/vllm-cache
docker run --gpus all \
  -v /opt/vllm-cache:/root/.cache/huggingface \
  -e "MODEL=mistralai/Mistral-7B-Instruct-v0.2" \
  -p 8080:8000 --ipc=host vllm/vllm-openai:latest

You can test it with:

curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mistralai/Mistral-7B-Instruct-v0.2",
"prompt": "<s> [INST] What was the latest released hero for Dota 2? [/INST] The latest released hero for Dota 2 was", "max_tokens": 200}'

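vLLM's OpenAI-compatible server also exposes `/v1/chat/completions`, which applies the model's chat template on the server side, so the `[INST]` markers do not have to be written by hand. A minimal sketch follows. `chat_payload` and `query_vllm_chat` are hypothetical helper names, and the payload builder assumes the message contains no double quotes, backslashes, or newlines, because it does no JSON escaping.

```shell
# Build a chat-completions request body for a single user message.
# Assumption: $2 needs no JSON escaping (no quotes, backslashes, or newlines).
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":200}' "$1" "$2"
}

# Hypothetical helper: send one message to the vLLM container started above.
query_vllm_chat() {
  curl -X POST http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "mistralai/Mistral-7B-Instruct-v0.2" "$1")"
}

# Usage (needs the container to be running):
#   query_vllm_chat "What was the latest released hero for Dota 2?"
```
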
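Making it available to your Discourse instance

Most of the time you will be running this on a dedicated server because of the GPU requirement. When doing so, I recommend running a reverse proxy that performs TLS termination and secures the endpoint so that only your Discourse instance can connect to it.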
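
As a rough sketch of such a setup, the script below writes an nginx server block that terminates TLS and accepts connections from a single address only. It assumes nginx is the reverse proxy. The hostname, the certificate paths, the 203.0.113.10 address standing in for the Discourse server, and the output filename are all placeholders, not values from this guide.

```shell
# Sketch only: every hostname, path, and IP address below is a placeholder.
conf_file="${TMPDIR:-/tmp}/llm-proxy.conf"

cat > "$conf_file" <<'EOF'
server {
    listen 443 ssl;
    server_name llm.example.com;

    # TLS termination happens here; the inference server stays plain HTTP.
    ssl_certificate     /etc/ssl/certs/llm.example.com.pem;
    ssl_certificate_key /etc/ssl/private/llm.example.com.key;

    # Only the Discourse server may connect.
    allow 203.0.113.10;
    deny  all;

    location / {
        # Port 8080 is where the containers above publish the API.
        proxy_pass http://127.0.0.1:8080;
        # Generation can be slow, so allow long-running responses.
        proxy_read_timeout 300s;
    }
}
EOF

echo "Wrote $conf_file; copy it into your nginx configuration and reload nginx."
```

An IP allow-list is only one way to restrict access; a firewall rule or a private network between the two servers would serve the same purpose.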

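Configuring DiscourseAI

Discourse AI includes site settings to configure the inference server for open-source models. You should point it to your server using the ai_hugging_face_api_url or ai_vllm_endpoint setting, depending on which inference software you chose.

After that, change each module to use the model you are running in its model selection setting, for example: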

ai_helper_model
ai_embeddings_semantic_search_hyde_model
summarization strategy
ai_bot_enabled_chat_bots

Bathinda · March 19, 2024, 04:44

For anyone searching for this topic, including/for:
#Llava-Api-keys

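Isambard · March 23, 2024, 22:48

I'm using vLLM too. I would also recommend the openchat v3.5 0106 model, a 7B-parameter model that performs very well.

I'm actually running it with 4-bit quantization so that it runs faster.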

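oppman · January 13, 2025, 23:43

I'm assigning this task to an intern. Can anyone recommend a specific service to sign up for? This is for testing. The intern already has an OpenAI test configured, and it works fine. They are interested in trying HuggingFace TGI, but it looks like I need to give them a dedicated GPU server? What is the minimum spec for testing?

Are there any links I can give the intern?

I haven't dug into this project yet. I just expect that the intern will need some resources, and I'm trying to offer some sensible suggestions for their research.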

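Eric_Keller · January 15, 2025, 16:16

Hello, when exposing a vllm container on a local GPU server with a self-signed certificate, I have not found a good way to add the root CA to the discourse container so that it can securely reach this local service over HTTPS.

For example: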

./launcher enter app
curl -L https://vllm.infra.example.com/v1/models
curl: (60) SSL certificate problem: unable to get local issuer certificate
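More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not establish a secure connection to it. To learn more about this situation and how to fix it, please visit the web page mentioned above.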

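Is there a good way to add a self-signed root CA certificate to the discourse container so that it persists across container image updates?

As far as I know, adding it in app.yml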

run:
  - exec: wget ... && update-ca-certificates

will, at best, only work when the app is built or rebuilt.

Any hints are welcome.

Falco · February 21, 2025, 14:37

14 posts were split to a new topic: Getting discourse ai to work with ollama locally
