Discourse AI - Self-Hosted Guide

Hifihedgehog · September 7, 2023, 3:01 PM

Is there a particular reason these Docker commands are not run in detached mode (they are missing -d)?

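pfaffman · September 7, 2023, 3:08 PM

If you mean "Shouldn't I launch these with -d?", then probably yes.

If you mean "Why didn't the OP tell me to launch these commands with -d?", I think they are intended only as examples of how to get things up and running. In practice, you will need to do additional work, such as <other stuff>, to launch them in a way that is useful in production.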

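Hifihedgehog · September 7, 2023, 3:14 PM

That was exactly my question, and you answered it precisely. I had been away from setting up Docker instances for a while, and it is coming back to me now. You mentioned doing "other stuff" to make this useful in production; is there anything else that screams "do this!"? (Apart from the obvious step of changing the port number, since every Docker instance uses the same 6666.)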

Hifihedgehog · September 7, 2023, 3:36 PM

OK. About the pipe-delimited API keys: are they entirely independent of the service host, so that I simply specify whatever alphanumeric keys I want to accept from clients?

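pfaffman · September 7, 2023, 3:40 PM

How is changing the port any more "obvious" than running the containers in the background?

That is the problem. Unless I have a deep understanding of what you consider obvious, the question is impossible to answer. In most cases, if you are not sure how to make this stuff useful, you need more help than you can get here.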

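Hifihedgehog · September 7, 2023, 3:53 PM

Because I have run dozens of Docker containers in the past. I have been away from Docker for about two years, though I had worked with it in depth before that. Because I had not touched it in a while, it was not obvious at first, but this discussion brought the basics of using Docker back to me.

"That is the problem. Unless I have a deep understanding of what you consider obvious, the question is impossible to answer. In most cases, if you are not sure how to make this stuff useful, you need more help than you can get here."

That is the problem. Even for someone like me, with experience on other Docker systems, the obvious is sometimes not obvious. What you said can be read as "you should know the answer before you ask the question." Please understand that some of us run communities as a volunteer service and are not studying everything from the finest details of Discourse down to Postgres data structures 24/7. I felt like you were shutting me down, and that is not appreciated in what should be a community forum. Everyone should help one another freely and gladly.

Back on topic: I did a few Google searches to work out how API_KEYS is meant to be used, without success. I may be missing something obvious, and that may be downright frustrating for you as a Discourse professional with extensive knowledge of the platform down to its lowest levels, but I would like to have a community discussion here so that others who have not yet reached your skill level can benefit too. After all, the goal is for people other than Discourse developers to be able to use this software.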

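pfaffman · September 7, 2023, 4:08 PM

I feel your pain. Stumbling on a running service and barely remembering how you started it happens more often than you would think.

Indeed. Even instructions I wrote myself make no sense by the time I need them.

I am sorry. I did not mean to be rude or mean, though it seems I came across that way. My point was that supporting people who run a standard install is hard enough; it is harder still to work out your skills, how you plan to launch this, whether you will run it on the public internet, and whether you know how to secure it with HTTPS (if you are thinking of protecting it with API keys, you probably do).

Yes. If you are putting this somewhere other people can reach, it is a good idea to define that API_KEYS variable and find a way to generate something random to use as the key. Then enter that same key in the plugin settings. That is what I did. I did not check whether using the wrong key breaks it, and honestly I should have. I will probably check on the instance I am about to add the plugin to.

That said, it might be better if the OP included -d and set the API_KEYS environment variable.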
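As a minimal sketch of the "generate something random" step, the snippet below builds a pipe-delimited value that could be passed to a container as API_KEYS. The function name `make_api_keys`, the key count, and the key length are illustrative assumptions and do not come from the guide; the only thing taken from the thread is that keys are alphanumeric and separated by `|`.

```python
import secrets


def make_api_keys(count: int = 2, nbytes: int = 32) -> str:
    """Return `count` random hex keys joined with '|'.

    Hypothetical helper: hex output is alphanumeric and can never
    contain the '|' separator itself.
    """
    keys = [secrets.token_hex(nbytes) for _ in range(count)]
    return "|".join(keys)


if __name__ == "__main__":
    # Prints a line that can be copied into an env file or a `docker run -e` flag.
    print(f"API_KEYS={make_api_keys()}")
```

One of the generated keys would then be entered in the plugin settings, as described above.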

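Falco · September 7, 2023, 4:11 PM

The API_KEYS environment variable is optional. Use it if, for some reason, you want to restrict the service to clients that supply one of the configured API_KEYS in a header.

It is not really needed when you run the service internally for a single instance, but it can be useful when you run it over the internet or in a shared environment.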
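The behaviour described above can be illustrated with a small sketch. This is not the services' actual code: the function names are hypothetical, and the header that carries the key is not named in this thread, so the sketch only takes the already-extracted header value. It assumes that an unset or empty API_KEYS means no restriction, in line with the variable being optional.

```python
import hmac
from typing import Optional


def parse_api_keys(raw: Optional[str]) -> list[str]:
    """Split a pipe-delimited API_KEYS value into individual keys."""
    if not raw:
        return []
    return [key.strip() for key in raw.split("|") if key.strip()]


def is_authorized(presented: Optional[str], raw_keys: Optional[str]) -> bool:
    """Return True if the presented key matches one of the configured keys.

    Assumption: with no keys configured, every client is accepted.
    """
    keys = parse_api_keys(raw_keys)
    if not keys:
        return True
    if presented is None:
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return any(hmac.compare_digest(presented.encode(), key.encode()) for key in keys)
```

A server would call `is_authorized(request_header_value, os.environ.get("API_KEYS"))` before handling a request and reject the request when it returns False.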

Hifihedgehog · September 7, 2023, 4:19 PM

@Falco, @pfaffman, thank you for your help. Sorry if I derailed the conversation! I really appreciate the help from both of you!

JonahAragon1 · October 31, 2023, 5:06 AM

Can all of these services be shared across multiple Discourse installations, or do they need to run per site?

Falco · October 31, 2023, 11:43 AM

All of them are safe to share between instances.

satonotdead · December 14, 2023, 6:17 PM

Does summarization still work with an OpenAI API key?

Falco · December 14, 2023, 6:20 PM

Yes. Enter the key and select an OpenAI model in the summarization settings.

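Jagster · December 14, 2023, 7:54 PM

There is one small issue when a topic is in a language other than English (or perhaps only in a minor language). It uses the correct language one time, then suddenly starts using English. The switch, in either direction, seems to happen completely at random.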

dfriestedt · December 14, 2023, 9:40 PM

I am testing the summarization endpoint:

docker run -d --rm --gpus all --shm-size 1g -p 80:80 -v /mnt:/data -e GPTQ_BITS=4 -e GPTQ_GROUPSIZE=32 -e REVISION=gptq-4bit-32g-actorder_True ghcr.io/huggingface/text-generation-inference:latest --model-id TheBloke/Upstage-Llama-2-70B-instruct-v2-GPTQ --max-batch-prefill-tokens=12000 --max-total-tokens=12000 --max-input-length=10000 --quantize=gptq --sharded=true --num-shard=$(lspci | grep NVIDIA | wc -l | tr -d '\n') --rope-factor=2

However, running it produces the following error. This machine has two Tesla T4s, and no other process is accessing the GPUs. Usage is shown below.

user@gpu2-hc1node:~$ sudo docker logs -f 68e27eb51ee1
2023-12-14T21:30:12.861320Z  INFO text_generation_launcher: Args { model_id: "TheBloke/Upstage-Llama-2-70B-instruct-v2-GPTQ", revision: Some("gptq-4bit-32g-actorder_True"), validation_workers: 2, sharded: Some(true), num_shard: Some(2), quantize: Some(Gptq), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 10000, max_total_tokens: 12000, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 12000, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "68e27eb51ee1", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: Some(2.0), json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-12-14T21:30:12.861350Z  INFO text_generation_launcher: Sharding model on 2 processes
2023-12-14T21:30:12.861441Z  INFO download: text_generation_launcher: Starting download process.
2023-12-14T21:30:19.986231Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2023-12-14T21:30:20.771527Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2023-12-14T21:30:20.771941Z  INFO shard-manager: text_generation_launcher: Starting shard rank=1
2023-12-14T21:30:20.771967Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-12-14T21:30:27.769624Z  WARN text_generation_launcher: Disabling exllama v2 and using v1 instead because there are issues when sharding

2023-12-14T21:30:27.997163Z  WARN text_generation_launcher: Disabling exllama v2 and using v1 instead because there are issues when sharding

2023-12-14T21:30:28.046134Z  WARN text_generation_launcher: Unable to use Flash Attention V2: GPU with CUDA capability 7 5 is not supported for Flash Attention V2

2023-12-14T21:30:28.071687Z  WARN text_generation_launcher: Could not import Mistral model: Mistral model requires flash attn v2

2023-12-14T21:30:28.072298Z  WARN text_generation_launcher: Could not import Mixtral model: Mistral model requires flash attn v2

2023-12-14T21:30:28.241375Z  WARN text_generation_launcher: Unable to use Flash Attention V2: GPU with CUDA capability 7 5 is not supported for Flash Attention V2

2023-12-14T21:30:28.262756Z  WARN text_generation_launcher: Could not import Mistral model: Mistral model requires flash attn v2

2023-12-14T21:30:28.263363Z  WARN text_generation_launcher: Could not import Mixtral model: Mistral model requires flash attn v2

2023-12-14T21:30:30.786133Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-12-14T21:30:30.786133Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-12-14T21:30:40.348755Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 191, in get_multi_weights_col
    qweight = torch.cat(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacty of 14.76 GiB of which 74.75 MiB is free. Process 19973 has 14.68 GiB memory in use. Of the allocated memory 13.73 GiB is allocated by PyTorch, and 74.36 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

nvidia-smi after the model crashes:

Thu Dec 14 15:39:55 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:86:00.0 Off |                    0 |
| N/A   54C    P0    28W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   55C    P0    28W /  70W |      0MiB / 15109MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

When I start the model, GPU utilization rises to roughly 100% on both GPUs, and then it crashes.

Falco · December 15, 2023, 3:40 AM

Two T4s are far too little for that model. On those, you could try something like a prompt-compatible 7B model.
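A rough back-of-the-envelope check shows why two T4s fall short. The sketch below estimates the memory needed for the weights alone, assuming exactly 4 bits per parameter for the GPTQ model and using the 14.76 GiB per-GPU capacity reported in the error log above. It ignores quantization scales, the KV cache, and activations, so the real requirement is higher; the function name is illustrative.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: int) -> float:
    """Lower-bound memory for model weights, in GiB (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    needed = weight_memory_gib(70, 4)   # 70B parameters at 4 bits each
    available = 2 * 14.76               # two T4s, capacity taken from the log
    print(f"weights need ~{needed:.1f} GiB, GPUs offer ~{available:.1f} GiB")
    print("fits" if needed < available else "does not fit")
```

Under these assumptions the 70B weights come to roughly 32.6 GiB against about 29.5 GiB of total GPU memory, so the model cannot load even before any cache or activation memory is counted.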

dfriestedt · December 16, 2023, 10:59 AM

I was able to run the following model on a T4:

sudo docker run --gpus all --shm-size 1g -p 80:80 -v /home/deeznnutz/discourse/data:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id tiiuae/falcon-7b-instruct --max-batch-prefill-tokens 2048

I tested it locally and it worked:

curl https://Public_URL/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'

{"generated_text":"\nDeep learning is a branch of machine learning that uses artificial neural networks to learn and make decisions."}
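The same request can be sketched in Python using only the standard library, which may be handy for scripted health checks. The function names and the `https://example.invalid` base URL are placeholders, not part of the guide; the payload shape and the `generated_text` response field are taken from the curl exchange above.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, prompt: str, max_new_tokens: int = 20
) -> urllib.request.Request:
    """Build the POST request for the /generate endpoint without sending it."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, max_new_tokens: int = 20,
             timeout: float = 60.0) -> str:
    """Send the request and return the generated text from the JSON reply."""
    req = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["generated_text"]


if __name__ == "__main__":
    # Placeholder host: replace with the address of your own running service.
    print(generate("https://example.invalid", "What is Deep Learning?"))
```

Splitting request construction from sending keeps the payload easy to inspect without a running service.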

However, when I try to use it from Discourse with the following settings, I get an error:

ai summarization discourse service api endpoint: https://URL/generate/
ai summarization discourse service api key: random numbers
summarization strategy: Discourse AI's long-t5-tglobal....-book-summary

The following error is logged:

Message (6 copies reported)

Job exception: Net::HTTPBadResponse


Backtrace

/var/www/discourse/plugins/discourse-ai/lib/inference/discourse_classifier.rb:13:in `perform!'
/var/www/discourse/plugins/discourse-ai/lib/summarization/strategies/truncate_content.rb:46:in `completion'
/var/www/discourse/plugins/discourse-ai/lib/summarization/strategies/truncate_content.rb:42:in `summarize_with_truncation'
/var/www/discourse/plugins/discourse-ai/lib/summarization/strategies/truncate_content.rb:23:in `summarize'
/var/www/discourse/app/services/topic_summarization.rb:38:in `summarize'
/var/www/discourse/app/jobs/regular/stream_topic_summary.rb:25:in `execute'
/var/www/discourse/app/jobs/base.rb:292:in `block (2 levels) in perform'
/var/www/discourse/vendor/bundle/ruby/3.2.0/gems/rails_multisite-5.0.0/lib/rails_multisite/connection_management.rb:82:in `with_connection'
/var/www/discourse/app/jobs/base.rb:279:in `block in perform'
/var/www/discourse/app/jobs/base.rb:275:in `each'

Falco · December 16, 2023, 3:49 PM

You need to set that service's URL under ai_hugging_face_api_url.

dfriestedt · December 16, 2023, 4:16 PM

It looks like none of the available summarization strategies supports the model I am running:

ghcr.io/huggingface/text-generation-inference:1.3 --model-id tiiuae/falcon-7b-instruct

Westcan · March 9, 2024, 4:54 AM

If I have installed and run the Toxicity classification service, how do I deactivate or uninstall it? Thanks in advance.
