AI image captioning feature in the Discourse AI plugin

We are running the full model, but in its smallest version, the one based on Mistral 7B. It takes 21GB of VRAM on our single-A100 servers, and it runs via the ghcr.io/xfalcox/llava:latest container image.
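As a rough sanity check on that 21GB figure, here is a back-of-the-envelope sketch. It assumes fp16 weights (2 bytes per parameter) and a round 7 billion parameters for the language model; neither is stated above. What the remaining memory goes to (vision encoder, KV cache, activations, CUDA overhead) is a guess, not a measurement.

```python
# Back-of-the-envelope VRAM estimate.
# Assumptions (not from the post): fp16 weights, a round 7e9 parameters,
# decimal gigabytes (1 GB = 1e9 bytes).
PARAMS = 7e9
BYTES_PER_PARAM = 2   # fp16
OBSERVED_GB = 21      # VRAM usage reported in the post


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return params * bytes_per_param / 1e9


llm_weights = weights_gb(PARAMS, BYTES_PER_PARAM)
remainder = OBSERVED_GB - llm_weights

print(f"LLM weights: ~{llm_weights:.0f} GB")
print(f"Everything else: ~{remainder:.0f} GB")
```

Under those assumptions the language model weights alone account for about two thirds of the footprint, which is why an A100 class card is a comfortable fit even for the smallest variant.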

Sadly the ecosystem for multimodal models isn’t as mature as the one for text2text models, so we can’t yet leverage inference servers like vLLM or TGI and are left with these one-off microservices. This may change this year, since multimodal support is on the vLLM roadmap, but until then we can at least test the waters with these services.
