AI Image Captioning Feature in Discourse AI Plugin

We are running the full model, but the smallest variant of it, the one built on Mistral 7B. It takes 21 GB of VRAM on our single-A100 servers, and it's run via the ghcr.io/xfalcox/llava:latest container image.
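For the curious, here's a minimal sketch of how you could call a captioning microservice like this over HTTP. The endpoint path, payload shape, and response field here are assumptions for illustration only; check the container's actual API before relying on this.

```python
import base64
import requests

# Hypothetical endpoint exposed by the LLaVA container; the real
# path and payload shape may differ from this sketch.
LLAVA_URL = "http://localhost:8080/caption"

def caption_image(path: str) -> str:
    # Read the image and base64-encode it so it can travel as JSON.
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = requests.post(
        LLAVA_URL,
        json={
            "image": image_b64,
            "prompt": "Describe this image for a visually impaired user.",
        },
        timeout=60,
    )
    response.raise_for_status()
    # Assumes the service returns a JSON body like {"caption": "..."}.
    return response.json()["caption"]

if __name__ == "__main__":
    print(caption_image("photo.jpg"))
```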

Sadly, the ecosystem for multimodal models isn't as mature as the one for text2text models, so we can't yet leverage inference servers like vLLM or TGI and are left with these one-off microservices. That may change this year, since multimodal support is on the vLLM roadmap, but until then we can at least test the waters with these services.