Self-Hosting Embeddings for DiscourseAI

A friend just DM’ed me this thread.

Some pros and cons of Infinity vs. TEI:

Pros:

  • Infinity supports multi-modal embeddings (i.e. you can send images/audio to the same endpoint).
  • AMD GPU support.
  • Multiple models can be served from the same container (select one per request via the `model` parameter; see the sketch after this list).
  • More dtypes, e.g. int8 quantization of the weights (mostly irrelevant in practice, since activation memory is the larger share).
  • New models often ship as “custom modeling code” in the Hugging Face repo; Infinity can load that PyTorch code when needed. This helps avoid ongoing “can you support xyz model?” requests.
  • More model architectures supported (e.g. DebertaV2 for the mixedbread models).
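
As a concrete example of the `model` parameter, here is a minimal sketch against Infinity’s OpenAI-compatible `/embeddings` route. The base URL, port (7997 is Infinity’s usual default), and both model names are assumptions; swap in whatever you actually deployed:

```python
import requests

BASE_URL = "http://localhost:7997"  # assumption: Infinity's default port

def embed(texts, model):
    # Infinity serves an OpenAI-compatible /embeddings route; the `model`
    # field picks between the models loaded into the same container.
    resp = requests.post(
        f"{BASE_URL}/embeddings",
        json={"model": model, "input": texts},
        timeout=30,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

# Same container, two different models, selected per request:
vecs_a = embed(["hello world"], model="BAAI/bge-small-en-v1.5")
vecs_b = embed(["hello world"], model="mixedbread-ai/mxbai-embed-large-v1")
print(len(vecs_a[0]), len(vecs_b[0]))  # dimensions differ per model
```

Because both models live in one container, DiscourseAI (or anything else) can switch models per request without redeploying.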

Cons:

  • TEI has a better cold-start time.
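
If you want to poke at the int8 point locally without a container, Infinity also exposes a Python API. A rough sketch, assuming the `infinity_emb` package and the `EngineArgs` fields shown in the project README; the model name and `dtype="int8"` are placeholders illustrating the weight-quantization option:

```python
import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

# Placeholder model; dtype="int8" illustrates the weight-quantization option.
engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(
        model_name_or_path="BAAI/bge-small-en-v1.5",
        engine="torch",
        dtype="int8",
    )
)

async def main() -> None:
    async with engine:  # starts and stops the internal batching loop
        embeddings, usage = await engine.embed(sentences=["Embed me."])
        print(len(embeddings[0]), "dims,", usage, "tokens")

asyncio.run(main())
```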