How to implement Mistral with Embeddings

I’ve been struggling to set up Embeddings with Mistral AI; I suspect it’s because Mistral requires a model name to be passed. Do you know whether this is possible (and if so, how), or what would need to change to make it possible?

2 Likes

Try setting `mistral-embed` in the “Model name” field that appears after you set “Provider” to OpenAI.
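
For anyone curious why that pairing works: Mistral exposes an OpenAI-compatible API, so the OpenAI provider plumbing can talk to it directly. Outside of Discourse you can sanity-check the same setup with any OpenAI client pointed at Mistral’s endpoint. A minimal sketch (the base URL and environment variable name are my assumptions, not something from this thread):

```python
# Sketch: calling Mistral's embeddings endpoint through an OpenAI-compatible
# client, mirroring "Provider: OpenAI" + "Model name: mistral-embed".
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.mistral.ai/v1",  # Mistral's OpenAI-compatible endpoint (assumed)
    api_key=os.environ["MISTRAL_API_KEY"],
)

resp = client.embeddings.create(model="mistral-embed", input=["Hello, embeddings!"])
print(len(resp.data[0].embedding))  # mistral-embed returns 1024-dimensional vectors
```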

2 Likes

Thanks, that works :+1:

I’m struggling to figure out which tokenizer would be best for this use case, though. The Mixtral tokenizer isn’t selectable here. Do you have any suggestions?

Token counts for your post above, according to a few tokenizers:

OpenAI: 45
Mixtral: 52
Gemini: 47
E5: 50
bge-large-en: 49
bge-m3: 50
mpnet: 49

Looks like mistral-embed doesn’t differ much from the others. And since it supports a large 8k-token context window, you should be safe picking any of them and leaving some room to spare by limiting the context window in Discourse to 7k or 7.5k.
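
If you want to reproduce counts like these yourself, here’s a rough sketch using Hugging Face tokenizers plus tiktoken. The repo IDs below are my guesses at the models in the table (and the Mixtral repo is gated on Hugging Face), so adjust as needed:

```python
# Sketch: count tokens for the same text under several tokenizers.
import tiktoken
from transformers import AutoTokenizer

text = "I'm struggling to find out what would be the best tokenizer to use..."

# OpenAI: cl100k_base is the encoding used by text-embedding-ada-002.
print("OpenAI:", len(tiktoken.get_encoding("cl100k_base").encode(text)))

# Open models via Hugging Face. Gemini's tokenizer isn't available
# through transformers, so it's omitted here.
repos = {
    "Mixtral": "mistralai/Mixtral-8x7B-v0.1",
    "E5": "intfloat/e5-large-v2",
    "bge-large-en": "BAAI/bge-large-en-v1.5",
    "bge-m3": "BAAI/bge-m3",
    "mpnet": "sentence-transformers/all-mpnet-base-v2",
}
for name, repo in repos.items():
    tok = AutoTokenizer.from_pretrained(repo)
    # add_special_tokens=False so we count only the text itself
    print(f"{name}:", len(tok.encode(text, add_special_tokens=False)))
```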

1 Like

Looks like mistral-embed uses the same tokenizer as the first Mixtral model, and we already ship that anyway, so what do you think about enabling that tokenizer on the embeddings config page, @Roman_Rizzi?
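
One way to sanity-check that claim (a sketch only; that the embeddings response includes a usage field is an assumption on my part): embed a string through the API and compare the reported token count against a local count from the open Mixtral tokenizer:

```python
# Sketch: compare the API-reported token count for mistral-embed against
# a local count from the open Mixtral tokenizer.
import os

from openai import OpenAI
from transformers import AutoTokenizer

client = OpenAI(base_url="https://api.mistral.ai/v1", api_key=os.environ["MISTRAL_API_KEY"])
tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

text = "Looks like mistral-embed uses the same tokenizer as Mixtral."
resp = client.embeddings.create(model="mistral-embed", input=[text])

print("API-reported tokens:", resp.usage.prompt_tokens)
# Counts may differ by one or two special tokens (e.g. BOS) even when
# the underlying vocabulary is identical.
print("Local Mixtral count:", len(tok.encode(text)))
```

If the two numbers track each other across a few samples (modulo special tokens), that’s decent evidence the vocabularies match.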

2 Likes

Sure. I don’t see why not if it’s already there. This change will add it to the available options:

2 Likes