Use Mistral for embeddings

We’ve run into another issue using Mistral for embeddings.

  • Per this topic, use OpenAI as the provider and the Mistral service URL as the URL
  • Select the tokenizer, sequence length, and distance function
  • Set the model name to ‘mistral-embed’
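The steps above amount to sending a plain OpenAI-style embeddings request to Mistral’s endpoint. A minimal sketch of the request those settings produce (the endpoint URL and payload shape follow Mistral’s public API; `MISTRAL_API_KEY` is a placeholder, and no network call is made here):

```python
# Sketch of the OpenAI-compatible embeddings request the steps above produce.
# We only build the request pieces; sending them is left to whatever HTTP
# client the application uses.

MISTRAL_EMBEDDINGS_URL = "https://api.mistral.ai/v1/embeddings"

def build_request(texts, api_key="MISTRAL_API_KEY"):
    """Build headers and JSON body for an OpenAI-style embeddings call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Note: no "dimensions" key -- mistral-embed rejects it with HTTP 422.
    body = {
        "model": "mistral-embed",
        "input": texts,
    }
    return MISTRAL_EMBEDDINGS_URL, headers, body

url, headers, body = build_request(["hello world"])
```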

When a dimension is entered, the Mistral API rejects the request because it does not support that parameter:

 Trying to contact the model returned this error: {
   "object":"error",
   "message":{
      "detail":[
         {
            "type":"extra_forbidden",
            "loc":[
               "body",
               "dimensions"
            ],
            "msg":"Extra inputs are not permitted",
            "input":2000
         }
      ]
   },
   "type":"invalid_request_error",
   "param":null,
   "code":null,
   "raw_status_code":422
}

This is because Mistral calls this parameter output_dimension, so its API is not fully OpenAI-compatible.
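For anyone who controls the client side, one workaround is to adapt the payload before sending. A hypothetical sketch (`to_mistral_payload` is my own name, not part of any library; since mistral-embed appears to have a fixed output size, dropping the key is the safer default, with renaming to output_dimension as an option):

```python
# Hypothetical client-side shim: adapt an OpenAI-style embeddings payload
# for Mistral. "dimensions" is OpenAI's parameter name; Mistral's API uses
# "output_dimension", and mistral-embed rejects "dimensions" outright.

def to_mistral_payload(payload, rename=False):
    payload = dict(payload)  # copy so the caller's dict is not mutated
    if "dimensions" in payload:
        value = payload.pop("dimensions")
        if rename:
            payload["output_dimension"] = value
    return payload

openai_style = {"model": "mistral-embed", "input": ["hi"], "dimensions": 2000}
print(to_mistral_payload(openai_style))
# {'model': 'mistral-embed', 'input': ['hi']}
```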

When I leave out the dimensions parameter, “Run Test” works, but the form then prevents me from saving the model, complaining that “dimensions” is a required parameter.

Being able to use Mistral is quite crucial for GDPR compliance, so it would be good if the dimensions parameter could be omitted (easy fix) or if Mistral could become a first-class provider (better).

4 Likes

It passes the test for me with this configuration, which I set up following their documentation.

That said, I’d recommend using a model that scores better, such as the Qwen3 embedding models and the myriad of fine-tunes derived from them.

Mistral ain’t the only GDPR-compliant game in town, although it may be the first one that comes to people’s minds.

OpenRouter has a list (Models | OpenRouter), and people can also self-host their embedding model; it’s very doable, and much easier than self-hosting LLMs.

3 Likes