Gemini API Embedding Configuration Clarification

@Falco, 2 clarifications regarding embeddings:

  1. What does Sequence length refer to in the embedding configuration? Is that related to the Input token limit as described here: https://ai.google.dev/gemini-api/docs/embeddings#model-versions
  2. How does one rate-limit the embedding API calls? I had to reset the embedding model due to the deprecation of Gemini’s old model, so Discourse is now generating new vectors for the entire forum (if I understood your other post correctly). The problem is that it’s doing so far too fast and hitting 429 Too Many Requests rejections from Gemini. Is there a way to throttle it? I’m within the RPD/TPM limits, but the Gemini dashboard shows Discourse hitting the API far too often. I’d appreciate any advice you may have here (everything was working fine until I had to create a new embedding model because of the deprecation).

I’m well within rate limits, but getting a lot of 429 (Too Many Requests) errors.

Yes, it is 2048 for that specific model, but you can configure it to a lower value to err on the side of caution, since the Gemini API lacks an auto-truncate parameter.
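Since there is no auto-truncate on the provider side, any truncation has to happen before the request is sent. A minimal sketch of that idea, assuming a rough 4-characters-per-token heuristic (the function name and ratio are illustrative, not part of Discourse or the Gemini SDK):

```python
def truncate_for_embedding(text: str, max_tokens: int = 2048,
                           chars_per_token: int = 4) -> str:
    """Clip text to a conservative character budget below the model's
    token limit, so the embedding API never rejects it for length."""
    budget = max_tokens * chars_per_token
    return text if len(text) <= budget else text[:budget]
```

Setting the configured sequence length a bit below the model's hard limit gives the same safety margin without needing exact token counts.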

There is a hidden site setting named ai_embeddings_backfill_batch_size. Try setting it to 50 if your API provider can’t handle our defaults.


Thanks. It was set to 50 and I’m still getting thousands of errors; I’m going to try lowering it to 20 and see how it goes.
Maybe consider adding ai_embeddings_backfill_batch_size to the embedding configuration UX screen, as this may affect a lot of users running Gemini basic plans for small sites (and possibly other providers).

On a side note, this setting appears to control the batch size, i.e. how many documents go into a single call. Perhaps the real issue is the number of requests being made per minute, not the size of each batch. Is there a way to throttle how many backfill requests are sent per minute or per hour?
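One way such a throttle could work, sketched in Python: space successive calls to stay under a per-minute cap, and back off exponentially when the provider answers 429. The `call` function, the pacing numbers, and the injectable `sleep` are all hypothetical, not Discourse or Gemini API surface:

```python
import time

def throttled_backfill(batches, call, max_per_minute=10, sleep=time.sleep):
    """Send one embedding request per batch, pacing calls to stay under
    max_per_minute and retrying a batch with exponential backoff whenever
    the provider returns HTTP 429 (Too Many Requests)."""
    spacing = 60.0 / max_per_minute
    statuses = []
    for batch in batches:
        backoff = spacing
        while True:
            status = call(batch)
            if status != 429:          # success or a non-retryable error
                statuses.append(status)
                break
            sleep(backoff)             # cool down before retrying this batch
            backoff = min(backoff * 2, 120.0)
        sleep(spacing)                 # fixed pause between successive calls
    return statuses
```

The key point is that throttling acts on call frequency, which batch size alone does not: shrinking batches without spacing the calls just produces more, smaller requests per minute.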