Gemini API Embedding Configuration Clarification

@Falco, two clarifications regarding embeddings:

  1. What does "Sequence length" refer to in the embedding configuration? Is it related to the "Input token limit" described here: https://ai.google.dev/gemini-api/docs/embeddings#model-versions
  2. How does one rate-limit calls to the embedding API? I had to create a new embedding model because Gemini deprecated the old one, so Discourse is now regenerating vectors for the entire forum (if I understood your other post correctly). The problem is that it's doing so far too fast and hitting 429 Too Many Requests rejections from Gemini. Is there a way to throttle it? I'm within the RPD/TPM limits, but the Gemini dashboard shows Discourse hitting the API far too often. I'd appreciate any advice you may have here; everything was working fine until the deprecation forced me to create the new embedding model.

All usage is well within rate limits, yet the logs show a lot of 429 (Too Many Requests) rejections. (Screenshots of the Gemini dashboard and the error log omitted.)

Yes, "Sequence length" corresponds to the model's input token limit; it is 2048 for that specific model, but you can configure it to a lower value to err on the side of caution, since the Gemini API lacks an auto-truncate parameter.
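
Since the API won't truncate oversized inputs for you, here is a minimal Python sketch of clipping text client-side before embedding. The ~4 characters-per-token heuristic and the model name are assumptions for illustration, not Discourse's actual code:

```python
# Minimal sketch: clip input ourselves because the Gemini embeddings API
# has no auto-truncate parameter. Heuristic and model name are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

SEQUENCE_LENGTH = 2048   # the model's input token limit
CHARS_PER_TOKEN = 4      # rough heuristic for English text (assumption)
MAX_CHARS = SEQUENCE_LENGTH * CHARS_PER_TOKEN

def embed(text: str) -> list[float]:
    # Clip to a conservative character budget; an oversized request
    # would otherwise be rejected by the API.
    clipped = text[:MAX_CHARS]
    result = genai.embed_content(
        model="models/text-embedding-004",  # assumed replacement model
        content=clipped,
    )
    return result["embedding"]
```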

There is a hidden site setting named `ai_embeddings_backfill_batch_size`. Try setting it to 50 if your API provider can't handle our defaults.
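
Lowering the batch size reduces how many requests each backfill pass fires off. If 429s still slip through, a generic client-side pattern is to retry with exponential backoff; a hedged sketch below, not Discourse's internal backfill code (the exception class mapping to HTTP 429 is an assumption based on the Google client libraries):

```python
# Generic sketch: retry an embedding call with exponential backoff when
# the API returns 429. Not Discourse's actual backfill implementation.
import time
import random
import google.generativeai as genai
from google.api_core import exceptions

def embed_with_backoff(text: str, max_retries: int = 5) -> list[float]:
    for attempt in range(max_retries):
        try:
            result = genai.embed_content(
                model="models/text-embedding-004",  # assumed model name
                content=text,
            )
            return result["embedding"]
        except exceptions.ResourceExhausted:
            # ResourceExhausted is how the Google client libraries surface
            # HTTP 429; sleep longer after each failure, with jitter.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("embedding still rate-limited after retries")
```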