@Falco Two clarifications regarding embeddings:

- What does **Sequence length** refer to in the embedding configuration? Is it related to the **Input token limit** described here: https://ai.google.dev/gemini-api/docs/embeddings#model-versions ?
- How does one rate-limit calls to the embedding API? I had to set up a new embedding model due to the deprecation of Gemini's old model, so Discourse is now generating new vectors for the entire forum (if I understood your other post correctly). The problem is that it's doing this far too fast and hitting 429 (Too Many Requests) rejections from Gemini. Is there a way to throttle it? I'm within the RPD/TPM limits, but the Gemini dashboard shows Discourse hitting the API far too often.

Would appreciate any advice you may have here. Everything was working fine until I had to create a new embedding model after Gemini deprecated the old one.
All well within rate limits:

[screenshot: Gemini usage dashboard, within limits]

but getting a lot of 429 (Too Many Requests) errors:

[screenshot: 429 error responses]
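For context, by "throttle" I mean client-side pacing plus retry-on-429. A minimal Python sketch of the behaviour I'm hoping the plugin could do (all names here are hypothetical, not Discourse's or Gemini's actual API):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the HTTP 429 error a real client library would raise."""

class Throttle:
    """Enforce a minimum gap between successive API calls."""
    def __init__(self, max_per_minute):
        self.interval = 60.0 / max_per_minute
        self.last = float("-inf")

    def wait(self):
        gap = self.interval - (time.monotonic() - self.last)
        if gap > 0:
            time.sleep(gap)
        self.last = time.monotonic()

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Call request_fn; after each 429, sleep base_delay, 2x, 4x, ..."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"still rate-limited after {max_retries} retries")

# Demo: a fake embedding call that 429s twice, then succeeds.
calls = {"n": 0}
def fake_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return [0.1, 0.2, 0.3]  # pretend embedding vector

throttle = Throttle(max_per_minute=600)  # pace at one request per 100 ms
throttle.wait()
vector = call_with_backoff(fake_embed, base_delay=0.01)
```

Something along these lines between the backfill job and the Gemini endpoint would keep the request rate under the limit even when the whole forum is being re-embedded.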