Gemini API Embedding Configuration Clarification

@Falco, two clarifications regarding embeddings:

  1. What does "Sequence length" refer to in the embedding configuration? Is it related to the "Input token limit" described here: https://ai.google.dev/gemini-api/docs/embeddings#model-versions
  2. How does one rate-limit calls to the embedding API? I had to create a new embedding model because Gemini deprecated the old one, so Discourse is now regenerating vectors for the entire forum (if I understood your other post correctly). The problem is that it's doing so far too fast and hitting 429 Too Many Requests rejections from Gemini. Is there a way to throttle it? I'm within the RPD/TPM limits, but the Gemini dashboard shows Discourse hitting the API far too often. I'd appreciate any advice you may have here; everything was working fine until the deprecation forced me to create the new embedding model.

All usage is well within rate limits, yet the logs show a lot of 429 (Too Many Requests) rejections. (Screenshots of the Gemini dashboard and the error log omitted.)

Yes, "Sequence length" corresponds to the model's input token limit; it is 2048 for that specific model, but you can configure it to a lower value to err on the side of caution, since the Gemini API lacks an auto-truncate parameter.
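
Since the API won't truncate oversized inputs for you, here is a minimal Python sketch of clipping text client-side before embedding. The ~4 characters-per-token heuristic and the model name are assumptions for illustration, not Discourse's actual code:

```python
# Minimal sketch: clip input ourselves because the Gemini embeddings API
# has no auto-truncate parameter. Heuristic and model name are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

SEQUENCE_LENGTH = 2048   # the model's input token limit
CHARS_PER_TOKEN = 4      # rough heuristic for English text (assumption)
MAX_CHARS = SEQUENCE_LENGTH * CHARS_PER_TOKEN

def embed(text: str) -> list[float]:
    # Clip to a conservative character budget; an oversized request
    # would otherwise be rejected by the API.
    clipped = text[:MAX_CHARS]
    result = genai.embed_content(
        model="models/text-embedding-004",  # assumed replacement model
        content=clipped,
    )
    return result["embedding"]
```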

There is a hidden site setting named `ai_embeddings_backfill_batch_size`. Try setting it to 50 if your API provider can't handle our defaults.
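
Lowering the batch size reduces how many requests each backfill pass fires off. If 429s still slip through, a generic client-side pattern is to retry with exponential backoff; a hedged sketch below, not Discourse's internal backfill code (the exception class mapping to HTTP 429 is an assumption based on the Google client libraries):

```python
# Generic sketch: retry an embedding call with exponential backoff when
# the API returns 429. Not Discourse's actual backfill implementation.
import time
import random
import google.generativeai as genai
from google.api_core import exceptions

def embed_with_backoff(text: str, max_retries: int = 5) -> list[float]:
    for attempt in range(max_retries):
        try:
            result = genai.embed_content(
                model="models/text-embedding-004",  # assumed model name
                content=text,
            )
            return result["embedding"]
        except exceptions.ResourceExhausted:
            # ResourceExhausted is how the Google client libraries surface
            # HTTP 429; sleep longer after each failure, with jitter.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("embedding still rate-limited after retries")
```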