Okay, a bit of an update: we were unable to get a direct OpenAI API connection to work from the corporate IP range. Cloudflare would send RST packets about 1 ms after the TLS handshake.
So we set up a Cloudflare AI Gateway as a drop-in URL replacement for the OpenAI API endpoint, and it works flawlessly with the LLM configuration.
It looks like Cloudflare applies an undocumented rate-limit policy to unknown IP ranges (i.e., not Azure, AWS, GCP, etc.), and the 100-connection pool we use for Embeddings was tripping that limit.
As an aside, Cloudflare has an Authenticated Gateway feature that adds a special header token.
From their doco:
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header 'cf-aig-authorization: Bearer {CF_AIG_TOKEN}' \
  --header 'Authorization: Bearer {OPENAI_TOKEN}' \
  --header 'Content-Type: application/json' \
  --data '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
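To make the header mechanics concrete, here is a minimal stdlib-Python sketch of what a call through the Authenticated Gateway looks like: the ordinary OpenAI Authorization header plus Cloudflare's cf-aig-authorization header on the same request. The gateway URL placeholders and the helper name are hypothetical, not from Cloudflare's docs.

```python
import urllib.request

# Hypothetical placeholders -- substitute your real account and gateway IDs.
GATEWAY_URL = (
    "https://gateway.ai.cloudflare.com/v1/"
    "{account_id}/{gateway_id}/openai/chat/completions"
)

def build_gateway_request(openai_token: str, cf_aig_token: str,
                          body: bytes) -> urllib.request.Request:
    """Build a chat-completions request carrying both the OpenAI API key
    and the Cloudflare Authenticated Gateway token."""
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            # Standard OpenAI auth, passed through the gateway unchanged.
            "Authorization": f"Bearer {openai_token}",
            # Cloudflare's Authenticated Gateway token.
            "cf-aig-authorization": f"Bearer {cf_aig_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The same pattern would apply inside the LLM client: any extra key/value pairs configured per LLM would simply be merged into this headers dict before the request is sent.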
It would be awesome if there were a feature to add per-LLM custom headers in the LLM configuration screen. That way we could attach the cf-aig-authorization key and value to every call made for that LLM.