Are you confusing request tokens with response tokens?
An HTTP 413 means that the request you sent was too large, not the response you asked for.
To handle that, adjust the Context window setting in the LLM configuration, but I’d warn that 8k tokens is way too small these days. It will work for some features, but it’s not something we exercise much now that LLMs handle context windows 1M tokens long. I can run a 256k context window on my desktop PC using a model that is much better than the one you are using.
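
If it helps to see the request/response distinction in numbers, here is a rough sketch. It assumes about 4 characters per token, which is only a ballpark (real tokenizers vary by model and language), and the function names are made up for illustration; they are not part of any product API.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. The 4 chars/token ratio is an assumption."""
    return round(len(text) / chars_per_token)


def fits_context(prompt: str, context_window: int, max_response_tokens: int) -> bool:
    """True if the prompt plus the reserved response budget fits the window."""
    return estimate_tokens(prompt) + max_response_tokens <= context_window


# A 40,000-character prompt is roughly 10,000 tokens under this assumption,
# so it cannot fit in an 8k window no matter how small the response limit is.
big_prompt = "a" * 40_000
print(estimate_tokens(big_prompt))
print(fits_context(big_prompt, context_window=8192, max_response_tokens=256))
```

The point is that lowering the response limit does nothing when the prompt alone already exceeds the window; only a smaller request or a larger context window helps.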