We had to cap max_tokens because our use of structured outputs meant that many smaller models could easily spiral into infinite generation loops during translations.
I believe the newer OpenAI Responses API applies its output-token limit without counting thinking tokens, which would solve this issue.
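As a minimal sketch of the workaround, here is how a capped max_tokens can be combined with a structured-output request via the Chat Completions API. The model name, schema, prompt, and token limit are illustrative assumptions, not the exact values we used:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice, not necessarily what we ran
    messages=[
        {"role": "system", "content": "Translate the user's text to French."},
        {"role": "user", "content": "The weather is nice today."},
    ],
    # Cap output so a model stuck repeating schema fragments cannot loop forever.
    max_tokens=512,
    # Structured output: force the reply to conform to a small JSON schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "translation",  # hypothetical schema for illustration
            "schema": {
                "type": "object",
                "properties": {"translation": {"type": "string"}},
                "required": ["translation"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)

# If the cap was hit, finish_reason is "length" and the JSON is likely truncated,
# so treat that case as a failed generation rather than parsing the partial output.
if response.choices[0].finish_reason == "length":
    print("Hit the token cap; discarding truncated output.")
else:
    print(response.choices[0].message.content)
```

The cap trades a small risk of truncating legitimately long translations for a hard bound on runaway generations; checking finish_reason lets you tell the two cases apart.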