Also worth noting … the majority of coding agents these days don’t even bother with an accurate tokenizer the way Discourse does. They just estimate at roughly 4 characters per token.
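That heuristic is trivial to implement. A minimal sketch (the function name and the floor-division rounding are my own choices, not any agent's actual code):

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token count using the ~4-characters-per-token rule of thumb."""
    # Floor-divide, but never report zero tokens for non-empty text.
    return max(1, len(text) // chars_per_token) if text else 0

print(estimate_tokens("Hello, world!"))  # 13 chars -> 3
```

Good enough for budget checks; it drifts most on code-heavy or non-English text, where real tokenizers produce noticeably different counts.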
cl100k will be plenty accurate for the vast majority of use cases, even on LLMs with slightly different tokenizers.