How to use AI models with custom tokenizers

Also worth noting: the majority of coding agents these days don't even bother with an accurate tokenizer the way Discourse does. They just estimate at roughly four characters per token.
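That four-characters-per-token rule of thumb is trivial to implement. A minimal sketch (the function name, divisor default, and rounding behavior are my assumptions, not any particular agent's code):

```python
# Rough token estimate used by many coding agents:
# assume ~4 characters per token instead of running a real tokenizer.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count from character length alone."""
    if not text:
        return 0
    # round() keeps estimates sane for short strings; exact behavior
    # varies between implementations.
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Hello, world!"))  # 13 chars -> estimates 3 tokens
```

It drifts for code-heavy or non-English text, but for budgeting a context window it's usually close enough.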

cl100k will be plenty accurate for the vast majority of use cases, even on LLMs whose tokenizers differ slightly from it.
