Also worth noting … the majority of coding agents these days don’t even bother with an accurate tokenizer the way Discourse does. They just estimate at roughly 4 characters per token.
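That heuristic is trivial to implement. A minimal sketch (the function name and the floor-division rounding are my own choices, not any agent's actual code):

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token count using the ~4-characters-per-token rule of thumb."""
    # Floor-divide, but never report zero tokens for non-empty text.
    return max(1, len(text) // chars_per_token) if text else 0

print(estimate_tokens("Hello, world!"))  # 13 chars -> 3
```

Good enough for budget checks; it drifts most on code-heavy or non-English text, where real tokenizers produce noticeably different counts.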
cl100k will be plenty accurate for the vast majority of use cases, even on LLMs with slightly different tokenizers.