How to use AI models with custom tokenizers

Also worth noting: the majority of coding agents these days don't even bother with an accurate tokenizer the way Discourse does. They just estimate at roughly four characters per token.
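That four-characters-per-token rule of thumb is trivial to implement. A minimal sketch (the function name, divisor default, and rounding behavior are my assumptions, not any particular agent's code):

```python
# Rough token estimate used by many coding agents:
# assume ~4 characters per token instead of running a real tokenizer.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count from character length alone."""
    if not text:
        return 0
    # round() keeps estimates sane for short strings; exact behavior
    # varies between implementations.
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Hello, world!"))  # 13 chars -> estimates 3 tokens
```

It drifts for code-heavy or non-English text, but for budgeting a context window it's usually close enough.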

cl100k will be plenty accurate for the vast majority of use cases, even on LLMs whose tokenizers differ slightly from it.
