Thanks. So I decided to enlist the services of ChatGPT, Gemini and Grok to help me decide which tokenizer to use, i.e. which would be the closest match to the Kimi Instruct tiktoken/BPE tokenizer and so generate the most accurate output from the model.
I must say, modern AI models are fairly representative of human society. They all reasoned out which tokenizer would be best suited and presented their findings; they disagreed on some of the facts, and each had its own opinion on which one is best. Kinda heading in the same direction but not really a consensus, very much like a human project team. Hilarious!!!
BTW, Gemini recommended Qwen (citing the relationship between the Chinese founders), Grok recommended Llama 3 (based on its similarity with cl100k_base and overall efficiency), while ChatGPT said either Qwen or Llama 3.
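For what it's worth, rather than taking the chatbots' word for it, one could compare the candidates empirically. Here's a minimal sketch using the Hugging Face `transformers` library; the repo IDs below are my assumption of reasonable stand-ins (the Llama 3 repo is gated, so you may need an HF token), and a single sample sentence is obviously not a rigorous benchmark:

```python
# Minimal sketch: compare candidate tokenizers against the Kimi tokenizer.
# Repo IDs are assumptions; swap in whichever checkpoints you actually use.
from transformers import AutoTokenizer

SAMPLE = "The quick brown fox jumps over the lazy dog. 你好，世界！"

REPOS = {
    "kimi":   "moonshotai/Kimi-K2-Instruct",          # reference (tiktoken/BPE)
    "qwen":   "Qwen/Qwen2.5-7B-Instruct",             # Gemini's pick
    "llama3": "meta-llama/Meta-Llama-3-8B-Instruct",  # Grok's pick (gated repo)
}

pieces = {}
for name, repo in REPOS.items():
    tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    pieces[name] = tok.tokenize(SAMPLE)
    print(f"{name:7s} {len(pieces[name])} tokens")

# Crude similarity measure: Jaccard overlap of the token-piece sets
# produced by each candidate versus the Kimi reference.
ref = set(pieces["kimi"])
for name in ("qwen", "llama3"):
    cand = set(pieces[name])
    jaccard = len(ref & cand) / len(ref | cand)
    print(f"{name}: piece-set Jaccard vs kimi = {jaccard:.2f}")
```

Running this over a larger, mixed English/Chinese corpus would give a much better signal than one sentence, but even this quick check shows where the candidates split text the same way Kimi does.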