How to use AI models with custom tokenizers

sam · March 2, 2026, 04:10

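It's also worth noting... most coding agents these days don't even bother with tokenizers as accurate as the ones Discourse uses. They simply estimate one token for every 4 letters.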
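For illustration, that kind of estimate can be sketched in a few lines of Python. This is only a rough sketch of the heuristic, not code from Discourse or from any particular agent; the names `estimate_tokens` and `fits_in_budget` are made up for this example.

```python
import math

# Rule of thumb from the post: roughly one token per 4 characters.
# This is an approximation, not a property of any real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Return a rough token count: character count / chars_per_token, rounded up."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def fits_in_budget(text: str, max_tokens: int) -> bool:
    """Hypothetical helper: check the rough estimate against a token limit."""
    return estimate_tokens(text) <= max_tokens


if __name__ == "__main__":
    prompt = "Summarize the following topic in two sentences."
    print(estimate_tokens(prompt), fits_in_budget(prompt, max_tokens=100))
```

The estimate counts characters only, so it will drift for text that tokenizes very differently from typical English prose.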

For most use cases involving LLMs (large language models) whose tokenizers differ only slightly, cl100k will be more than enough.
