Post text is not properly truncated before embedding in the discourse-ai plugin

When I use the OpenAI text-embedding-ada-002 model for related-posts matching, I get the following error:

OpenAI Embeddings failed with status: 400 body: { "error": { "message": "This model's maximum context length is 8191 tokens, however you requested 19370 tokens (19370 in your prompt; 0 for the c
/var/www/discourse/plugins/discourse-ai/lib/shared/inference/openai_embeddings.rb:24:in `perform!'  
/var/www/discourse/plugins/discourse-ai/lib/modules/embeddings/models/text_embedding_ada_002.rb:37:in `generate_embeddings'  
/var/www/discourse/plugins/discourse-ai/lib/modules/embeddings/manager.rb:22:in `generate!'  
/var/www/discourse/plugins/discourse-ai/lib/modules/embeddings/jobs/regular/generate_embeddings.rb:14:in `execute'  
/var/www/discourse/app/jobs/base.rb:292:in `block (2 levels) in perform'  
rails_multisite-5.0.0/lib/rails_multisite/connection_management.rb:82:in `with_connection'
/var/www/discourse/app/jobs/base.rb:279:in `block in perform'  
/var/www/discourse/app/jobs/base.rb:275:in `each'  
/var/www/discourse/app/jobs/base.rb:275:in `perform'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:202:in `execute_job'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:170:in `block (2 levels) in process'  
sidekiq-6.5.9/lib/sidekiq/middleware/chain.rb:177:in `block in invoke'  
/var/www/discourse/lib/sidekiq/pausable.rb:134:in `call'  
sidekiq-6.5.9/lib/sidekiq/middleware/chain.rb:179:in `block in invoke'  
sidekiq-6.5.9/lib/sidekiq/middleware/chain.rb:182:in `invoke'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:169:in `block in process'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'  
sidekiq-6.5.9/lib/sidekiq/job_retry.rb:113:in `local'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'  
sidekiq-6.5.9/lib/sidekiq.rb:44:in `block in <module:Sidekiq>'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:263:in `stats'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'  
sidekiq-6.5.9/lib/sidekiq/job_logger.rb:13:in `call'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'  
sidekiq-6.5.9/lib/sidekiq/job_retry.rb:80:in `global'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:124:in `block in dispatch'  
sidekiq-6.5.9/lib/sidekiq/job_logger.rb:39:in `prepare'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:123:in `dispatch'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:168:in `process'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:78:in `process_one'  
sidekiq-6.5.9/lib/sidekiq/processor.rb:68:in `run'  
sidekiq-6.5.9/lib/sidekiq/component.rb:8:in `watchdog'  
sidekiq-6.5.9/lib/sidekiq/component.rb:17:in `block in safe_thread'

It looks like the post needs to be truncated to the model's 8191-token limit before it is sent to the embeddings API.

Thanks for reporting, we will have a look.

Luckily we have a method that can truncate a collection of words up to a very specific token count.
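
To illustrate the idea, here is a minimal sketch of token-based truncation. It is not the plugin's actual code: `ToyTokenizer` and `truncate_to_tokens` are hypothetical names, and the toy tokenizer only splits on whitespace, whereas the real model counts BPE tokens. The only figure taken from this topic is the 8191-token limit quoted in the error.

```ruby
# Hypothetical stand-in tokenizer (an assumption for this sketch).
# It is NOT the BPE tokenizer OpenAI models use: it splits text into
# alternating word and whitespace chunks, so joining the tokens
# reproduces the input exactly.
module ToyTokenizer
  def self.encode(text)
    text.scan(/\s+|\S+/)
  end

  def self.decode(tokens)
    tokens.join
  end
end

# Returns the text unchanged if it fits within max_tokens; otherwise
# returns the decoded first max_tokens tokens.
def truncate_to_tokens(text, max_tokens, tokenizer: ToyTokenizer)
  tokens = tokenizer.encode(text)
  return text if tokens.length <= max_tokens

  tokenizer.decode(tokens.first(max_tokens))
end

MAX_TOKENS = 8191 # limit quoted in the error message

# Build an oversized "post" and cut it down before embedding.
long_post = (["word"] * 19_370).join(" ")
truncated = truncate_to_tokens(long_post, MAX_TOKENS)
puts ToyTokenizer.encode(truncated).length
```

The key point is that the cut is made on the token sequence rather than on characters or words, so the result cannot exceed the limit as long as the tokenizer matches the one the model uses.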

Nice! Thank you.

I think we now have this fixed per:

Got it. :+1: :muscle:
