So I rebaked my posts last night after setting up a CDN, and everything went fine except for one `Jobs::ProcessPost` job in Sidekiq that keeps failing and retrying. Specifically, it fails with a `Wrapped ArgumentError: Attributes per element limit exceeded` error. The backtrace from /logs is as follows…
```
Message (13 copies reported)

Job exception: Attributes per element limit exceeded

Backtrace

/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/nokogiri-1.13.8-x86_64-linux/lib/nokogiri/html5/document.rb:85:in `parse'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/nokogiri-1.13.8-x86_64-linux/lib/nokogiri/html5/document.rb:85:in `do_parse'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/nokogiri-1.13.8-x86_64-linux/lib/nokogiri/html5/document.rb:43:in `parse'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/nokogiri-1.13.8-x86_64-linux/lib/nokogiri/html5.rb:31:in `HTML5'
/var/www/discourse/lib/retrieve_title.rb:21:in `extract_title'
/var/www/discourse/lib/retrieve_title.rb:91:in `block in fetch_title'
/var/www/discourse/lib/final_destination.rb:499:in `block (4 levels) in safe_get'
/var/www/discourse/lib/final_destination.rb:496:in `catch'
/var/www/discourse/lib/final_destination.rb:496:in `block (3 levels) in safe_get'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/net-protocol-0.1.3/lib/net/protocol.rb:498:in `call_block'
```
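For what it's worth, the error is easy to reproduce outside of Discourse. As far as I can tell, Nokogiri's HTML5 parser caps how many attributes a single element may carry (400 by default, via `Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES`) and raises exactly this `ArgumentError` when the cap is hit. A minimal sketch:

```ruby
require "nokogiri"

# One element carrying more attributes than the HTML5 parser accepts.
attrs = (1..500).map { |i| %(data-a#{i}="x") }.join(" ")
html  = "<html><head><meta #{attrs}></head><body></body></html>"

begin
  Nokogiri::HTML5(html)
rescue ArgumentError => e
  puts e.message # => "Attributes per element limit exceeded"
end
```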
Hmm, I used the copy button on the logs page, which seems to have truncated the backtrace. If you do need/want more I can get it for you. What I can say is that it gets into the `cp.post_process` call,

```ruby
cp = CookedPostProcessor.new(post, args)
cp.post_process(new_post: args[:new_post])
```

within the `execute` method of `ProcessPost`, and from there seems to move into `post_process_oneboxes`.
Now, I investigated in more depth and managed to track down the specific post using the Rails console. It contains a single URL, no images, and no edit history.
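In case it helps anyone doing the same hunt, this is roughly what I ran; the exact argument layout of the retry entry is from memory, so treat it as a sketch rather than gospel:

```ruby
require "sidekiq/api"

# Pull the failing job out of the Sidekiq retry set, then load its post.
# (I believe Discourse enqueues a single options hash as the job args.)
job = Sidekiq::RetrySet.new.find { |j| j.klass == "Jobs::ProcessPost" }
post_id = job.args.first["post_id"]

post = Post.find(post_id)
puts post.raw # just some text and the one URL
```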
Okay, and having investigated the URL in question some more, it's complete and utter blurgh. It's a blog post where they've copied the entire post verbatim into both the og:description and twitter:description meta tags. But worse than that, and what I assume is causing the issue, the text includes some " characters that they haven't escaped, which completely breaks both of those tags and makes most of the text look like attributes.
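To illustrate with made-up text of the same shape: the first unescaped quote terminates the content value early, and the HTML5 tokenizer then reads every remaining word as a bare attribute (the attribute names below are obviously hypothetical):

```ruby
require "nokogiri"

# An unescaped quote inside content="..." cuts the value short, so the
# tokenizer treats each following word as an attribute of the <meta> tag.
broken = %(<meta property="og:description" content="He said "and every word after this becomes an attribute">)

doc  = Nokogiri::HTML5("<html><head>#{broken}</head><body></body></html>")
meta = doc.at_css("meta[property='og:description']")
puts meta["content"]              # => "He said " (truncated at the stray quote)
puts meta.attributes.keys.inspect # the leftover words show up as attribute names
```

Scale that up to an entire blog post crammed into one meta tag and the word count alone sails past the parser's per-element attribute limit, hence the ArgumentError.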
So the contents of the specific URL are complete garbage. Still, I guess it might be better for pages like this to be caught and handled gracefully, instead of repeatedly crashing the Sidekiq job; not sure if that's something you might want to look into?
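Something along these lines is the shape I had in mind; to be clear, this is hypothetical code to show the idea, not what lib/retrieve_title.rb actually looks like:

```ruby
# Hypothetical: if the page is too malformed for the parser,
# behave as though it simply had no title instead of raising.
def extract_title(html)
  Nokogiri::HTML5(html).at_css("title")&.text
rescue ArgumentError => e
  raise unless e.message.include?("limit exceeded")
  nil # an unparseable page just contributes no onebox title
end
```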
Otherwise, I guess that really only leaves a couple of questions from me. First, will it eventually give up retrying this job, or is there something I need to do? Second, is there anything that will get messed up by the job never completing successfully, given the post only contains some text and this one URL?