A bunch of errors in the log...caused by Discourse embedding on a static HTML site?

(Kirupa Chinnathambi) #1

Hi, everyone!
While trying to diagnose why one of my pages with a Discourse Embed script wasn’t working, I noticed that my error logs have many MANY entries being generated every second.

From looking at one of the 404 Not Found entries, here is what I see:

/usr/local/lib/ruby/2.0.0/open-uri.rb:353:in `open_http'
/usr/local/lib/ruby/2.0.0/open-uri.rb:709:in `buffer_open'
/usr/local/lib/ruby/2.0.0/open-uri.rb:210:in `block in open_loop'
/usr/local/lib/ruby/2.0.0/open-uri.rb:208:in `catch'
/usr/local/lib/ruby/2.0.0/open-uri.rb:208:in `open_loop'
/usr/local/lib/ruby/2.0.0/open-uri.rb:149:in `open_uri'
/usr/local/lib/ruby/2.0.0/open-uri.rb:689:in `open'
/usr/local/lib/ruby/2.0.0/open-uri.rb:34:in `open'
/var/www/discourse/app/models/topic_embed.rb:75:in `find_remote'
/var/www/discourse/app/models/topic_embed.rb:102:in `import_remote'
/var/www/discourse/lib/topic_retriever.rb:59:in `fetch_http'
/var/www/discourse/lib/topic_retriever.rb:46:in `perform_retrieve'
/var/www/discourse/lib/topic_retriever.rb:10:in `retrieve'
/var/www/discourse/app/jobs/regular/retrieve_topic.rb:16:in `execute'
/var/www/discourse/app/jobs/base.rb:154:in `block (2 levels) in perform'

I wasn’t able to find anything else in the logs that helps me go further, but I’m hoping this is enough for one of you to point me in the right direction :smile:


(Mittineague) #2

It looks like you’re embedding something using Windows-1252 character encoding, and because Discourse uses UTF-8 character encoding you’re getting the errors.

Sorry, I have no idea how to fix that other than possibly converting the Windows-1252 content into UTF-8 if you can.
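For anyone who wants to see the mismatch in isolation, here is a minimal Ruby sketch. It is not Discourse's code: `to_utf8` is a made-up helper name, and the sample string is an assumption chosen only because the byte 0xE9 is "é" in Windows-1252 but is not valid UTF-8 on its own. It shows a UTF-8 regexp failing against a Windows-1252 string, then succeeding once the string is transcoded.

```ruby
# Hypothetical helper (not part of Discourse): treat the bytes as
# Windows-1252 and transcode them to UTF-8. Bytes that Windows-1252
# leaves undefined are replaced rather than raising an error.
def to_utf8(raw, source_encoding = "Windows-1252")
  raw.dup.force_encoding(source_encoding)
     .encode("UTF-8", invalid: :replace, undef: :replace)
end

# Sample bytes: "caf" followed by 0xE9, labelled as Windows-1252.
legacy = "caf\xE9".b.force_encoding("Windows-1252")

# A regexp containing a non-ASCII character is fixed to UTF-8.
utf8_pattern = /\u00E9/

begin
  legacy =~ utf8_pattern
  mismatch = nil
rescue Encoding::CompatibilityError => e
  mismatch = e
end
puts "before conversion: #{mismatch.class}"

converted = to_utf8(legacy)
puts "after conversion:  match at index #{converted =~ utf8_pattern}"
```

The same idea applies to a whole file: read it as binary, run it through something like `to_utf8`, and write the result back, making sure any charset declaration in the HTML is updated to match.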

(Kirupa Chinnathambi) #3

Interesting. I did a quick manual scan of random pages on the kirupa.com site, and I couldn’t find any content that wasn’t encoded in UTF-8.

Is there a way to figure out what content Discourse has identified as being Windows-1252?

Also, do the log entries get pruned after a certain size? (In other words, should I worry about the log file getting so large that it eats up all available free space on the server?)

(Kane York) #4

What is in the env tab on those errors? Paste in the env for both one that starts with “Job exception: Wrapped” and one that doesn’t include “Wrapped”.

Hopefully there’s a URL in the args. If not, we should add reporting of it in the code.

(Kirupa Chinnathambi) #5

The env tab contained the information I needed! Thanks :stuck_out_tongue:

Here is the information anyway:

Here is a Job exception: Wrapped

retry: true
queue: default
class: Jobs::RetrieveTopic
jid: e34f7ffe689f224427a7b845
enqueued_at: 1417475985.7480564
error_message: Wrapped Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with Windows-1252 string)
error_class: Jobs::HandledExceptionWrapper
failed_at: 1417475987.366709
retry_count: 0

  0: [object Object]

Here is an exception that doesn’t include Wrapped:

current_db: default
current_hostname: forum.kirupa.com
job: Jobs::RetrieveTopic
problem_db: default

  user_id: null
  embed_url: http://www.kirupa.com/html5/removing_space_between_images_using_css.htm
  current_site_id: default

The HTML file referenced by this last exception was indeed encoded in Windows-1252, so I went ahead and fixed it. The embed works fine now. I’ll go through and make the change on any remaining files as well.
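For tracking down the remaining files, a rough Ruby sketch like the one below may help. It is only an assumption about how one might scan a static site, not how Discourse decides on an encoding: `non_utf8_files` is a made-up helper, and the `.htm`/`.html` glob is a guess based on the URL above. It flags files whose bytes are not valid UTF-8. A file that passes is not guaranteed to have been saved as UTF-8 (pure ASCII passes too), but a file that fails definitely needs converting.

```ruby
# Hypothetical helper: return the paths under `dir` whose raw bytes
# are not valid UTF-8. The glob pattern is an assumption; adjust it
# to match the file extensions the site actually uses.
def non_utf8_files(dir, pattern = "**/*.{htm,html}")
  Dir.glob(File.join(dir, pattern)).sort.reject do |path|
    # Read the raw bytes, label them as UTF-8, and check validity.
    File.binread(path).force_encoding("UTF-8").valid_encoding?
  end
end

# Example usage (the path is a placeholder):
#   non_utf8_files("/path/to/site").each { |path| puts path }
```

Each path it prints is a candidate for re-saving as UTF-8 in an editor or converting with a script.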