CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8

(probus) #1

While embedding comments from a Ghost blog via RSS, if the blog post title contains non-English characters, the embed fails to create a comment topic. I found this in the Sidekiq log:


Jobs::HandledExceptionWrapper: Wrapped Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
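For anyone unfamiliar with this error, here is a minimal sketch of how Ruby raises it in general. It is not taken from Discourse's code; `raw_title` and `utf8_text` are made-up stand-ins for a feed title read as raw bytes and a piece of UTF-8 text. Ruby refuses to join two strings when both contain non-ASCII characters and their encodings differ.

```ruby
# Hypothetical stand-ins, not Discourse internals.
# The bytes of "Päivää", tagged as binary (ASCII-8BIT), as raw network data often is.
raw_title = "P\xC3\xA4iv\xC3\xA4\xC3\xA4".b
# A UTF-8 string that also contains a non-ASCII character.
utf8_text = " (Caf\u00E9)"

# Returns the joined string, or the exception if the encodings clash.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e
end

result = try_concat(raw_title, utf8_text)
puts result.class     # Encoding::CompatibilityError
puts result.message

# The same bytes relabelled as UTF-8 join without complaint.
fixed = raw_title.dup.force_encoding(Encoding::UTF_8) + utf8_text
puts fixed.encoding   # UTF-8
```

If the title were pure ASCII, the join would succeed even with mismatched encodings, which would fit the symptom that only non-English titles fail.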

(Jens Maier) #2

This appears to be related to "ISO-8859-1 error in static embedding" and "A bunch of errors in the log…caused by Discourse embedding on a static HTML site?…", but those topics look abandoned. :frowning:

(Thomas Purchas) #3

Yeah, Discourse appears to only support UTF-8 for RSS feeds and scraping. Anything else and it throws an error.

It could be that Ghost is creating an RSS feed with an ASCII encoding, which upsets Discourse.

On my site I resorted to detecting Discourse and converting the site's output to UTF-8. It's a kludge, but it works.

Another option would be to submit a PR to get Discourse to convert non-UTF-8 content to UTF-8 before it runs any of its regexes.
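A rough sketch of what such a conversion step could look like in plain Ruby. This is an assumption about the approach, not Discourse's actual code: `to_utf8` and its `declared` parameter are hypothetical names. It transcodes from a declared charset when one is known, and otherwise treats the bytes as UTF-8 and replaces anything invalid.

```ruby
# Hypothetical helper, not part of Discourse.
# raw:      the fetched bytes (any encoding tag)
# declared: charset name from the HTTP header or XML prolog, if known
def to_utf8(raw, declared = nil)
  text = raw.dup
  if declared
    # Relabel the bytes with the declared charset, then transcode to UTF-8.
    return text.force_encoding(declared)
               .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end
  # No charset given: assume UTF-8 and replace invalid bytes with U+FFFD.
  text.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub("\uFFFD")
end

latin1 = to_utf8("caf\xE9".b, "ISO-8859-1")  # Latin-1 bytes, charset known
utf8   = to_utf8("caf\xC3\xA9".b)            # already UTF-8, just mislabelled
broken = to_utf8("caf\xE9".b)                # invalid as UTF-8, gets scrubbed

puts latin1 == utf8          # true
puts broken.valid_encoding?  # true
```

After this step every string is valid UTF-8, so later string joins and regex matches work on a single encoding. The trade-off is that bytes which cannot be interpreted are replaced rather than preserved.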

(probus) #4

I believe Ghost's HTML and RSS are both UTF-8 encoded. With my limited skills I can check the source of both, and for RSS it begins like this:

<?xml version="1.0" encoding="UTF-8"?>

and for HTML:

<meta charset="utf-8" />

Anyway, I tested this with RSS feed polling disabled and it started working! So the error has something to do with RSS.

The only downside is, of course, that the comment topic gets created only after the first visit to the blog post. This is usually the case anyway, since the feed gets polled hourly. Disabling feed polling seems to also fix the bugs in whitelist/blacklist found in this topic.

(Thomas Purchas) #5

Interesting. Could you provide a link to a broken post and your RSS feed?

(probus) #6

Sorry, not yet as this site is still in development.

(Thomas Purchas) #7

This article suggests that there may be an invalid UTF-8 character in your RSS feed, and that is causing Ruby to use the ASCII-8BIT encoding.
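To check a feed for this, something like the following sketch might help. It is only an illustration: `invalid_utf8_report` is a made-up helper and the sample feed fragment is invented. It interprets the raw bytes as UTF-8 and lists the character position and byte values of anything that does not decode.

```ruby
# Hypothetical diagnostic helper, not part of Discourse.
# Returns [[char_index, [byte, ...]], ...] for every invalid sequence.
def invalid_utf8_report(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return [] if text.valid_encoding?

  bad = []
  text.each_char.with_index do |ch, i|
    bad << [i, ch.bytes] unless ch.valid_encoding?
  end
  bad
end

# Invented sample: "Päivä" encoded as Latin-1 inside a feed that claims UTF-8.
feed = "<title>P\xE4iv\xE4</title>".b
report = invalid_utf8_report(feed)

report.each do |index, bytes|
  hex = bytes.map { |b| format("0x%02X", b) }.join(" ")
  puts "invalid sequence at character #{index}: #{hex}"
end
```

An empty report would mean the feed bytes are valid UTF-8, in which case the problem is more likely in how the string is tagged after fetching than in the feed itself.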

Not sure it gets us much closer to the answer, but it's a start.