Parsing RSS feed missing quotes + apostrophes

Hi all!

We’ve recently upgraded to 2.0.0.beta10~git49.9f422c93f6 and our latest RSS pull for comment embedding nuked apostrophes and quotes from the posts:

The image example is from this page:

The feed it’s pulling from is here: PIXLS.US

Search on “an about page and help” to get to the relevant section.

y u hate typographical marks?! :smiley:

@techAPJ didn’t you make changes to the feed poller recently?

So this seems to be 2 separate issues:

  1. When upgrading from 2.0.0.beta9+git0 to the version mentioned above, Discourse decided it needed to refetch and rerender a lot of older posts from the RSS feed. This is how we noticed the second bug.
  2. It seems we lost at least all typographic marks, including in our recent posts. The import category can be seen here: discuss.pixls.us - Free Software Photography

It smells a bit like this: the RSS feed, despite being sent with the correct headers, is not read as a UTF-8 encoded string, so the re-encoding with the replace option strips the UTF-8 encoded characters.
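To illustrate the suspicion above, here is a minimal Ruby sketch. It assumes the re-encode looked roughly like `String#encode` with `invalid: :replace, undef: :replace, replace: ""`; the helper name `reencode_with_replace` is hypothetical and the actual Discourse code may differ. When the bytes are tagged as binary (ASCII-8BIT) rather than UTF-8, every non-ASCII byte counts as undefined in the conversion and is dropped:

```ruby
# Hypothetical stand-in for the re-encode step; the real Discourse code may differ.
def reencode_with_replace(str)
  str.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
end

# Curly apostrophe and curly quotes, as valid UTF-8.
post = "It\u2019s a \u201Cquote\u201D"

# Same bytes, but tagged as binary (ASCII-8BIT), as a raw response body might be.
binary_post = post.b

# Bytes >= 0x80 have no mapping from ASCII-8BIT to UTF-8, so they are removed.
puts reencode_with_replace(binary_post) # Its a quote

# Re-tagging the same bytes as UTF-8 keeps the characters intact.
retagged = binary_post.dup.force_encoding("UTF-8")
puts retagged == post # true
```

If this is what happened, the typographic marks were never invalid; the string was just carrying the wrong encoding tag when the replace option ran.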

Thanks for bringing this to our notice. I have reverted the UTF-8 encode related changes I pushed a few days ago.

Updating to the latest version will restore the normal behaviour.

Since the content changed, the topic got updated as per this code. Now that I have reverted the code, the topic will be updated again with proper formatting.

OK, I patched out the “not recently polled” check for a moment and triggered the Sidekiq job, and all our posts are good again.

Why was this re-encode added in the first place? I will debug later why the raw_feed string isn’t UTF-8 encoded. If it were, the re-encode should have been a no-op, no?

Because in some cases of bad (unsupported) encoding we were seeing job exceptions in the error logs.

Yes, if the string is already properly UTF-8 encoded, then the encode logic shouldn’t have kicked in.
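A small Ruby sketch of both points, under stated assumptions: the feed bytes really are UTF-8, and `clean_feed_string` is a hypothetical helper, not the code from Discourse or from any PR. On a valid UTF-8 string the re-encode changes nothing, and `String#scrub` is one way to drop only genuinely invalid byte sequences while leaving valid characters alone:

```ruby
# Hypothetical helper: tag the bytes as UTF-8, then remove only invalid sequences.
# Assumes the feed is actually UTF-8; a Latin-1 feed would need a real conversion.
def clean_feed_string(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8) # re-tag only; bytes are unchanged
  utf8.scrub("")                                 # drop invalid byte sequences
end

valid = "don\u2019t"

# Re-encoding a valid UTF-8 string to UTF-8 leaves it unchanged.
reencoded = valid.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
puts reencoded == valid # true

# A binary string ending in an incomplete UTF-8 sequence (a lone 0xC3 byte).
broken = "caf".b + "\xC3".b

puts clean_feed_string(broken)         # caf
puts clean_feed_string(valid) == valid # true
```

The design choice here is to fix the tag first and scrub second, so that the cleanup only touches bytes that cannot be decoded.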

It might depend on the locale/lang environment of the Sidekiq job. Maybe I should add LC_ALL=en_US.UTF-8 and LANG=en_US.UTF-8 to my service file.
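On Unix-like systems Ruby derives `Encoding.default_external` from the locale environment, so a job started without a UTF-8 locale can end up with a non-UTF-8 default. Whether that is what happened here is unconfirmed. As a rough sketch, a process could check this at startup; `utf8_external?` is a hypothetical helper name:

```ruby
# Hypothetical startup check: is the process-wide default external encoding UTF-8?
def utf8_external?(encoding = Encoding.default_external)
  encoding == Encoding::UTF_8
end

# Warn on stderr if the environment did not give us a UTF-8 default.
unless utf8_external?
  warn "default external encoding is #{Encoding.default_external}, not UTF-8"
end
```

Running this once under the service's environment and once in an interactive shell would show whether the two differ.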

@gerhard sent a PR with a proper fix:

https://github.com/discourse/discourse/pull/5893

Closing this topic for now. Please create a new topic if the problem persists.
