Simple-rss is buggy and should be forked or replaced


(Darix) #1

While working on pixls.us, we were fighting for a quite a while to get the RSS parsing working. It turned out to be a bug on the server side (wrong Content-Type header without encoding) and a bug in simple-rss. With the first chunk from simple-rss PR#17, we finally got rss parsing working. Given the inactivity of upstream I would say either fork it or move to a different parser. I had a proof of concept implementation working with the feed_parser gem.

diff --git a/Gemfile b/Gemfile
index 71286ae..03b5efb 100644
--- a/Gemfile
+++ b/Gemfile
@@ -239,7 +239,7 @@ gem 'rbtrace', require: false, platform: :mri
 #
 gem 'ruby-readability', require: false
 
-gem 'simple-rss', require: false
+gem 'feed_parser', require: false
 
 # TODO mri_22 should be here, but bundler was real slow to pick it up
 # not even in production bundler yet, monkey patching it in feels bad
diff --git a/app/jobs/scheduled/poll_feed.rb b/app/jobs/scheduled/poll_feed.rb
index 8c0c8eb..752df2e 100644
--- a/app/jobs/scheduled/poll_feed.rb
+++ b/app/jobs/scheduled/poll_feed.rb
@@ -41,12 +41,12 @@ module Jobs
     end
 
     class Feed
-      require 'simple-rss'
+      require 'feed_parser'
 
       if SiteSetting.embed_username_key_from_feed.present?
-        SimpleRSS.item_tags << SiteSetting.embed_username_key_from_feed.to_sym
+       # TODO
+       Rails.logger.warn "SiteSetting.embed_username_key_from_feed is not implemented yet"
       end
-
       def initialize
         @feed_url = SiteSetting.feed_polling_url
         @feed_url = "http://#{@feed_url}" if @feed_url !~ /^https?\:\/\//
@@ -66,7 +66,7 @@ module Jobs
       private
 
       def rss
-        SimpleRSS.parse open(@feed_url)
+        FeedParser.new(:url => @feed_url).parse
       end
 
     end
@@ -86,7 +86,11 @@ module Jobs
       end
 
       def content
-        @article_rss_item.content || @article_rss_item.description
+        if @article_rss_item.content && not(@article_rss_item.content.empty?)
+          return @article_rss_item.content
+        else
+          return @article_rss_item.description
+        end
       end
 
       def title

The patch is not 100% complete yet as picking the author from the feed part is not implemented yet.

I picked it over feedjira because of the smaller dependency set. From the activity on the project, the best joice might actually be feedjira though.

thoughts?

p.s.: just noticed one bug is even opened by @sam


(Jeff Atwood) #2

Perhaps @eviltrout should have a look.


(Robin Ward) #3

If it works better I’d gladly accept a PR to do it. It looks like it’s most of the way there too?


(Darix) #4

most. although that “pick author from feed” is not really trivial to solve. and the project is almost as dead as simple-rss.

Before we continue … how do you feel about feedjira and the dependencies it pulls?


(Robin Ward) #5

I don’t have any objections really – is the concern that it requires 3 gems?


(Darix) #6

ok then i will rather take a poke at solving the problem with feedjira and do a PR for that. (after i sorted out the CLA thing)


(Darix) #7

can you check with your team if the license terms for curb are ok?


(Darix) #8

for the author tag feedjira already supports dc:creator and author. also some itunes_author for itunes RSS. does that cover most cases or should we support more?


(Jeff Atwood) #9

I know @sam has objected to gem requirement explosion in the past so you should be careful here.


(Darix) #10

can we get a vote from @sam on the issue?


(Sam Saffron) #11

feedjira seems like a mountain of dependencies, can simplerss just be forked and fixed?


(Darix) #12

it really only adds feedjira, curb, loofah and sax-machine to the list. the rest is already there (e.g. nokogiri). Applying that 1 PR i mentioned above fixed at least the pixls.us issue. but the upstream project is still dead.


(Sam Saffron) #13

That is 4 extra dependencies, not easy with that … why not fork simple rss or contact the author and ask them what is going on? and perhaps get ownership of it ?


(Darix) #14

i will ping him on github again, but i really dont want to take the lib.


(Darix) #15

just for the record:

https://github.com/darix/discourse/commit/bc086e291a9c152e91b948848a3d56df693af3db


(Darix) #16

mail send to the author of simple-rss


(Darix) #17

a small update, no reaction so far.


(5minpause) #18

Hi there.
Are there still no news? I run a Discourse Installation for a German website and there are lots and lots of errors regarding German Umlauts. If my debugging is correct SimpleRss is mostly to be blamed for this.
I am eagerly waiting for a replacement. I don’t want to go the route of forking Discourse and setting up all the Docker things myself.


(Darix) #19

your forum is pulling RSS? or just embedding sites?


(5minpause) #20

I use an xml RSS Feed for embedding.