Autobot plugin - Duplicate posts problem

When importing via an RSS feed into a category:

https://medium.freecodecamp.org/feed

Each time the source is polled, the same article is pulled in again and duplicated as a new topic.

Here are the logs:

Job exception: undefined method `acting_user=' for nil:NilClass

/var/www/discourse/lib/post_revisor.rb:133:in `revise!'
/var/www/discourse/app/models/post.rb:465:in `revise'
/var/www/discourse/app/jobs/regular/pull_hotlinked_images.rb:112:in `execute'
/var/www/discourse/app/jobs/base.rb:134:in `block (2 levels) in perform'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/rails_multisite-1.1.1/lib/rails_multisite/connection_management.rb:73:in `with_connection'
/var/www/discourse/app/jobs/base.rb:129:in `block in perform'
/var/www/discourse/app/jobs/base.rb:125:in `each'
/var/www/discourse/app/jobs/base.rb:125:in `perform'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:188:in `execute_job'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:170:in `block (2 levels) in process'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/middleware/chain.rb:128:in `block in invoke'
/var/www/discourse/lib/sidekiq/pausable.rb:80:in `call'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/middleware/chain.rb:133:in `invoke'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:169:in `block in process'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:141:in `block (6 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/job_retry.rb:97:in `local'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:140:in `block (5 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq.rb:36:in `block in <module:Sidekiq>'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:136:in `block (4 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:204:in `stats'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:131:in `block (3 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/job_logger.rb:7:in `call'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:130:in `block (2 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/job_retry.rb:72:in `global'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:129:in `block in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/logging.rb:44:in `with_context'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/logging.rb:38:in `with_job_hash_context'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:128:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:168:in `process'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:85:in `process_one'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/processor.rb:73:in `run'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/util.rb:16:in `watchdog'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/sidekiq-5.0.5/lib/sidekiq/util.rb:25:in `block in safe_thread'

@schungx, thanks for your PR :+1:. I merged it. It looked good to me, but I haven’t tested it well. Let me know if any issues come up.

@kuyashi, since I am currently working on official plugins, I can’t check your report soon. I will try to have a look this weekend.


I am wondering, if a post created by the autobot is DELETED, then what happens?

https://github.com/vinkashq/discourse-autobot/blob/master/lib/autobot/post_creator.rb#L111

Notice the use of first here – it finds the first post with the correct feed item url. Why the first one? Shouldn’t it be the LAST one at least? And shouldn’t the list be sorted (say order(:post_id)) before taking first or last so it is at least predictable?

Since the post custom fields are never deleted together with the post, this pretty much means that the deleted first post will be found every time. Most likely that means try(:post) will return nil, and then a new post gets created EVERY TIME the RSS feed is scanned.

This, I believe, is the reason behind all those duplicated posts.

Shouldn’t the correct behavior be to:

  1. First sort the posts by post_id (same as by date)
  2. Filter out all the deleted posts
  3. Take the last one
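
A minimal sketch of those three steps, in plain Ruby so it runs without Discourse. The Row struct and its post_deleted flag are stand-ins I made up for post_custom_fields rows and soft-deleted posts; this is not the plugin's code, only the selection logic being proposed.

```ruby
# Stand-in for a post_custom_fields row joined to its post.
# post_deleted is a hypothetical flag meaning "the post was soft-deleted".
Row = Struct.new(:post_id, :value, :post_deleted)

# Return the id of the newest live post for a feed item URL, or nil.
def latest_live_post_id(rows, source_url)
  row = rows
    .select { |r| r.value == source_url } # rows for this feed item
    .reject(&:post_deleted)               # step 2: drop deleted posts
    .sort_by(&:post_id)                   # step 1: predictable order
    .last                                 # step 3: take the last one
  row && row.post_id
end

ROWS = [
  Row.new(10, "https://example.com/a", true),  # first post, deleted
  Row.new(25, "https://example.com/a", false), # created later, still live
  Row.new(17, "https://example.com/b", true)   # only post, deleted
]

puts latest_live_post_id(ROWS, "https://example.com/a").inspect
puts latest_live_post_id(ROWS, "https://example.com/b").inspect
```

Filtering before taking the last row matters here: an item whose only post was deleted yields nil, so the caller can decide whether to re-create it.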

I would have made a PR if I knew how to… but I’m not too comfortable with all that ActiveRecord stuff.

First of all, you can’t delete a post completely. For more details, see the Discourse Meta search results for 'how delete post completely'.

As per this line of code, duplicate posts won’t be created, so there is no need to sort. In this case first and last are the same.

I guess @kuyashi’s duplicate post problem is a special case. I need some time to check that issue.

If you would like to contribute, please create a PR for the features you already suggested in your previous post.


It is not just @kuyashi, I am also getting duplicate posts every single time that the RSS feed is scanned. If it is scanned three times, then three duplicated posts.

So this is not an isolated problem.


Not so sure about this. For example:

post = get_existing_post || post_creator.create!

However:

def get_existing_post
      return PostCustomField.where(name: "autobot_source_url", value: source_url).first.try(:post)
end

Now, assume there are multiple records in post_custom_fields with the same autobot_source_url, all but the last one of them point to posts that are deleted.

I am not sure if PostCustomField.where(name: "autobot_source_url", value: source_url) will also return all records (including those pointing to deleted posts) or just the live one.

If it returns only the live one, then you’re probably correct that first == last. However, I’m not so sure about that assumption. AFAICT, PostCustomField will return all the records matching the autobot_source_url, including those pointing to deleted posts.

So, when you take the first of that, and access the :post, I’m not sure what will result. My suspicion is that it returns nil, so that the clause falls into post_creator.create! and a new post is created. Every single time, because you’ll always find the deleted post first.

What I observed supports this:

  1. I deleted all the old posts
  2. I changed the RSS feed so the items have new URLs
  3. New posts were created
  4. The new posts are no longer duplicated, as long as none is deleted

Therefore, I’d heavily suspect that it is deleted RSS posts that cause the duplicated posts.
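
To make the suspected failure mode concrete, here is a small in-memory simulation. FakeForum and all its methods are hypothetical; the only things taken from the plugin are the shape of get_existing_post (first matching row, then its post) and the `existing || create` pattern. It assumes a deleted post looks like nil when reached through its custom field row.

```ruby
# Hypothetical in-memory model: custom field rows survive deletion, and
# the post lookup only sees live posts (mimicking a soft-delete scope).
class FakeForum
  attr_reader :fields, :live_posts

  def initialize
    @fields = []      # [post_id, source_url] pairs, never removed
    @live_posts = {}  # post_id => title, only non-deleted posts
    @next_id = 1
  end

  def create_post(source_url)
    id = @next_id
    @next_id += 1
    @live_posts[id] = "Post for #{source_url}"
    @fields << [id, source_url]
    id
  end

  def delete_post(id)
    @live_posts.delete(id) # the custom field row is left behind
  end

  # Same shape as get_existing_post: first matching row, then its post.
  def existing_post(source_url)
    row = @fields.find { |_, url| url == source_url }
    row && @live_posts[row[0]]
  end

  # One poll of the feed: reuse the existing post or create a new one.
  def poll(source_url)
    existing_post(source_url) || create_post(source_url)
  end
end

forum = FakeForum.new
url = "https://example.com/item"
first_id = forum.poll(url)  # first poll creates a post
forum.poll(url)             # second poll finds it and creates nothing
forum.delete_post(first_id)
3.times { forum.poll(url) } # each poll hits the dead row first
puts forum.live_posts.size
```

Under those assumptions, every poll after the deletion adds one more post, which is the pattern reported in this thread.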

@schungx, I have the exact same issue. I have been following along on this thread and thought I might have an older version, or maybe an issue with the feed, but that’s not the case. Every refresh = duplicate post created.

@Jon_Hawkins, make sure you pull the latest one. It should solve a few problems.

On the duplicated posts, how many posts are duplicated each scan of the feed? I find that only posts that were ever deleted before will continue to add new duplicated posts indefinitely. However posts that were never deleted are never duplicated.

Are any of your feed posts deleted?

A good way to find out is to:

  1. Open Data Explorer
  2. select * from post_custom_fields where name='autobot_source_url' and value='your duplicated feed item URL here'
  3. See how many rows are returned
  4. If > 1, then you have some deleted posts and that’s creating the duplication problem
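
The same check can be done for every feed item at once by counting rows per URL. A rough sketch in plain Ruby over made-up sample rows (the array layout is only an assumption for illustration, not the real table schema):

```ruby
# Hypothetical sample of post_custom_fields rows: [name, value, post_id].
rows = [
  ["autobot_source_url", "https://example.com/a", 10],
  ["autobot_source_url", "https://example.com/a", 25],
  ["autobot_source_url", "https://example.com/b", 17],
  ["some_other_field",   "https://example.com/a", 10]
]

# Count autobot rows per feed item URL and keep those with more than one.
duplicated = rows
  .select { |name, _, _| name == "autobot_source_url" }
  .group_by { |_, value, _| value }
  .transform_values(&:size)
  .select { |_, count| count > 1 }

puts duplicated.inspect
```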

I think I did a rebuild of the app 4 days ago and have the latest build. This is my first plugin though; is there anything specific I need to do besides ./launcher rebuild app?

Not that I know of. How many duplicates do you get?

I’m using Zapier to build a custom RSS feed of only 1 item. When autobot is live on that Zapier feed, it creates the same post over and over again, each time the polling interval is hit. Let me rebuild the app now and confirm I have the latest and greatest.

  1. Did you ever delete the first post created by that feed?

  2. What happens if you change your feed item’s URL to a different one?

I’m fairly certain I’ve only deleted the first post created by that feed after noticing a duplicate but let me test again and let you know.

I have never changed the feed URL to another feed. I just deleted the entire record and created a new one from scratch. I think I went to edit the existing feed one time and noticed something weird when saving so figured I’d just remove it and start fresh.

If you deleted the first post, then you’ll get duplicated posts forever. You have to change the URL.

Basically the first post is the most important. If it is deleted, a new post will be created during every refresh.

Deleting the campaign won’t help because the plugin doesn’t distinguish between different campaigns. This is probably also a potential problem if an item is ever shared between two different feeds in two different topics/categories: it will be posted for only one of them and never for the other.

Source:

https://github.com/vinkashq/discourse-autobot/blob/master/lib/autobot/post_creator.rb#L111

It probably should be something like:

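# Chaining two where calls on name/value would AND them onto the same row
# and match nothing, so match the campaign field through post_id instead.
PostCustomField
    .where(name: "autobot_source_url", value: source_url)
    .where(post_id: PostCustomField.where(name: "autobot_campaign_id", value: campaign["id"]).select(:post_id))
    .first.try(:post)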
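
To illustrate why the campaign should be part of the lookup, here is a plain-Ruby sketch with made-up data. posted_pairs is a hypothetical helper: it walks (campaign id, URL) pairs in polling order and keeps those that would create a post, given whatever lookup key the block builds.

```ruby
# Return the (campaign_id, url) pairs that create a new post when
# "already imported" is decided by the key the block returns.
def posted_pairs(items)
  seen = {}
  items.select do |campaign_id, url|
    key = yield(campaign_id, url)
    next false if seen[key] # lookup says this item already exists
    seen[key] = true        # first sighting: a post gets created
  end
end

ITEMS = [
  [1, "https://example.com/shared"], # campaign 1 polls first
  [2, "https://example.com/shared"]  # campaign 2 carries the same item
]

by_url      = posted_pairs(ITEMS) { |_, url| url }
by_campaign = posted_pairs(ITEMS) { |id, url| [id, url] }

puts by_url.size
puts by_campaign.size
```

Keyed by URL alone, the second campaign never gets its copy; keyed by campaign and URL, each campaign gets exactly one.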

Thanks @schungx! This all makes sense based on what I’m experiencing.


Sorry, I currently don’t have much time to look into this plugin’s code. I will check it all and make the improvements as soon as possible, maybe in the next couple of weeks.

@schungx thanks for your PR. I merged it.


Hey, I have fixed this problem now. @schungx is correct: deleted posts are causing the issue. But using the last method does not resolve it (anyway, as per his wish, I added it too :wink:). Instead I used the unscoped option to include soft-deleted posts while finding records. Anyway, thanks to Stephan :+1:.

@kuyashi @Jon_Hawkins
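
For anyone wondering what the unscoped change does, here is a rough plain-Ruby analogy. FakePost and PostStore are made up for illustration and are not the plugin's or Discourse's classes; the assumption is that the normal lookup hides soft-deleted posts while an unscoped lookup does not.

```ruby
# Hypothetical stand-in for a model with a soft-delete default scope.
FakePost = Struct.new(:id, :deleted_at)

class PostStore
  def initialize(posts)
    @posts = posts
  end

  # Default lookup: soft-deleted posts are invisible.
  def find(id)
    @posts.find { |p| p.id == id && p.deleted_at.nil? }
  end

  # "Unscoped" lookup: ignores the soft-delete filter.
  def find_unscoped(id)
    @posts.find { |p| p.id == id }
  end
end

store = PostStore.new([FakePost.new(1, Time.at(0)), FakePost.new(2, nil)])

puts store.find(1).inspect          # hidden by the default lookup
puts store.find_unscoped(1).inspect # still found when unscoped
```

With the unscoped lookup, a deleted post still counts as "already imported", so the feed item is not created again on the next poll.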


Wahoo! I’ll try it out again tonight. Thanks @vinothkannans
