Advice on how to build a scraping / posting plugin

I’m looking to build a plugin that periodically searches the web / Twitter / Facebook / Instagram / etc. for relevant news and then creates summary posts in Discourse.

Can anyone point me at a plugin or a part of core that I could look at to get some ideas of how to approach this project?

More details…

Our forum is focussed on a UK football team and these news items would relate to the team and players.

The news might be used to start discussion threads or, more often, simply collected news-aggregator style and placed into a searchable news repository category.

During matches the frequency of the posts may increase dramatically.

I’m at the planning stage at the moment, but right now, outside of some system settings, I don’t believe I will need any front-end UI. So this will all be back-end Ruby.

What I really need to understand is how I can kick off (cron-like) server jobs - probably with Sidekiq.

I’ve written an MVCForum -> Discourse migrator so I’m relatively comfortable moving the gathered news items into posts.

I’ve also written the scraper / Twitter gatherer before for our previous forum (in Python that time), and I used Nokogiri as part of the migration, so I think I’ll be OK writing the scraper.

I’m guessing that Discourse uses an existing Twitter gem that I can piggy-back on. If not, can you recommend one?

I’ve read the plugins tutorial but, outside of the basic structure, it didn’t really help as it (understandably) had more focus on front-end UI.

My goal is to parameterise the scraping / CSS selectors as much as possible so that the plugin might be of use to the wider community. Same for the Twitter accounts / #hashtags that I want to monitor and create posts from.
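
To make the parameterisation idea concrete, here is a minimal sketch of how the per-site selectors and the tracked hashtags could be kept as data rather than code. Everything named here is an assumption for illustration: the NewsSource fields, the JSON layout, and the helper names are hypothetical and not part of Discourse or of any existing plugin. The selectors are only stored; applying them to a page would be left to the scraper (for example Nokogiri).

```ruby
require "json"

# Hypothetical description of one site to scrape. The selectors are plain
# strings so an admin could change them without touching the code.
NewsSource = Struct.new(:name, :url, :item_selector, :title_selector, keyword_init: true)

REQUIRED_KEYS = NewsSource.members.map(&:to_s).freeze

# Turn a JSON array (e.g. the value of a setting) into NewsSource objects,
# rejecting entries that lack a required key.
def parse_sources(json)
  JSON.parse(json).map do |entry|
    missing = REQUIRED_KEYS - entry.keys
    raise ArgumentError, "source is missing: #{missing.join(', ')}" unless missing.empty?

    NewsSource.new(**entry.slice(*REQUIRED_KEYS).transform_keys(&:to_sym))
  end
end

# True when the text contains at least one of the tracked hashtags
# (compared case-insensitively, without the leading '#').
def mentions_tracked_tag?(text, hashtags)
  found = text.scan(/#(\w+)/).flatten.map(&:downcase)
  wanted = hashtags.map { |tag| tag.delete_prefix("#").downcase }
  !(found & wanted).empty?
end
```

With this shape, adding a new site or hashtag becomes a settings change rather than a code change, which is what would make the plugin reusable by other forums.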

Any other advice, tips or pointers gratefully received.

Have you considered not writing a plugin here?

You could just run a program in whatever language you like on a regular schedule that used the Discourse API to create the posts.
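
As a rough sketch of that approach, the snippet below builds a request that creates a topic through the Discourse API using only the Ruby standard library. The forum URL, key, username and category id are placeholders, and the helper names are made up for illustration; the endpoint (POST /posts.json) and the Api-Key / Api-Username headers are how current Discourse versions accept API calls, so check them against the API docs for your version.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) a request that creates a new topic.
# All arguments are placeholders supplied by the caller.
def build_create_topic_request(base_url:, api_key:, api_username:, title:, raw:, category_id:)
  uri = URI.join(base_url, "/posts.json")
  request = Net::HTTP::Post.new(uri)
  request["Api-Key"] = api_key
  request["Api-Username"] = api_username
  request["Content-Type"] = "application/json"
  # A post with a title and no topic_id starts a new topic.
  request.body = JSON.generate(title: title, raw: raw, category: category_id)
  request
end

# Send a prepared request and return the Net::HTTPResponse.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end
```

A cron entry (or any other scheduler) could then run a script that gathers the news items and calls send_request once per item.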

Absolutely.

I effectively have the core already in Python. I initially tried to migrate via the API, so I’m familiar with that too, but I want to write some Discourse plugins over time and saw this as a gentle introductory project to learn about plugin development.

Discourse has a feature that allows you to poll RSS feeds and create topics for them. I guess that’s similar to what you want to do. At least it’s a good starting point.

https://github.com/discourse/discourse/blob/master/app/jobs/scheduled/poll_feed.rb
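
For orientation, the general shape of a scheduled job like the one linked above is roughly the following. This is a sketch rather than a copy of poll_feed.rb: the job name GatherTeamNews, the 15-minute interval and the fetch_items / publish helpers are all made up for illustration, and the small stand-in class at the top exists only so the snippet runs outside Discourse.

```ruby
# Stand-in used only when this file runs outside Discourse, so the sketch is
# self-contained. Inside Discourse, ::Jobs::Scheduled is already defined and
# this branch is skipped.
unless defined?(::Jobs::Scheduled)
  module Jobs
    class Scheduled
      def self.every(interval = nil)
        @every = interval if interval
        @every
      end
    end
  end
end

module Jobs
  # Hypothetical job: gather news items and publish each one.
  class GatherTeamNews < ::Jobs::Scheduled
    # Interval in seconds so the sketch runs standalone; code inside
    # Discourse usually writes this as 15.minutes.
    every(15 * 60)

    # Called by the scheduler each time the interval elapses.
    def execute(args)
      fetch_items.map { |item| publish(item) }
    end

    private

    # Placeholder: a real job would scrape sites or query Twitter here.
    def fetch_items
      []
    end

    # Placeholder: a real job would create a Discourse post here.
    def publish(item)
      item
    end
  end
end
```

In a plugin, a class like this would normally be defined from plugin.rb once Discourse has finished initialising, so that ::Jobs::Scheduled is available.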

Excellent.

That’s exactly the sort of pointer I was looking for.

Thanks!

Are the settings mentioned in the code not generally exposed? I can’t find them in the UI.

Do I need to enable a flag that will then make the settings visible, or am I heading to the Rails console?

Sorry if I’m being thick.

Go to Customize -> Embedding in the admin interface and click on the “Add Host” button.

Gotcha!

I had found that page, searched on here, and read that it was to do with embedding Discourse in an <iframe>.

I didn’t go through the creation process and now I see there are more settings.

Thanks again.