What's the best way to build a pipeline to post scraped data into Discourse categories with custom tab-based themes?

We want to build an automated pipeline that can update the custom Discourse theme we built, which has different tabs for different kinds of scraped content. The pipeline:

  1. Scrapes content from sources (RSS feeds, websites, etc.)
  2. Structures the data with metadata: title, source, type (news/conferences), URL, date
  3. Uses the Discourse API to:
  • Create a topic in the correct category and update content within the specific tabs of the custom theme.
  • Add relevant tags (to make it appear under the correct tab)
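The Discourse side of steps 3a and 3b can be done with a plain `POST /posts.json` call, which accepts a title, raw Markdown body, a numeric category id, and a list of tags. A minimal sketch in Python (to match the scraper), where the forum URL, field names of the scraped item, and the tag values are placeholders:

```python
# Sketch: turn one scraped item into a Discourse topic via POST /posts.json.
# DISCOURSE_URL and the item's field names are assumptions for illustration.
import json
import urllib.request

DISCOURSE_URL = "https://forum.example.com"  # placeholder

def build_topic_payload(item, category_id, tags):
    """Map a scraped item (title/source/type/url/date) to the body
    of Discourse's POST /posts.json request."""
    raw = (
        f"**Source:** {item['source']}\n"
        f"**Type:** {item['type']}\n"
        f"**Date:** {item['date']}\n\n"
        f"{item['url']}"
    )
    return {
        "title": item["title"],
        "raw": raw,
        "category": category_id,  # numeric category id, not the name
        "tags": tags,             # e.g. ["news"] so it lands under the right tab
    }

def post_topic(payload, api_key, api_username="system"):
    """Send the payload. Discourse authenticates with the
    Api-Key / Api-Username headers."""
    req = urllib.request.Request(
        f"{DISCOURSE_URL}/posts.json",
        data=json.dumps(payload).encode(),
        headers={
            "Api-Key": api_key,
            "Api-Username": api_username,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If your theme's tabs filter topics by tag, keeping the tag vocabulary identical to the scraper's `type` field means no extra mapping layer is needed.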

What is the best way to store the scraped data and render it:

  1. A local database or external CMS to store and schedule the content, or
  2. YAML/JSON files (a static source)?

Maybe have a look at RSS Polling

Thanks, pfaffman, for the plugin suggestion. However, we don’t have RSS feed data; we are storing the scraped data in a standalone database. Can we use this plugin to connect to that database, fetch the needed data, and render the content?

It was an example. You could either turn your scraped data into an RSS feed or modify the plugin to read whatever format you want to put it in.

What I would probably do is write the scraper in Ruby and integrate it into a plugin.

Or maybe use the Discourse API Ruby gem in a GitHub Action and have it push the data. I’m planning to do that for a client that’s hosted and can’t use a custom plugin.
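The GitHub Action approach could look roughly like the workflow below. This is a sketch, not a tested setup: the script path, secret names, and schedule are all placeholders, and since the scraper here is Python, it assumes a Python script hitting the REST API rather than the Ruby gem.

```yaml
# Hypothetical workflow: push scraped data to Discourse on a schedule.
name: push-scraped-data
on:
  schedule:
    - cron: "0 * * * *"   # hourly; adjust to your scrape cadence
  workflow_dispatch:       # allow manual runs while testing
jobs:
  push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install requests
      # scripts/push_to_discourse.py is a placeholder path for your own script
      - run: python scripts/push_to_discourse.py
        env:
          DISCOURSE_API_KEY: ${{ secrets.DISCOURSE_API_KEY }}
```

Keeping the API key in the repository's secrets (rather than in the script) is the main thing this buys you on a hosted site with no custom-plugin access.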

Thanks, Jay. The scraper is already complete in Python; now we are evaluating how to render the scraped data, which is stored in MongoDB.
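With the data already in MongoDB, the sync step reduces to: select documents not yet posted, push each one to Discourse, and mark it with the resulting topic id so it is never posted twice. A sketch of that loop, where the collection and field names (`scraped_items`, `posted`, `discourse_topic_id`) are assumptions, not anything Discourse prescribes:

```python
# Sketch of a MongoDB-to-Discourse sync. The helper functions are pure so
# they can be tested without a database; the pymongo wiring is shown in
# comments as an untested outline.

def items_to_sync(docs):
    """Pick documents not yet pushed to Discourse, oldest first."""
    pending = [d for d in docs if not d.get("posted")]
    return sorted(pending, key=lambda d: d["date"])

def mark_posted(doc, topic_id):
    """Record the created topic id so the item is never posted twice."""
    doc["posted"] = True
    doc["discourse_topic_id"] = topic_id
    return doc

# With pymongo, the same loop would look roughly like (untested outline):
#
#   from pymongo import MongoClient
#   coll = MongoClient()["scraper"]["scraped_items"]
#   for doc in coll.find({"posted": {"$ne": True}}).sort("date", 1):
#       # ...POST the doc to /posts.json here, keeping the topic_id
#       # from the response...
#       coll.update_one({"_id": doc["_id"]},
#                       {"$set": {"posted": True,
#                                 "discourse_topic_id": topic_id}})
```

Storing the `discourse_topic_id` back on the document also lets you later *update* the topic (via `PUT /posts/{id}.json`) instead of creating duplicates when a scraped item changes.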