I would like to open a new thread to continue the discussion here about an importer for Google Groups. In short, I have a simple Python script based on Scrapy that scrapes all messages of a group into a JSON file. I hope someone who knows Ruby and the Discourse API can turn it into a real importer (a JSON-to-Discourse importer).
That prior discussion mentioned an importer too, but it seems to me that it cannot scroll down to the bottom of a page to load all the messages, so it sounds like that one doesn’t really work. My script follows links like “more topics” until there are none left, so it scrapes all messages.
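A rough sketch of that "follow the link until there is none" loop, in plain Python rather than Scrapy. This is not the code from the repo: `scrape_all`, `fetch_page`, and the page shape (a list of messages plus the next link or `None`) are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page shape: (messages on this page, URL of the "more topics" link or None).
Page = Tuple[List[dict], Optional[str]]


def scrape_all(first_url: str, fetch_page: Callable[[str], Page]) -> Iterator[dict]:
    """Yield every message, following "more topics" links until none is left."""
    seen = set()  # guards against a link that points back to an earlier page
    url: Optional[str] = first_url
    while url is not None and url not in seen:
        seen.add(url)
        messages, url = fetch_page(url)  # url becomes the next link, or None
        yield from messages


# Stand-in for real HTTP fetching: two fake pages chained by a "more topics" link.
fake_site = {
    "page1": ([{"subject": "a"}, {"subject": "b"}], "page2"),
    "page2": ([{"subject": "c"}], None),
}
all_messages = list(scrape_all("page1", fake_site.__getitem__))
```

In a real scraper, `fetch_page` would download the page and extract both the messages and the "more topics" link; the loop itself stays the same.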
Another trick: I found that the Google Groups URL has the format of
range. Not sure if anyone noticed, but that basically works pretty well for me. For example, you can use
to get the range of topics from index 10 to index 20. That gives you a way to iterate over all topics in a group.
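The range trick could be turned into a loop roughly like this. It is only a sketch under assumptions: `fetch_range` is a hypothetical callable that requests one index range and returns the topics in it, indexing is assumed to start at 0, and an empty result is assumed to mean the end of the group. Since it is not clear whether the end index is inclusive, consecutive windows share their boundary and duplicates are dropped.

```python
from itertools import count
from typing import Callable, Iterator, List


def iterate_topics(fetch_range: Callable[[int, int], List[str]],
                   step: int = 10) -> Iterator[str]:
    """Walk the group in windows of `step` topics until a window comes back empty."""
    seen = set()
    for start in count(0, step):
        topics = fetch_range(start, start + step)
        if not topics:  # assumption: an empty window means no more topics
            return
        for topic in topics:
            if topic not in seen:  # windows may overlap at the boundary
                seen.add(topic)
                yield topic


# Stand-in for the real request: a fake group with 25 topics and an inclusive end index.
all_topics = [f"topic-{i}" for i in range(25)]


def fake_fetch(lo: int, hi: int) -> List[str]:
    return all_topics[lo:hi + 1]


collected = list(iterate_topics(fake_fetch, step=10))
```

A real `fetch_range` would build the range URL, download it, and parse the topic links out of the response.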
Anyway, I would like to share my scripts as a starting point. I hope someone who has experience with Discourse can pick this up and make importing Google Groups into Discourse a one-click operation.
The GitHub repo is here: steinwaywhw/google-group-exporter
There is a Python script, which depends on Scrapy (scrapy.org)
<- it should be a clickable link, but I’m a new user, so I can’t put more than two links in a post …
There is an example Google Group topic page as seen by the scraper
There is an example output from my scripts.
BTW, “New users can only have two links in a post”: why is that? It’s annoying.