Here is a ‘working prototype’ of a Google Group migration script that attempts to be more ‘all-in-one’ and to reduce the number of steps, complexity and difficulty of doing Google Group imports.
Having finally gotten around to doing a pro-bono Google Group import I promised to do for the wonderful Valentina Project, I had to re-familiarise myself with the available Google Group scraping tools, as suggested to me by @erlend_sh, who pointed me in the direction of a few Google Group scraping libraries and Discourse’s own mbox importer script. So while it’s all in my head I thought I’d have a go at a more user-friendly import script, and some documentation for it.
- I think I may have solved the problem of email addresses being redacted by Google Groups (it doesn’t happen if you are logged into Google Groups with a ‘Manager’ level account), so that now the mbox.rb importer can create users in Discourse with the correct email addresses. The rightful owners of those created users would then only need to do a password reset in order to be able to log into their new user on Discourse, and all their Google Group posts would be correctly assigned to them. I’ve tested this on a real migration, it works, and I’m keen to get feedback from other testers.
- I’ve tried to use the OO design pattern established in the other import scripts, for example ImportScripts::Base. In my case, I wanted to use a lot of functionality from ImportScripts::Mbox, so my script is a subclass of that class.
- It works… but it’s definitely not finished yet and I’d appreciate constructive criticism, pull requests and amusing emoji.
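To illustrate the subclassing idea from the list above, here is a rough sketch. The `Base` and `Mbox` classes below are heavily simplified stand-ins written only so the example runs on its own; they are not the real Discourse classes. The `GoogleGroups` class name and every method name here (`scrape_group`, `index_messages` and so on) are hypothetical and may not match the actual script.

```ruby
# Simplified stand-ins so this sketch is self-contained.
# The real ImportScripts::Base and ImportScripts::Mbox do far more.
module ImportScripts
  class Base
    # Entry point: runs whatever the subclass defines in #execute.
    def perform
      execute
    end

    def execute
      raise NotImplementedError
    end
  end

  class Mbox < Base
    # Stand-in for the mbox import: returns the list of steps it ran.
    def execute
      [index_messages, import_users, import_posts]
    end

    def index_messages
      :indexed
    end

    def import_users
      :users_imported
    end

    def import_posts
      :posts_imported
    end
  end

  # Hypothetical subclass: scrape the group first, then reuse
  # everything the mbox importer already does via super.
  class GoogleGroups < Mbox
    def execute
      [scrape_group] + super
    end

    def scrape_group
      :scraped
    end
  end
end

puts ImportScripts::GoogleGroups.new.perform.inspect
```

The point of the pattern is that the subclass only adds the Google-specific step and leaves the rest to the parent class.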
# How to use it
You will need to be a little bit familiar with the Linux command line, SSH and stuff like that. I’ve tried to make the step-by-step instructions as clear as possible, but there might be slight variations in the output of certain commands. Please reply to the thread if you are having problems.
Cookies. In order to be able to extract users’ email addresses correctly from the Google Group, you will need to have Manager access to the Google Group. Having logged into Google Groups (on your normal computer) with this Manager account, export the Google cookies from your browser. (I used the cookies.txt Chrome extension to get the cookies.txt file.) Without this step, the scrape will work BUT the email addresses are truncated/redacted by Google Groups so they look like this: marcu....@gmail.com, and of course this messes up creation of new users on Discourse.
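If you want to check whether a scrape has picked up redacted addresses, something like the following could help. It is only a sketch: the helper name `redacted_email?` is made up for this example, and the pattern (three or more dots directly before the `@`) is an assumption based on the single example address above, so Google Groups may redact in other ways too.

```ruby
# Assumed pattern: three or more dots immediately before the "@",
# as in the example "marcu....@gmail.com".
REDACTED_EMAIL = /\.{3,}@/

# Hypothetical helper: true if the address looks redacted.
def redacted_email?(address)
  address.match?(REDACTED_EMAIL)
end

# Sample addresses for illustration only.
addresses = ["marcu....@gmail.com", "someone@example.com"]

addresses.each do |address|
  status = redacted_email?(address) ? "REDACTED" : "ok"
  puts "#{address}: #{status}"
end
```

If any address comes back as redacted, the cookies were probably not picked up, or the account is not a Manager of the group.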
Upload cookies.txt. Once you have the cookies.txt file, the easiest way to get it into your Docker container from your computer is to upload it as an attachment to any post in your Discourse forum. You will need the file path for a later step; you can get the URL from the post. It will be something like:
/uploads/default/original/1X/245aa......40b69.txt
SSH into your server
$ ssh user@your-discourse-server
Change directory into the Discourse directory
$ cd /var/discourse
Enter the Discourse Docker container, using the ./launcher tool.
$ ./launcher enter app
Copy cookies.txt to /tmp/ so that the import script can find it. Prepend /var/www/discourse/public to the URL of the uploaded file; this gives you the full file path to use with the cp (Unix copy) command:
# cp '/var/www/discourse/public/uploads/default/original/1X/245aa......40b69.txt' /tmp/
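The "prepend the public directory to the URL" step can be sketched in Ruby as below. The helper name `upload_file_path` and the example URL (host and file name) are invented for illustration; only the /var/www/discourse/public prefix comes from the step above.

```ruby
require "uri"

# Directory that uploads are served from inside the container,
# as given in the step above.
PUBLIC_DIR = "/var/www/discourse/public"

# Hypothetical helper: turn an upload URL (or a bare /uploads/... path)
# into the full file path inside the container.
def upload_file_path(url)
  File.join(PUBLIC_DIR, URI(url).path)
end

# Made-up example URL; substitute the one from your own post.
example_url = "https://forum.example.com/uploads/default/original/1X/abc123.txt"
puts upload_file_path(example_url)
```

The resulting path is what you would pass as the first argument to cp.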
Install some stuff that’s needed by mbox.rb, the importer script, for its index:
# apt install sqlite3 libsqlite3-dev
# gem install sqlite3
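To confirm the gem installed properly before running the import, a quick check along these lines may be useful. The helper name `library_available?` is made up for this sketch; it simply tries to `require` the library and reports whether that worked.

```ruby
# Hypothetical helper: true if the named library can be loaded
# by this Ruby, false otherwise.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

if library_available?("sqlite3")
  puts "sqlite3 gem loads fine"
else
  puts "sqlite3 gem not found; try the install commands above again"
end
```

Run it with the same Ruby that will run the import script, otherwise the result may not reflect what the importer sees.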
Change into the import scripts directory with the cd command:
# cd /var/www/discourse/script/import_scripts
Get the Google Group script (and the monkeypatched version of mbox.rb):
# wget https://raw.githubusercontent.com/pacharanero/discourse/master/script/import_scripts/googlegroups.rb
# wget https://raw.githubusercontent.com/pacharanero/discourse/master/script/import_scripts/mbox.rb
Change user to the discourse user so that you can make changes to the database:
# su discourse
Run the script!
# ruby googlegroups.rb
Any problems, please feel free to discuss in the replies.