This Howto is out-of-date.
Please check out Migrate a mailing list to Discourse (mbox, Listserv, Google Groups, etc) for updated instructions.
@pacharanero’s fantastic post on Migration of Google Groups to Discourse only works with Discourse v1.7.2, so I wanted to share the steps I followed to import Google Groups into Discourse v1.8.x (the latest stable version at the time of writing).
If you’ve installed and configured Discourse before and are comfortable on the command line, I hope this guide will be as helpful to you as @pacharanero’s guide has been to me.
If you run into any problems with these steps, just ask below and I’m glad to help. And if you have suggestions on how I can improve the guide, please share so we can make this process easier for everyone!
A few warnings before we start
Moving a community from Google Groups to Discourse is time-consuming, fragile, and annoying. If you can find someone who has done it before to do it for you, I’d strongly recommend that option!
If you’d like to try a move yourself, please read through all the steps first before attempting it. And please try it on a test server before actually moving your community. Again, it’s a fragile process.
The rough steps are to install and configure Discourse, install some prerequisites, scrape all your Google Groups messages (you’ll need to be a “Manager” to do this) into mbox files, then import those mbox files into Discourse.
The scrape of the messages from Google Groups is slow, so give yourself a few hours. I’d also recommend you put the Google Group in read-only mode so you don’t miss any new messages while doing the import.
The import is CPU- and RAM-intensive. I’d recommend running it on the biggest machine you can find, backing up the Discourse data after the import, and then restoring that backup to a production machine.
I’ve had trouble getting this guide to work on recent versions of Discourse v1.9.x, but it works well on v1.8.x. To be safe, on your import machine edit `/var/discourse/containers/app.yml`, set `version: stable` (under `params`), rebuild with `./launcher rebuild app`, and use that version.
1. Install and configure Discourse
-
Skip setup by clicking Maybe Later. Then, in Settings:
  - Disable emails so you don’t spam anyone with a digest email or notification. (Side effect: if you make a backup, you won’t get the notification to download it.)
  - Grant yourself trust level 4 and moderator status.
2. Install prerequisites inside Docker
-
Log into the server. You might want to use mosh or tmux instead of plain SSH, since the process involves many long-running tasks.
```
ssh user@your-discourse-server
cd /var/discourse
./launcher enter app
```
-
Install pngout and pngquant. These are needed to compress images on import.
```
cd /tmp
wget http://static.jonof.id.au/dl/kenutils/pngout-20150319-linux-static.tar.gz
tar zxvf pngout-20150319-linux-static.tar.gz
cp pngout-20150319-linux-static/i686/pngout-static /usr/local/bin/pngout
```
```
apt-get install build-essential libpng16-dev -y
git clone --recursive https://github.com/pornel/pngquant.git
cd pngquant
make && make install
```
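Before starting a long import, it can help to confirm that both tools ended up on the PATH inside the container. The small Ruby sketch below does that check. The `find_executable` helper is hypothetical (it is not part of Discourse or image_optim) and uses only the standard library.

```ruby
# Hypothetical helper (not part of Discourse): return the full path of an
# executable found on the given PATH string, or nil if none is found.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).each do |dir|
    candidate = File.join(dir, name)
    # Use the first regular file with that name that we are allowed to run
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report on the two image tools the import relies on
%w[pngout pngquant].each do |tool|
  location = find_executable(tool)
  puts location ? "#{tool}: #{location}" : "#{tool}: NOT FOUND"
end
```

It should print a path for each tool. If either shows NOT FOUND, the install step above needs another look.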
-
Install sqlite3 for mbox import.
apt-get install sqlite3 libsqlite3-dev -y
-
Prep the Gemfile.
cp /var/www/discourse/Gemfile /tmp/Gemfile
-
Add `sqlite3` to `/tmp/Gemfile`. The beginning of the file will look like this:
```
source 'https://rubygems.org'
# if there is a super emergency and rubygems is playing up, try
#source 'http://production.cf.rubygems.org'
gem 'sqlite3'
# does not install in linux ATM, so hack this for now
gem 'bootsnap', require: false
```
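If you'd rather not hand-edit the Gemfile, you can script the edit. Below is a minimal sketch using a hypothetical `add_gem_after_source` helper. It places the gem line directly after the `source` line, which is slightly higher than in the listing above; Bundler doesn't care about declaration order. If the gem is already declared, it leaves the text unchanged.

```ruby
# Hypothetical helper: add `gem '<name>'` right after the `source` line of a
# Gemfile's text, unless that gem is already declared.
def add_gem_after_source(gemfile_text, gem_name)
  gem_line = /\Agem\s+['"]#{Regexp.escape(gem_name)}['"]/
  return gemfile_text if gemfile_text.each_line.any? { |l| l.strip.match?(gem_line) }

  lines = gemfile_text.lines
  source_index = lines.index { |l| l.start_with?("source ") }
  if source_index.nil?
    insert_at = 0
  else
    # Make sure the source line ends with a newline before adding after it
    lines[source_index] += "\n" unless lines[source_index].end_with?("\n")
    insert_at = source_index + 1
  end
  lines.insert(insert_at, "gem '#{gem_name}'\n")
  lines.join
end
```

Usage: `File.write("/tmp/Gemfile", add_gem_after_source(File.read("/tmp/Gemfile"), "sqlite3"))`.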
-
Install the needed gems. There’s likely a better way of installing gems in deployment mode, but this is how I did it.
```
cd /tmp/
bundle install
cp /tmp/Gemfile /var/www/discourse/Gemfile
cp /tmp/Gemfile.lock /var/www/discourse/Gemfile.lock
cd /var/www/discourse/
bundle install
```
-
Edit `/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/discourse_image_optim-0.24.5/lib/image_optim/worker/jhead.rb` to fix compression errors during import. The beginning of the file will look like this:
```
require 'image_optim/worker'
require 'exifr/jpeg'
```
Scrape messages from Google Groups
-
Download @pacharanero’s awesome google_group.to_discourse script.
```
cd /var/www/discourse/script/import_scripts/
wget https://raw.githubusercontent.com/pacharanero/google-group-to-discourse-migration-script/master/googlegroup.rb
```
-
Edit `/var/www/discourse/script/import_scripts/googlegroup.rb` to comment out `import_to_discourse`. The end of the file will look like this:
```
setup
scrape_google_group_to_mbox
# import_to_discourse
```
-
To scrape the emails from Google Groups, you will need a Google account with “Manager”-level access to the Google Group.
Use Chrome to log into the Google Group with that manager account. Then use the cookies.txt extension to export a valid cookie file.
Only the SID, HSID, and SSID cookies from .google.com are needed, so trim the large cookie file down to those three lines and save them at `/var/www/discourse/script/import_scripts/cookies.txt`. The file will look like this (fields are tab-separated):
```
.google.com	TRUE	/	FALSE	1568431805	HSID	gwB8B0z7IH8QPgYVz
.google.com	TRUE	/	TRUE	1568431805	SSID	MPo7SOfkphRl9uqG0
.google.com	TRUE	/	FALSE	1569505294	SID	mEGqexZoGBVnTyO1NgPkdKI3zl10O6MmEGqexZoGBVnTyO1NgPkdKI3zl10O6MmGmDcN3G2
```
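Trimming the exported file by hand works, but you can also do it with a short Ruby sketch. It assumes the export uses the Netscape cookies.txt layout: seven tab-separated fields ending with name and value, as in the listing above. The `trim_google_cookies` helper and the `cookies-full.txt` file name are hypothetical. Handling of a `#HttpOnly_` line prefix is included in case your exporter adds one.

```ruby
# Hypothetical helper: keep only the .google.com SID, HSID and SSID cookies
# from a Netscape-format cookies.txt export. Each cookie line has seven
# tab-separated fields: domain, include-subdomains, path, secure, expiry,
# name, value.
WANTED_COOKIES = %w[SID HSID SSID].freeze

def trim_google_cookies(cookie_text)
  kept = cookie_text.each_line.map do |raw|
    # Some exporters prefix HttpOnly cookies with "#HttpOnly_"; strip it so
    # the line is not skipped as a comment (an assumption about the export)
    line = raw.chomp.sub(/\A#HttpOnly_/, "")
    next if line.empty? || line.start_with?("#")

    fields = line.split("\t", -1)
    next unless fields.length == 7

    domain = fields[0]
    name = fields[5]
    line if domain == ".google.com" && WANTED_COOKIES.include?(name)
  end
  kept.compact.map { |l| "#{l}\n" }.join
end
```

Usage: `File.write("cookies.txt", trim_google_cookies(File.read("cookies-full.txt")))`.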
-
Scrape messages from the Google Group, replacing `my-list` with the name of your group. Depending on the size of your group, this step will take a few hours. Since it is a long process, it might be wise to back up the `googlegroup-export` folder after the scrape has finished.
```
cd /var/www/discourse/script/import_scripts/
RAILS_ENV=production bundle exec ruby googlegroup.rb my-list /var/www/discourse/script/import_scripts/cookies.txt
```
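Before importing, a quick sanity check on the scrape can save you from a failed run later. This sketch assumes, without checking it against every version of the scraper, that each scraped message is saved as its own file in `googlegroup-export/my-list/mbox`. The `scrape_summary` helper is hypothetical.

```ruby
# Hypothetical sanity check: count the files in the scraped mbox folder and
# list any empty ones. Assumes one scraped message per file.
def scrape_summary(mbox_dir)
  files = Dir.glob(File.join(mbox_dir, "*")).select { |f| File.file?(f) }.sort
  {
    files: files.length,
    empty: files.select { |f| File.zero?(f) },
    bytes: files.sum { |f| File.size(f) }
  }
end
```

Usage: `p scrape_summary("/var/www/discourse/script/import_scripts/googlegroup-export/my-list/mbox")`. A file count far below what you expect, or a long list of empty files, is worth investigating before you start the import.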
-
Set up your data directory for the import.
```
mkdir -p /var/www/discourse/script/import_scripts/mbox-import
cd /var/www/discourse/script/import_scripts/mbox-import
```
-
Edit `/var/www/discourse/script/import_scripts/mbox/settings.yml` to point to the data directory. The file will look like this:
```
data_dir: /var/www/discourse/script/import_scripts/mbox-import
default_trust_level: 1
split_regex: "/^From (.*) at/"
```
-
Copy all the `my-list` emails into a `my-category` folder inside the data directory. The import script will create the `my-category` category in Discourse automatically.
```
cp -r /var/www/discourse/script/import_scripts/googlegroup-export/my-list/mbox my-category
chmod -R 777 /var/www/discourse/script/import_scripts/mbox-import
```
-
Edit `/var/www/discourse/script/import_scripts/mbox/importer.rb` to make sure users are not staged. The `create_users` method will look like this:
```
create_users(rows, total: total_count, offset: offset) do |row|
  {
    id: row['email'],
    email: row['email'],
    name: row['name'],
    trust_level: @settings.trust_level,
    staged: false,
    created_at: to_time(row['date_of_first_message'])
  }
end
```
3. Import mbox files into Discourse
-
Run the import. Depending on the size of your group, this step may take a few hours. The main factor seems to be how many images need to be compressed by pngquant/pngout.
```
su - discourse
cd /var/www/discourse/script/import_scripts/
RAILS_ENV=production bundle exec ruby mbox-experimental.rb mbox/settings.yml
```
-
That’s it! All the messages from `my-list` should now be in `my-category`!
Frequently asked questions
-
Import is failing on a particular topic, how do I recover?
The import may fail when creating a particular topic, usually because the message contains HTML or Unicode characters that can’t be parsed. Make a note of the number (e.g., 123) at which the import failed, then run `sqlite3 index.db` to open the import database. A query like this will show you the bad message:
```
SELECT msg_id, from_email, from_name, subject, email_date, attachment_count
FROM email
WHERE in_reply_to IS NULL
ORDER BY DATE(email_date)
LIMIT 1 OFFSET 123;
```
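Once the query gives you the `msg_id` of the bad message, you still need to find which file in the data directory holds it. The hypothetical helper below searches for a matching `Message-ID` header line. It makes a few assumptions:
- The `msg_id` stored in `index.db` is the Message-ID value, with or without angle brackets.
- Headers folded onto a second line can be ignored.
- `index.db` itself should be skipped.

Review each match before deleting anything.

```ruby
# Hypothetical helper: list the files under the import data directory that
# contain a Message-ID header matching msg_id (angle brackets optional).
# Folded headers (value continued on the next line) are not handled.
def files_with_message_id(data_dir, msg_id)
  wanted = msg_id.strip.delete("<>").downcase
  header = /\AMessage-ID:\s*<?([^>\s]+)>?/i
  Dir.glob(File.join(data_dir, "**", "*")).sort.select do |path|
    next false unless File.file?(path)
    # Skip the import database (and any journal), which also stores the ids
    next false if File.basename(path).start_with?("index.db")

    # Read in binary mode so odd encodings in message bodies don't raise
    File.open(path, "rb") do |f|
      f.each_line.any? do |line|
        m = header.match(line)
        m && m[1].downcase == wanted
      end
    end
  end
end
```

Usage: `puts files_with_message_id("/var/www/discourse/script/import_scripts/mbox-import", "the-msg-id-from-the-query")`.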
Remove the bad message from `/var/www/discourse/script/import_scripts/mbox-import`, delete `index.db`, and re-run the import.
-
How do I add members who signed up for the Google Group, but didn’t send messages?
The scrape only creates accounts for members who have sent messages to the Google Group.
If you’d like to create accounts for your “silent” members, you’ll need to copy and paste the names, emails, and sign up dates from the Google Group’s member list. You must copy and paste because Export Members doesn’t work on large groups, so scroll to the bottom of the member list and then copy all.
Then clean up the data as you see fit and load it into the users table of the import database (`index.db`). The data should look like this:
```
ssmith@gmail.com, Sally Smith, 2012-05-18T00:00:01-00:00
molly.mathers@gmail.com, Molly Mathers, 2010-01-05T00:00:01-00:00
```
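If you'd rather generate the rows than type them in, this sketch turns cleaned-up lines in the format above into SQLite `INSERT` statements. Several parts of it are assumptions, so check them before use:
- The table name `users` is taken from the wording above.
- The column names (`email`, `name`, `date_of_first_message`) are based on the row keys that `create_users` reads in `importer.rb`. Check both the table and columns with `.schema` in sqlite3 first.
- `INSERT OR IGNORE` only skips members who already exist if `email` is a unique key.
- Storing the date as an ISO 8601 UTC string is also an assumption.

```ruby
require "time"

# Hypothetical helper: turn cleaned-up "email, name, date" lines into SQLite
# INSERT statements. ASSUMPTIONS: the table name and the column names
# (email, name, date_of_first_message) must be checked with `.schema` first.
def quote_sql(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

def members_to_sql(text, table: "users")
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    email, rest = line.split(",", 2).map(&:strip)
    # Split the rest from the right so names containing commas stay intact
    name, _sep, date = rest.to_s.rpartition(",").map(&:strip)
    unless email.to_s.include?("@") && !date.empty?
      raise ArgumentError, "bad row: #{line}"
    end

    # Normalize the sign-up date to an ISO 8601 UTC string (assumed format)
    timestamp = Time.iso8601(date).utc.iso8601
    "INSERT OR IGNORE INTO #{table} (email, name, date_of_first_message) " \
      "VALUES (#{quote_sql(email)}, #{quote_sql(name)}, #{quote_sql(timestamp)});"
  end
end
```

Usage: `puts members_to_sql(File.read("members.txt"))`, where `members.txt` is a hypothetical file holding your cleaned-up list. Review the output, then paste it into `sqlite3 index.db`.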
Finally, edit `/var/www/discourse/script/import_scripts/mbox/importer.rb` to import only users:
```
def execute
  # index_messages
  # import_categories
  import_users
  # import_posts
end
```
-
How do I update the Google Group scrape with new messages?
Modify the `googlegroup.rb` script so that, instead of running the wget command, it runs the update command. That part of the file will look like this:
```
puts "This stage takes longer than the first pass and can take hours, depending on the size of your Google Group\n\n".blue
system './crawler.sh -rss > update.sh'
system 'chmod +x ./update.sh'
system './update.sh'
system "chmod -R 777 #{ENV["_GROUP"]}"
```