Importing / migrating mailing lists (mbox, Listserv, Google Groups, emails, ...)

Well, that’s a massive bummer!

I thought Google hadn’t touched it since February 2015, but I stand corrected.

Looks like there was a complete redesign in 2020 and a logo touch-up in 2021.

I am thinking of using the Gmail and Discourse APIs to convert mail threads into Discourse posts. I have gone through the Google APIs and was able to fetch the emails, but I have a few questions about them.

  1. Mail can be downloaded in raw format, which is the base64url-encoded value of the “Original message” of the mail. Is that the same as mbox format, or is it different?
  2. Is there an example of how to add posts and attachments to Discourse through the API?
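For question 2, here is a minimal sketch of creating posts through the Discourse API using only Python's standard library. It assumes an admin API key and builds a request for `POST /posts.json` with the `Api-Key` and `Api-Username` headers: a `title` (plus an optional `category`) starts a new topic, and a `topic_id` makes the post a reply. The helper name and the idea of mapping one mail thread to one topic are illustrative assumptions. Attachments would need a separate upload step (e.g. the `/uploads.json` endpoint), which is not shown here.

```python
import json
import urllib.request


def build_post_request(base_url, api_key, api_username, raw,
                       title=None, category_id=None, topic_id=None):
    """Build (but do not send) a request for Discourse's POST /posts.json.

    Hypothetical helper: pass title (and optionally category_id) to start a
    new topic from the first mail of a thread, or topic_id to add a reply.
    """
    payload = {"raw": raw}
    if topic_id is not None:
        # Reply to an existing topic.
        payload["topic_id"] = topic_id
    else:
        # A new topic needs a title; the category is optional.
        if title is None:
            raise ValueError("a new topic needs a title")
        payload["title"] = title
        if category_id is not None:
            payload["category"] = category_id
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/posts.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Api-Key": api_key,
            "Api-Username": api_username,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` should return JSON for the created post, including its `topic_id`, which the replies in the same thread can then use.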

Interesting. Is it possible to use the Gmail API to access emails from Google Groups or do you simply have a Gmail account that has all the emails that were sent to the group?

I suggest you save all messages into individual *.eml files. If the whole message (including the email headers) is base64-encoded, you will need to decode it before saving. Afterwards, follow the steps from Importing / migrating mailing lists (mbox, Listserv, Google Groups, emails, ...) (minus the Google Groups steps). The import script will take care of posts, attachments and a lot more.
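A minimal sketch of the decode-and-save step, assuming the message comes from the Gmail API's `users.messages.get` with `format="raw"`. That `raw` field is base64url-encoded (URL-safe alphabet), so it has to be decoded with `base64.urlsafe_b64decode`, not plain `b64decode`. Each decoded value is a single RFC 2822 message, i.e. the content of one .eml file, not an mbox (which concatenates many messages separated by `From ` lines). The helper name `raw_to_eml` and using the Gmail message ID as the file name are illustrative choices, not part of the import script.

```python
import base64
from pathlib import Path


def raw_to_eml(raw, out_dir, name):
    """Decode a Gmail API 'raw' value (base64url) and save it as <name>.eml."""
    # Re-add any stripped '=' padding so the length is a multiple of 4.
    padded = raw + "=" * (-len(raw) % 4)
    data = base64.urlsafe_b64decode(padded)
    path = Path(out_dir) / f"{name}.eml"
    # Write the bytes unchanged: headers and body exactly as Gmail returned them.
    path.write_bytes(data)
    return path
```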


I am a member of the group, so I can pull the emails through my own account. The entire email message, including the headers, is part of the encoded string.

I will try your approach for importing into Discourse, at least for one thread.


Well, in that case you might not even need the Gmail API. Connecting an email client like Thunderbird to your Gmail account and exporting individual emails or an mbox file should be enough…
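If the exported mbox ends up containing the whole mailbox, one way to keep only a single group's messages is to filter on the `List-Id` header. This sketch uses Python's standard `mailbox` module and assumes Google Groups stamps its messages with a `List-Id` such as `<groupname.googlegroups.com>`; check the headers of a real message before relying on that. The function name and paths are hypothetical.

```python
import mailbox


def extract_group(src_path, dst_path, list_id):
    """Copy messages whose List-Id header contains list_id into a new mbox.

    Returns the number of messages copied.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    dst.lock()
    count = 0
    try:
        for msg in src:
            # Header lookup is case-insensitive; str() handles encoded headers.
            header = str(msg.get("List-Id", ""))
            if list_id.lower() in header.lower():
                dst.add(msg)
                count += 1
    finally:
        # close() flushes pending writes and releases the lock.
        dst.close()
        src.close()
    return count
```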

Now I’d really like to know how this works. I was under the impression that Google Groups doesn’t support NNTP.


It’s not NNTP but the REST API.


My mailbox is larger than 200 GB, and to get the group-related emails I might need to download all of them with Thunderbird. Thunderbird also doesn't show the Google Groups separately; all mails are under the Inbox only. It also downloads only 200 emails at a time, so I am not sure how long it would take to get all the mail.

Is there an alternative way to get just the Google Group data and export it to mbox?

Hi Gerhard Schlager,

We are trying to migrate our Google Groups to Discourse. We followed all the steps in the document, but the import only creates the category in Discourse and does not import any data. I would really appreciate a quick response on this one.

The only way that we knew about doesn’t work anymore. If you know of another way to get the data, you should probably start getting it ASAP, before that method goes away too.

If it’s in your mailbox, then it might be possible to use the Gmail API to pull it down. It’ll be tricky, though, as a developer would need access to a mailbox with Google Groups data in it to write the code.
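A rough sketch of pulling only one group's messages through the Gmail API instead of syncing the whole mailbox, assuming a client created with `googleapiclient.discovery.build("gmail", "v1", credentials=...)`. It pages through `users.messages.list` using Gmail's `list:` search operator (the same operator the Gmail search box accepts) and yields message IDs; each ID can then be fetched with `users.messages.get(userId="me", id=..., format="raw")`. The helper name and group address are placeholders.

```python
def list_group_message_ids(service, group_address, page_size=500):
    """Yield the IDs of all messages matching Gmail's `list:` search operator.

    `service` is assumed to be a Gmail API client (google-api-python-client).
    """
    query = f"list:{group_address}"
    page_token = None
    while True:
        params = {"userId": "me", "q": query, "maxResults": page_size}
        if page_token:
            # Continue where the previous page stopped.
            params["pageToken"] = page_token
        response = service.users().messages().list(**params).execute()
        for item in response.get("messages", []):
            yield item["id"]
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```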

Unless an enterprise customer requires it, I doubt that cdck (aka discourse.org) will be writing that code any time soon. You can ask in #marketplace. I likely wouldn’t consider it for under $2000 and, given the frustration I have had with the Google Groups import script in the past, would likely require $5000. Of course, someone else may have better skills or more patience.

One approach that may work is using Integromat to build the conversion, assuming Google Groups and Discourse apps are available there, or making plain HTTP calls to the REST APIs directly.

Integromat is an integration and data migration platform. It’s very powerful and can do a lot with little to no coding.

I tried Google Takeout by making myself the owner of our Google Workspace group, and I was able to download the Google Group conversations. I am still working on importing them, though.

A few downsides with this approach:

  1. The full data has to be downloaded again for an incremental update.
  2. It is not possible to download data for selected groups only; Takeout downloads the data for all groups in which the user has owner or manager permissions.
  3. You need to work with the Google Workspace admin to enable Takeout, since it is disabled by default.
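For the incremental-update downside, one possible workaround is to remember the `Message-ID`s from the previous import and pass only the new messages to the next run. This is a minimal sketch with Python's `mailbox` module, assuming you keep the already-imported IDs yourself (the `seen_ids` set is hypothetical). It still requires the full Takeout download; it only limits what gets imported again. Messages without a `Message-ID` cannot be deduplicated this way, so the sketch keeps them.

```python
import mailbox


def new_messages(takeout_mbox, seen_ids):
    """Return messages from a fresh Takeout mbox that were not imported before.

    seen_ids: set of Message-ID header values from the previous import.
    Messages without a Message-ID are always returned.
    """
    box = mailbox.mbox(takeout_mbox, create=False)
    try:
        fresh = []
        for msg in box:
            msg_id = str(msg.get("Message-ID") or "").strip()
            if not msg_id or msg_id not in seen_ids:
                fresh.append(msg)
        return fresh
    finally:
        box.close()
```

The returned messages can be written to a new mbox with `mailbox.mbox(...).add(msg)` before running the import again.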

Hi @Anjana_Raghavendra_P - did you manage to do a simple import using this approach?

Thanks very much!

Yes, I was able to download the mbox file from Takeout and import it using the steps mentioned in the original post.

Later, since we use Discourse's hosted (PaaS) service, we provided the file to the Discourse technical team, who imported the content into our Discourse site.


I’m happy to hear that - thanks!

I ran into @sturdy2’s issue when I changed the first line of settings.yml, data_dir: /shared/import/data, from its default.

Take-home message: don’t change it, because it refers to the path inside the import Docker container, not on the host machine.