Thank you for providing this guide and import script! I have used it successfully with a Google Group, using Google Takeout. I just put the .mbox file in the right directory and ran the script.
I did have a question about importing emails whose parents are not in the .mbox. For example, many threads in our group were started from a forward of an email that wasn't sent to the group, or by adding the group to the reply list in the middle of a conversation to loop it in.
Currently, when importing, it seems these previous emails are not present; you can find them if you click on the email icon and view the HTML. I was curious whether anyone else has encountered this situation and found a solution. I could imagine either including the previous email chain in the post, or trying to parse it, extract the individual messages, and add each of those.
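One way to attack the second option is to split each post body on common quoting/forwarding markers before importing. A minimal sketch in Python, assuming markers like "Forwarded message" separator lines and "On … wrote:" attribution lines; real mail clients vary widely, so these patterns are illustrative, not exhaustive:

```python
import re

# Assumed quoting/forwarding markers -- an illustration, not an
# exhaustive list of what real mail clients emit.
MARKERS = re.compile(
    r"^(?:-{2,}\s*Forwarded message\s*-{2,}|On .{10,80} wrote:)\s*$",
    re.MULTILINE,
)

def split_quoted_chain(body: str) -> list[str]:
    """Return the visible reply plus each quoted/forwarded message
    embedded below it, as separate non-empty chunks."""
    parts = MARKERS.split(body)
    return [p.strip() for p in parts if p.strip()]
```

Each chunk could then in principle become its own post, though you would need to de-duplicate against messages that are also present in the mbox in their own right.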
This is really excellent. But I have an issue where some emails come into Discourse as an initial email followed by the mbox-format replies in the same post, unformatted. I'm not sure what is causing this.
The question is: how can I delete all the imported emails (20 years' worth) without deleting and recreating the target Discourse instance?
I'm aware the recommended RAM requirement is 8 GB. I did try importing 20 years of posts on a 2 GB virtual machine; it ran for a while and then crashed with the message 'Killed'. 8 GB machines on hosting providers such as DigitalOcean are expensive (for me). Is there any way to do this with less memory? Import in smaller batches, perhaps?
I know there is not much activity on this thread, but I can't get this working properly. Many of the mbox-format emails I import are not split properly. The From lines look like this:
From MAILER-DAEMON Tue Nov 01 05:57:09 2022
But some messages import correctly and then, in the same body, contain raw mbox-format items starting with the typical From line. In other words, they are not being split. It doesn't look to me like the regex that does the splitting needs modifying, and I don't know Ruby, so I can't debug the import script.
I don't know where to go from here. There are 20 years of messages to import, so I can't go through the imported messages by hand to fix them up. In short, this script is not working for me. Why would I be the only one this happens to?
I want to import 20 years of messages from my Mailman 2 system into an archive directory, but I don't want to create user IDs (not even staged ones) for them, as many of our subscribers have moved on or passed on, and it would create many accounts that will just take up space.
Can I import them all under the same user ID (perhaps ‘archive’)?
And this may be a dumb question, but since the app is turned off during the import process, does that mean users who have signed up for emails about new posts won’t get flooded with emails about all the archives that were just loaded?
You can comment out the import_users function, and all messages will be owned by the system user.
You’re not going to save much space.
No users will receive email until they have used the forgot-password process to log in to their accounts. If you're importing this data into an existing community, then I believe users will get notifications about the new messages created by the import script.