I have 28 years of posts, with one folder per year and one mbox file per month. There are 66909 messages in the mbox files, but the import shows 65895. Is the difference of 1014 due to the failures indicated in the import output?
The posts were converted from Eudora mbx files to standard mbox files with Aid4Mail.
For the “Parent message doesn’t exist” error I see 421 instances.
For the “Failed to map post” error I see 149 instances.
My split_regex is “^From .@. [0-9]{4}” which seems suitable for the headers such as,
I saw the same error last week while working on a different problem. I’m going to fix it this week and will post an update here once it’s fixed.
Those are just warnings and probably appear due to the “Failed to map post” errors. It happens when a message references a post that doesn’t exist. I’m quite sure that fixing the other problem will fix most if not all of these warnings.
You could take a look at the index.db the import script creates. It’s an SQLite3 database. You could run the following query to see what the parser is working with. It selects the messages for the two Message-IDs you posted.
SELECT *
FROM email
WHERE msg_id IN ('bbe76bf7a9cab5a2ec2a06e6ef453555', '23a86e52-71ba-7435-1c9c-c4f2a134b90d@mmtaylor.net')
I guess the email_date and raw_message columns will be the most interesting columns for you. Maybe you can find what’s confusing the email parser…
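If it helps, here is a minimal Python sketch for listing every message whose email_date is empty, rather than checking Message-IDs one at a time. It assumes only the table and column names mentioned above (email, msg_id, email_date, raw_message). The in-memory table and the example.org IDs are made up for the demo; for the real file you would connect to index.db instead.

```python
import sqlite3


def find_undated(conn):
    """Return the msg_id of every row whose email_date is NULL or empty."""
    rows = conn.execute(
        "SELECT msg_id FROM email "
        "WHERE email_date IS NULL OR email_date = '' "
        "ORDER BY msg_id"
    )
    return [row[0] for row in rows]


# Demo on an in-memory stand-in (hypothetical rows).
# For the real database, use sqlite3.connect("index.db") instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email (msg_id TEXT PRIMARY KEY, email_date TEXT, raw_message TEXT)"
)
conn.executemany(
    "INSERT INTO email VALUES (?, ?, ?)",
    [
        ("a@example.org", "1992-03-25 12:23:00", "..."),
        ("b@example.org", None, "..."),
    ],
)
undated = find_undated(conn)
print(undated)
```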
For the first one, the date is null, and I see there is no date for that message in the mbx. I note that the reply (with “Re:”) appears before the “initial” message, which is why I thought the date was not missing. Does the import take the first message in the file with a given subject as the parent?
Is the email date taken from the “Date:” line, such as?
Date: Wed, 25 Mar 1992 12:23:00 GMT
I’ll see if I can repair those with missing dates.
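One way to find the messages that need repair is to scan each mbox file for messages without a Date header. The sketch below uses Python's standard mailbox module, which is not what the importer uses: it treats any line starting with "From " as a separator, so its message count may differ from what your split_regex produces. The sample messages and addresses are invented for the demo.

```python
import mailbox
import os
import tempfile


def messages_without_date(mbox_path):
    """Return (index, subject) for each message that has no Date header."""
    missing = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, message in enumerate(box):
            if message["Date"] is None:
                missing.append((index, message["Subject"]))
    finally:
        box.close()
    return missing


# Hypothetical two-message mbox: the reply has a Date header, the original does not.
sample = (
    "From alice@example.org Wed Mar 25 12:23:00 1992\n"
    "Date: Wed, 25 Mar 1992 12:23:00 GMT\n"
    "Subject: Re: hello\n"
    "\n"
    "reply body\n"
    "\n"
    "From bob@example.org Wed Mar 25 11:00:00 1992\n"
    "Subject: hello\n"
    "\n"
    "original body\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "w", newline="\n") as handle:
        handle.write(sample)
    missing = messages_without_date(path)

print(missing)
```

The index is the message's position in the file, which should make it easier to locate the spot where a Date line has to be added.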
No, it uses the In-Reply-To and References headers to match messages and sorts them by Message-ID, unless you changed the importer’s group_messages_by_subject setting to true.
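To illustrate the idea of header-based matching (this is only a rough sketch, not the importer's actual code), the following Python snippet looks up each message's parent from In-Reply-To, falling back to the last entry in References, and reports replies whose parent is not in the batch. That is roughly the situation behind a “Parent message doesn’t exist” warning. The function names and sample messages are hypothetical, and real headers can be messier than this handles.

```python
from email import message_from_string


def parent_id(raw_message):
    """Return the Message-ID this message replies to, or None."""
    msg = message_from_string(raw_message)
    in_reply_to = msg.get("In-Reply-To", "").strip()
    if in_reply_to:
        return in_reply_to.strip("<>")
    # Fall back to the last Message-ID listed in References, if any.
    references = msg.get("References", "").split()
    return references[-1].strip("<>") if references else None


def find_orphans(raw_messages):
    """Return Message-IDs of replies whose parent is missing from the batch."""
    parents = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        own_id = msg.get("Message-ID", "").strip().strip("<>")
        parents[own_id] = parent_id(raw)
    return sorted(
        own_id
        for own_id, parent in parents.items()
        if parent and parent not in parents
    )


# Hypothetical batch: b replies to a, c replies to a message that is not present.
batch = [
    "Message-ID: <a@example.org>\nSubject: hello\n\nfirst\n",
    "Message-ID: <b@example.org>\nIn-Reply-To: <a@example.org>\n"
    "Subject: Re: hello\n\nsecond\n",
    "Message-ID: <c@example.org>\nReferences: <x@example.org>\n"
    "Subject: Re: other\n\nthird\n",
]
orphaned = find_orphans(batch)
print(orphaned)
```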
Yes.
My best guess is that there’s a problem with one of the attachments. Maybe the file extension isn’t allowed?
I did set group_messages_by_subject to true, as without it there was no grouping at all.
That message has two inline images:
Content-Type: application/octet-stream;
name="Conflict (was … long live Wil"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="Conflict (was … long live Wil"
Yes, that works. I did the same in the past. I’d recommend setting index_only to true in settings.yml, so that it doesn’t start to import immediately after indexing the messages. You can make all the needed changes in the database after the indexing has finished. Then change index_only to false again and restart the import.
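For the database edit itself, a minimal Python sketch could look like the one below. It assumes the email table with msg_id and email_date columns mentioned earlier. The date string format written here is a guess, so check how existing rows in your index.db store email_date and match that. The helper name, the Message-ID, and the date value are made up for the demo.

```python
import sqlite3


def set_email_date(conn, msg_id, email_date):
    """Fill in email_date for one message that has none; return rows changed."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE email SET email_date = ? "
            "WHERE msg_id = ? AND email_date IS NULL",
            (email_date, msg_id),
        )
    return cursor.rowcount


# Demo on an in-memory stand-in (hypothetical row).
# For the real database, use sqlite3.connect("index.db") instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email (msg_id TEXT PRIMARY KEY, email_date TEXT, raw_message TEXT)"
)
conn.execute(
    "INSERT INTO email VALUES (?, ?, ?)",
    ("missing-date@example.org", None, "..."),
)

changed = set_email_date(conn, "missing-date@example.org", "1992-03-25 17:43:06")
stored = conn.execute(
    "SELECT email_date FROM email WHERE msg_id = ?",
    ("missing-date@example.org",),
).fetchone()[0]
print(changed, stored)
```

The `AND email_date IS NULL` condition keeps the update from overwriting a date that was parsed successfully, so running it twice is harmless.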
I think I am misunderstanding something there. Hasn’t the indexing already been done, as index.db is already built?
I have transferred index.db to my desktop. I was going to update the dates then transfer index.db back to the server and then run import again. Is that not right?
I decided to go the route of editing the mbox files, adding a “Date” line, e.g. “Date: Wed, 25 Mar 1992 17:43:06”. I transferred the updated files and reran the import, twice. However, the email_date field was not updated.