Maybe the sort order is wrong because you are grouping emails by subject? Might be worth investigating. In that mode, messages are sorted only by subject and by their order within the mbox file.
Are you really sure you need to group emails by subject? Judging by your screenshot it looks like the emails have correct Message-ID as well as In-Reply-To and References headers.
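For illustration, here is a minimal Ruby sketch of threading by headers instead of by subject. It is not the import script's code: the `Message` struct, its field names, and `parent_id_for` are made up for this example, and it assumes the headers have already been parsed into plain strings and arrays.

```ruby
# Hypothetical message record; field names are illustrative only.
Message = Struct.new(:msg_id, :in_reply_to, :references, :subject, keyword_init: true)

# Returns the Message-ID of the most likely parent, or nil for a thread starter.
# known_ids is the list of Message-IDs that are actually present in the archive.
def parent_id_for(message, known_ids)
  candidates = []
  candidates << message.in_reply_to if message.in_reply_to
  # References lists ancestors oldest first, so try the nearest ancestor first.
  candidates.concat(message.references.reverse)
  candidates.find { |id| known_ids.include?(id) }
end

messages = [
  Message.new(msg_id: "a@example.org", in_reply_to: nil, references: [],
              subject: "A CALL FOR HELP"),
  Message.new(msg_id: "b@example.org", in_reply_to: "a@example.org",
              references: ["a@example.org"], subject: "A CALL FOR HELP"),
  Message.new(msg_id: "c@example.org", in_reply_to: nil,
              references: ["a@example.org", "b@example.org"], subject: "Re: A CALL FOR HELP"),
]

known_ids = messages.map(&:msg_id)
messages.each do |m|
  puts "#{m.msg_id} -> #{parent_id_for(m, known_ids).inspect}"
end
```

Messages with neither header come back as thread starters here, which is where a subject-based fallback would have to take over.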
Thanks. Looking at the email_order table they look like they are in the correct order:

msg_id | rowid
--- | ---
9205270657.AB03850@ben.dciem.dnd.ca | 874
9206031720.AA22567@ben.dciem.dnd.ca | 875
Could there be something else that is failing to import this parent message?
When I did the first import it looked like there was no grouping at all. I think the problem is that the replies are to the mailing list rather than the originator. Also, some messages don’t have those fields at all, as the archive was put together manually over 28 years in a rather haphazard way with different versions of Eudora.
Maybe it fails to import the parent message? Was there an error? It’s hard to tell why it doesn’t find the message. I’m sorry, but I guess you’ll need to debug this yourself by modifying the Ruby code of the import script.
These print OK and suggest the parent is mapped (imported?):
873 / 65936 ( 1.3%) [3895 items/min]
Mapping parent 9205270657.AB03850@ben.dciem.dnd.ca A CALL FOR HELP
Mapped message 9205270657.AB03850@ben.dciem.dnd.ca A CALL FOR HELP
874 / 65936 ( 1.3%) [3900 items/min]
Parent message 9205270657.AB03850@ben.dciem.dnd.ca doesn’t exist. Skipping 9206031720.AA22567@ben.dciem.dnd.ca: A CALL FOR HELP
So I don’t see why the parent is blank in map_reply. The only thing I note is that the numbers (873/874) are one less than the rowid values above.
But I don’t think I can go much further as I don’t know what @lookup.topic_lookup_from_imported_post_id is doing and it is very laborious to edit with vi and rerun the import, with each cycle taking around 30 minutes.
It’s in base.rb in the same directory. And it’s doing exactly what the name of the function suggests: it’s looking for the topic_id by finding the import_id (which I assume is the message ID in this case) in a topic custom field (or maybe a post custom field?).
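As a rough mental model only (I have not copied this from base.rb, and `ToyLookup`, `record` and `topic_for` are hypothetical names), a lookup like that can be pictured as two hash lookups: import_id to post, then post to topic. The post and topic numbers below are invented.

```ruby
# Toy stand-in for an import_id lookup; not the real Discourse code.
class ToyLookup
  def initialize
    @post_id_by_import_id = {} # models an import_id custom field on posts
    @topic_by_post_id = {}     # models post_id => topic information
  end

  # Remembers that the message with this import_id became the given post.
  def record(import_id, post_id, topic_id, post_number)
    @post_id_by_import_id[import_id] = post_id
    @topic_by_post_id[post_id] = { topic_id: topic_id, post_number: post_number }
  end

  # Returns topic information for an imported message, or nil if either step misses.
  def topic_for(import_id)
    post_id = @post_id_by_import_id[import_id]
    post_id && @topic_by_post_id[post_id]
  end
end

lookup = ToyLookup.new
lookup.record("9205270657.AB03850@ben.dciem.dnd.ca", 101, 55, 1) # made-up ids
p lookup.topic_for("9205270657.AB03850@ben.dciem.dnd.ca")
p lookup.topic_for("unknown@example.org")
```

The point of the model is that a nil result can come from either step, so "parent doesn't exist" does not by itself say which table was missing the row.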
That’s better than the ones that take a week. (Sometimes you can do stuff to have the import script import only the stuff you’re trying to debug; figuring out how to do so is left as an exercise to the reader.)
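One possible way to shorten the cycle (not necessarily what was meant above) is to cut a small mbox out of the big one that contains only the thread being debugged. The sketch below assumes a plain mbox where every message starts with a line beginning with "From " and that body lines starting with "From " are escaped; `split_mbox` and `select_by_subject` are hypothetical helpers, and folded Subject headers are ignored.

```ruby
# Splits mbox text into messages; each message starts at a line beginning with "From ".
def split_mbox(text)
  text.split(/^(?=From )/).reject(&:empty?)
end

# Keeps only the messages whose Subject header contains the phrase (case-insensitive).
def select_by_subject(text, phrase)
  pattern = /^Subject:.*#{Regexp.escape(phrase)}/i
  kept = split_mbox(text).select do |message|
    # Only look at the header block, i.e. everything before the first blank line.
    headers = message.split(/\r?\n\r?\n/, 2).first
    headers.match?(pattern)
  end
  kept.join
end

sample = <<~MBOX
  From a@example.org Wed May 27 06:57:00 1992
  Subject: A CALL FOR HELP

  first message
  From b@example.org Wed Jun  3 17:20:00 1992
  Subject: Something else

  second message
MBOX

puts select_by_subject(sample, "call for help")
```

The output could be written to a separate file and imported on its own, which should take far less than 30 minutes for a single thread.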
You can try looking at the database to see if the parent message is getting imported and if it has an import_id topic/post custom field.
By ‘database’ do you mean index.db? By ‘imported’ do you mean entered into the email table of index.db? Yes, it is there. But there is no column called ‘import_id’.
I installed the Data Explorer plugin and had a look at the Discourse database. I found that the import_id for the parent message was present in the topic_custom_field and post_custom_field tables. Also, the message did exist.
But, it had been deleted. So, I guess, I was getting the error “parent message does not exist” because the import was looking in the discourse database rather than in index.db. It would’ve been good to get an error message saying the post had been deleted.
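To show the kind of message that would have helped, here is a small Ruby sketch that separates "never imported" from "imported but deleted". It works on plain in-memory rows, not on the real Discourse models; `PostRow`, `parent_status` and `explain` are hypothetical, and the only assumption about the data is that a deleted post carries a deletion timestamp.

```ruby
# Hypothetical row; a simplified stand-in for a post plus its import_id.
PostRow = Struct.new(:id, :import_id, :deleted_at, keyword_init: true)

# Classifies a parent lookup as :missing, :deleted or :ok.
def parent_status(posts, import_id)
  post = posts.find { |row| row.import_id == import_id }
  return :missing if post.nil?
  post.deleted_at ? :deleted : :ok
end

# Turns the status into a message that says why the parent cannot be used.
def explain(posts, import_id)
  case parent_status(posts, import_id)
  when :missing then "Parent message #{import_id} was never imported."
  when :deleted then "Parent message #{import_id} was imported but has since been deleted."
  else "Parent message #{import_id} is available."
  end
end

posts = [
  PostRow.new(id: 1, import_id: "9205270657.AB03850@ben.dciem.dnd.ca", deleted_at: Time.now),
]
puts explain(posts, "9205270657.AB03850@ben.dciem.dnd.ca")
```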
Anyway, I think this happened because, during an early test, I had deleted the first (small) batch of imported posts. I thought I had restored to before that point, but clearly not.
The good thing is that this is only applicable to my test server and I shouldn’t have the problem on the import on the live server.