Parent message doesn't exist

On import of mbxs I am getting an error that the parent message doesn’t exist even though it does seem to be in the database, index.db.

Here’s the error:

Parent message 9205270657.AB03850@ben.dciem.dnd.ca doesn’t exist. Skipping 9206031720.AA22567@ben.dciem.dnd.ca: A CALL FOR HELP

and here’s the db entry:

Any suggestion why this is failing?

Maybe the sort order is wrong because you are grouping emails by subject? Might be worth investigating. In that case, messages are sorted only by subject and by the order of the emails within the mbox file.

https://github.com/discourse/discourse/blob/3b062f79fccd99ebbcf92951cd3e205d574a06d6/script/import_scripts/mbox/support/database.rb#L123-L126

Are you really sure you need to group emails by subject? Judging by your screenshot it looks like the emails have correct Message-ID as well as In-Reply-To and References headers.

Thanks. Looking at the email_order table they look like they are in the correct order:

msg_id rowid
9205270657.AB03850@ben.dciem.dnd.ca 874
9206031720.AA22567@ben.dciem.dnd.ca 875

Could there be something else that is causing the import of this parent message to fail?

When I did the first import it looked like there was no grouping at all. I think the problem is that the replies are addressed to the mailing list rather than to the originator. Also, some messages don’t have those fields at all, as the archive was put together manually over 28 years, rather haphazardly, with different versions of Eudora.

Maybe it fails to import the parent message? Was there an error? It’s hard to tell why it doesn’t find the message. I’m sorry, but I guess you’ll need to debug this yourself by modifying the Ruby code of the import script.

No, I don’t believe there was an error.

OK. I’m not familiar with Ruby or the import script, but I may be able to have a go.

Can you let me know which scripts to look at, and where they are? Does “debug” mean add print statements or is there more sophisticated functionality?

https://github.com/discourse/discourse/tree/master/script/import_scripts/mbox

Well, I added a couple of debug print statements to map_first_post:

def map_first_post(row)
  puts "Mapping parent #{row['msg_id']} #{row['subject'][0..40]}"
  mapped = map_post(row)
  mapped[:category] = category_id_from_imported_category_id(row['category'])
  mapped[:title] = row['subject'].strip[0...255]
  puts "Mapped message #{row['msg_id']} #{row['subject'][0..40]}"
  mapped # must stay last: Ruby returns the final expression, and puts returns nil
end

def map_reply(row)
  parent = @lookup.topic_lookup_from_imported_post_id(row['in_reply_to'])
  if parent.blank?
    puts "Parent message #{row['in_reply_to']} doesn't exist. Skipping #{row['msg_id']}: #{row['subject'][0..40]}"
    return nil
  end

  mapped = map_post(row)
  mapped[:topic_id] = parent[:topic_id]
  mapped
end

These print ok and suggest the parent is mapped (imported?)

873 / 65936 ( 1.3%) [3895 items/min]
Mapping parent 9205270657.AB03850@ben.dciem.dnd.ca A CALL FOR HELP
Mapped message 9205270657.AB03850@ben.dciem.dnd.ca A CALL FOR HELP
874 / 65936 ( 1.3%) [3900 items/min]
Parent message 9205270657.AB03850@ben.dciem.dnd.ca doesn’t exist. Skipping 9206031720.AA22567@ben.dciem.dnd.ca: A CALL FOR HELP

So I don’t see why the parent is blank in map_reply. The only thing I note is that the numbers (873/874) are one less than the rowid values above.
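A general Ruby point that matters when adding debug prints like these: a method returns the value of its last expression, and puts returns nil. The sketch below uses two hypothetical stand-in methods (not the import script's own) to show how the position of a debug print can change what a mapping method hands back to its caller.

```ruby
# Hypothetical stand-ins for illustration only; these are not the import script's methods.
def map_with_trailing_puts(row)
  mapped = { id: row['msg_id'] }
  mapped
  puts "Mapped message #{row['msg_id']}" # last expression, so the method returns nil
end

def map_with_puts_before_return(row)
  mapped = { id: row['msg_id'] }
  puts "Mapped message #{row['msg_id']}"
  mapped # last expression, so the method returns the hash
end

row = { 'msg_id' => 'parent@example.com' } # made-up message ID
trailing_result = map_with_trailing_puts(row)
fixed_result = map_with_puts_before_return(row)

p trailing_result
p fixed_result
```

Whether a nil return value matters depends on how the caller treats it, so it is worth checking before drawing conclusions from the debug output.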

But I don’t think I can go much further, as I don’t know what @lookup.topic_lookup_from_imported_post_id is doing. It is also very laborious to edit with vi and rerun the import, with each cycle taking around 30 minutes.

It’s in base.rb in the same directory. And it’s doing exactly what the name of the function suggests: it’s looking for the topic_id by finding the import_id (which I assume is the message ID in this case) in a topic custom field (or maybe a post custom field?).
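As a rough mental model of that lookup, here is a minimal sketch. It is not the code from the import scripts: the class name ImportLookupSketch, the add_topic method, and the topic ID 42 are invented for illustration, and the real implementation reads from the Discourse database rather than from a plain hash.

```ruby
# Simplified, hypothetical model of an import-ID lookup; NOT the actual Discourse code.
class ImportLookupSketch
  def initialize
    @topics_by_import_id = {}
  end

  # Remember which topic an imported post ended up in.
  def add_topic(import_id, topic_id:, post_number:)
    @topics_by_import_id[import_id.to_s] = { topic_id: topic_id, post_number: post_number }
  end

  # Returns the stored hash, or nil when the import ID is unknown.
  def topic_lookup_from_imported_post_id(import_id)
    @topics_by_import_id[import_id.to_s]
  end
end

lookup = ImportLookupSketch.new
# The topic ID 42 is made up for this example.
lookup.add_topic('9205270657.AB03850@ben.dciem.dnd.ca', topic_id: 42, post_number: 1)

found = lookup.topic_lookup_from_imported_post_id('9205270657.AB03850@ben.dciem.dnd.ca')
missing = lookup.topic_lookup_from_imported_post_id('unknown@example.com')

p found
p missing
```

In this model, a reply whose in_reply_to value has no entry gets nil back, which is the situation map_reply reports as "Parent message ... doesn't exist".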

That’s better than the ones that take a week. :wink: (Sometimes you can do stuff to have the import script import only the stuff you’re trying to debug; figuring out how to do so is left as an exercise to the reader.)

You can try looking at the database to see if the parent message is getting imported and if it has an import_id topic/post custom field.

Looking where? Is topic_id the subject?

By ‘database’ do you mean index.db? By ‘imported’ do you mean entered in to the email table of index.db? Yes, it is there. But no column called ‘import_id’.

By database I mean the Discourse database. The import ID is in the topic_custom_fields and post_custom_fields tables.

Aha!

I installed the Data Explorer plugin and had a look at the Discourse database. I found that the import_id for the parent message was present in the topic_custom_fields and post_custom_fields tables. Also, the message did exist.

But, it had been deleted. So, I guess, I was getting the error “parent message does not exist” because the import was looking in the discourse database rather than in index.db. It would’ve been good to get an error message saying the post had been deleted.

Anyway, I think this happened because, during an early test, I had deleted the first (small) batch of imported posts. I thought I had restored to before that point, but clearly not.

The good thing is that this is only applicable to my test server and I shouldn’t have the problem on the import on the live server.

Thanks for the pointers.

Why? By whom?

This isn’t something that happens in an import.

Sounds right. And deleting posts doesn’t delete them, but marks them as deleted.
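To illustrate how a soft-deleted parent can still be present in the database and yet be reported as missing, here is a small in-memory sketch. Everything in it is hypothetical: the PostSketch struct, the find_topic helper, and the use of a deleted_at timestamp as the deletion marker are assumptions made for the example, not a description of how the import script queries Discourse.

```ruby
# Hypothetical in-memory model of soft deletion; names and fields are assumptions.
PostSketch = Struct.new(:import_id, :topic_id, :deleted_at)

posts = [
  PostSketch.new('parent@example.com', 1, Time.now), # soft-deleted: the row still exists
  PostSketch.new('other@example.com', 2, nil)        # live post
]

# Looks up a topic by import ID, ignoring rows that are marked as deleted.
def find_topic(posts, import_id)
  post = posts.find { |p| p.import_id == import_id && p.deleted_at.nil? }
  post && { topic_id: post.topic_id }
end

deleted_parent = find_topic(posts, 'parent@example.com')
live_parent = find_topic(posts, 'other@example.com')
row_still_present = posts.any? { |p| p.import_id == 'parent@example.com' }

p deleted_parent
p live_parent
p row_still_present
```

Under these assumptions, a raw query over the table finds the parent row, while a lookup that skips deleted rows does not, which would match what was seen in Data Explorer versus the importer's message.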