Import "Failed to map post" error

About 1000 posts are failing during an import. I think the main error is “Failed to map post”, for example:

Failed to map post for b9ec0145-e587-c0e2-768d-ad482c3ab928@mmtaylor.net
undefined method `hex' for nil:NilClass

This seems to cause many other posts to fail with messages like:

1109 / 65895 ( 1.7%) [400 items/min] Parent message b9ec0145-e587-c0e2-768d-ad482c3ab928@mmtaylor.net doesn’t exist. Skipping CAKPLMstp+CaTyfFinM-dHHpVxNHt0fy2vXT9Fx+21mE2RT-ijg@mail.gmail.com: A PCT approach to the “Power Law”

Any suggestions on how to resolve this?

I have 28 years of posts, with a folder for each year and an mbox file for each month. The mbox files contain 66909 messages, but the import shows 65895. Is the difference of 1014 due to the failures reported in the import output?

The posts were converted from Eudora mbx files to standard mbox files with Aid4Mail.

For the “Parent message doesn’t exist” error I see 421 instances.
For the “Failed to map post” error I see 149 instances.
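Tallies like these can be taken straight from a saved copy of the import output. The sketch below is a minimal Python helper, assuming the output was saved to a text file; the pattern names and the count_import_errors function are hypothetical, and the patterns are modelled only on the two log lines quoted above.

```python
import re
from collections import Counter

# Patterns modelled on the two log lines quoted in this post. The
# apostrophe class accepts both ’ and ', since logs may contain either.
PATTERNS = {
    "parent_missing": re.compile(r"Parent message \S+ doesn[’']t exist\. Skipping"),
    "failed_to_map": re.compile(r"Failed to map post for \S+"),
}


def count_import_errors(log_text: str) -> Counter:
    """Count the log lines that match each known error pattern."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Usage: read the saved log with open(path, encoding="utf-8", errors="replace") and pass its contents to count_import_errors. If your importer version words its messages differently, adjust the patterns.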

My split_regex is "^From .*@.* [0-9]{4}", which seems suitable for headers such as:

From mmt-xxx@somedomain.net Wed Aug 10 12:06:53 2016
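One way to check both the regex and the message count is to apply the regex yourself. This minimal Python sketch assumes split_regex is ^From .*@.* [0-9]{4}. A version reading .@. would require exactly one character on each side of the @ and would not match the sample header. Python's re module stands in for the Ruby regex engine the importer uses, and the **/*.mbox file pattern is a hypothetical naming scheme.

```python
import re
from pathlib import Path

# Assumed value of split_regex: a line starting with "From ", containing
# an "@", with a space and four digits somewhere after the "@".
SPLIT_REGEX = re.compile(r"^From .*@.* [0-9]{4}")


def count_messages(mbox_text: str) -> int:
    """Count the lines that would start a new message under SPLIT_REGEX."""
    return sum(1 for line in mbox_text.splitlines() if SPLIT_REGEX.match(line))


def count_messages_in_tree(root: str, pattern: str = "**/*.mbox") -> int:
    """Sum count_messages over every file under root matching pattern."""
    total = 0
    for path in Path(root).glob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            total += count_messages(text)
    return total
```

Running count_messages_in_tree over the year folders and comparing the result with 66909 and 65895 shows whether the splitting itself accounts for part of the gap. Note that an unescaped body line such as "From bob@example.com on 2016" would also match this regex.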

Help! Any suggestions for this? :woozy_face:

I saw the same error last week while working on a different problem. I’m going to fix it this week and will post an update here once it’s fixed.

Those are just warnings, and they probably appear because of the “Failed to map post” errors: they happen when a message references a parent post that doesn’t exist. I’m quite sure that fixing the other problem will fix most, if not all, of these warnings.
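To see why one “Failed to map post” error can produce several “Parent message doesn’t exist” warnings, here is a simplified Python model. It processes messages in import order and skips a reply whose parent was never imported, whether the parent failed or was itself skipped. It illustrates the behaviour described above and is not the importer’s actual code. The simulate_import function and its input format are hypothetical.

```python
def simulate_import(messages, failed_ids):
    """Return the Message-IDs skipped because their parent was not imported.

    messages: list of (msg_id, parent_id or None) pairs in import order.
    failed_ids: set of Message-IDs whose own import failed.
    """
    imported = set()
    skipped = []
    for msg_id, parent_id in messages:
        if msg_id in failed_ids:
            continue  # the post itself failed to map
        if parent_id is not None and parent_id not in imported:
            skipped.append(msg_id)  # "Parent message ... doesn't exist. Skipping"
            continue
        imported.add(msg_id)
    return skipped
```

In this model, a failure at the top of a thread removes every reply beneath it, which is why fixing the underlying error should clear most of the warnings.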

That error should be fixed by https://github.com/discourse/discourse/commit/e84d88ddea6674872ae7802a3aa619747b512c94

I upgraded, rebuilt the import container, checked that receiver.rb had been updated, and re-ran the import.

That resolved several hundred messages, thanks.

I am still getting around 200 failing, though, due to a couple of types of failure:

Date is missing. Skipping bbe76bf7a9cab5a2ec2a06e6ef453555

Failed to map post for 23a86e52-71ba-7435-1c9c-c4f2a134b90d@mmtaylor.net
Discourse::InvalidAccess

Then there are lots of “Parent message doesn’t exist” messages, which I presume result from the above.

Any idea what these errors are due to? For the first, I’ve looked at the mbx message and I don’t see a missing date.

Take a look at the index.db that the import script creates; it’s an SQLite3 database. The following query shows what the parser is working with by selecting the messages for the two Message-IDs you posted:

SELECT *
FROM email
WHERE msg_id IN ('bbe76bf7a9cab5a2ec2a06e6ef453555', '23a86e52-71ba-7435-1c9c-c4f2a134b90d@mmtaylor.net')

The email_date and raw_message columns are probably the most interesting ones for you. Maybe you can find what’s confusing the email parser.
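The same lookups can be scripted with Python’s built-in sqlite3 module, which also makes it easy to list every message the indexer stored without a date. This is a sketch. It relies only on the email table and the msg_id, email_date and raw_message columns named in the query above, and the helper names are hypothetical.

```python
import sqlite3


def inspect_messages(conn: sqlite3.Connection, msg_ids):
    """Return (msg_id, email_date, raw_message) rows for the given Message-IDs."""
    msg_ids = list(msg_ids)
    placeholders = ", ".join("?" for _ in msg_ids)
    query = (
        "SELECT msg_id, email_date, raw_message FROM email "
        f"WHERE msg_id IN ({placeholders})"
    )
    return conn.execute(query, msg_ids).fetchall()


def messages_without_date(conn: sqlite3.Connection):
    """Return Message-IDs whose email_date is NULL (likely "Date is missing" cases)."""
    rows = conn.execute("SELECT msg_id FROM email WHERE email_date IS NULL")
    return [row[0] for row in rows]
```

Usage: conn = sqlite3.connect("index.db"), then call either function with that connection. Run it against a copy of the file if the importer might still be using it.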

For the first, the date is null, and I see there is no date for that message in the mbx. I note that the reply (with “Re:”) appears before the “initial” message, which is why I thought the date was not missing. Does the import treat the first message in the file with that subject as the parent?

Is the email date taken from the “Date:” header, such as:

Date: Wed, 25 Mar 1992 12:23:00 GMT
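For a quick local check, here is how a Date: header like this one parses with Python’s standard library. This is only a stand-in: the importer is written in Ruby and may accept or reject dates differently. The header_date helper is hypothetical.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def header_date(raw_message: str):
    """Return the parsed Date: header of a raw message, or None.

    None covers both a missing header and a value that cannot be parsed.
    """
    msg = message_from_string(raw_message)
    value = msg.get("Date")
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on unparseable dates, newer ones ValueError.
        return None
```

Running it over the raw_message column from index.db is one way to spot messages whose dates are missing or malformed.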

I’ll see if I can repair those with missing dates.
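Repairing missing dates in the mbox files can be scripted. The sketch below uses Python’s mailbox module to add a Date: header to every message in one mbox file that lacks one. It rests on three assumptions: you work on a copy of the files, fallback_date is a value you choose yourself, and Python’s mbox parser (which splits on every line starting with "From ") finds the same message boundaries as the importer’s split_regex. Changed messages are re-serialized by Python’s email generator, so their formatting may differ slightly from the original. As the end of this thread notes, index.db has to be deleted afterwards for the change to be picked up.

```python
import mailbox


def add_missing_dates(path: str, fallback_date: str) -> int:
    """Add a Date: header with value fallback_date to every message in the
    mbox at path that has none. Returns the number of messages changed."""
    box = mailbox.mbox(path)
    try:
        changed = 0
        for key in list(box.keys()):
            msg = box[key]
            if msg["Date"] is None:
                msg["Date"] = fallback_date
                box[key] = msg  # rewrites this message, keeping its From_ line
                changed += 1
        box.flush()  # write the modified mailbox back to disk
        return changed
    finally:
        box.close()
```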

For the second, I can’t see anything obviously wrong. Does this image give any clue to the problem?

No. It uses the In-Reply-To and References headers to match and sort messages by Message-ID, unless you changed the importer’s group_messages_by_subject setting to true.
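Header-based threading of this kind can be sketched as: look up the In-Reply-To value first, then walk References from the newest entry to the oldest, and take the first Message-ID that exists in the archive. This Python sketch is not the importer’s code. parent_id and known_ids are hypothetical, and the IDs are compared without angle brackets, as they appear in index.db above. In this model, messages that lack both headers get no parent.

```python
from email import message_from_string


def parent_id(raw_message: str, known_ids: set):
    """Pick a parent Message-ID from In-Reply-To, then References.

    Returns None when none of the referenced IDs is in known_ids.
    """
    msg = message_from_string(raw_message)
    candidates = []
    in_reply_to = msg.get("In-Reply-To")
    if in_reply_to:
        candidates.extend(in_reply_to.split())
    references = msg.get("References")
    if references:
        # References lists ancestors oldest first, so try the newest first.
        candidates.extend(reversed(references.split()))
    for candidate in candidates:
        msg_id = candidate.strip("<>")
        if msg_id in known_ids:
            return msg_id
    return None
```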

Yes.

My best guess is that there’s a problem with one of the attachments. Maybe the file extension isn’t allowed?

I did set group_messages_by_subject to true, because without it there was no grouping at all.

That message has two inline images:

Content-Type: application/octet-stream;
    name="Conflict (was … long live Wil"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="Conflict (was … long live Wil"

Content-Type: image/jpeg; name="2.1.3FarmerSideEffectLoop.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="2.1.3FarmerSideEffectLoop.jpg"

Could it be that the first filename doesn’t have an extension?
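If the extension guess is right, the two filenames should behave differently under an extension check like the one below. This is a hypothetical Python stand-in for Discourse’s own upload validation, which is configured through the authorized_extensions site setting. The allowed set used in testing is an example only, not your site’s actual value.

```python
import os


def attachment_extension_ok(filename: str, allowed: set) -> bool:
    """Return True if filename has an extension listed in allowed (case-insensitive)."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext != "" and ext in allowed
```

A name with no dot at all, such as the first attachment’s, yields an empty extension and can never pass a check of this kind.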

Could I resolve the date issue by inserting the date into index.db, rather than mucking about with the mbx file?

Yes, that works. I’ve done the same in the past. I’d recommend setting index_only to true in settings.yml, so that the importer doesn’t start importing immediately after indexing the messages. Once indexing has finished, you can make all the needed changes in the database. Then change index_only back to false and restart the import.
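The database edit itself is a single UPDATE. Here is a minimal Python sketch. It assumes only the email table and the msg_id and email_date columns from the query earlier in the thread, and set_email_date is a hypothetical helper. The thread doesn’t show the format the indexer stores in email_date, so look at a row that has a date and copy its format.

```python
import sqlite3


def set_email_date(conn: sqlite3.Connection, msg_id: str, email_date: str) -> int:
    """Set email_date for one message and return the number of rows updated.

    email_date is stored as given, so it must already be in the format
    the indexer uses for other rows.
    """
    cur = conn.execute(
        "UPDATE email SET email_date = ? WHERE msg_id = ?",
        (email_date, msg_id),
    )
    conn.commit()
    return cur.rowcount
```

A return value of 0 means no row had that msg_id, which usually indicates a typo in the ID.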

I think I am misunderstanding something there. Hasn’t the indexing already been done, as index.db is already built?

I have transferred index.db to my desktop. I was going to update the dates then transfer index.db back to the server and then run import again. Is that not right?

I decided instead to edit the mbox files, adding a “Date:” line, e.g. “Date: Wed, 25 Mar 1992 17:43:06”. I transferred the updated files and re-ran the import twice. However, the email_date field was not updated.

Do I need to delete index.db?

Yes, you need to delete it.