Many thanks @gerhard, your patch is working like a dream. For my purposes, skipping the bad messages is fine, since there are only a small number of them. That said, we do now have additional output, in case it helps to solve the issue or to make the importer script more robust:
Failed to index message in /shared/import/data/lammps-users/chunk_10.mbox at lines 726814-729353
execution expired
["/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogumbo-2.0.2/lib/nokogumbo/html5.rb:243:in `escape_text'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogumbo-2.0.2/lib/nokogumbo/html5.rb:214:in `serialize_node_internal'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogumbo-2.0.2/lib/nokogumbo/html5/node.rb:58:in `write_to'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node.rb:699:in `serialize'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node.rb:855:in `to_format'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node.rb:711:in `to_html'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogumbo-2.0.2/lib/nokogumbo/html5/node.rb:28:in `block in inner_html'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node_set.rb:238:in `block in each'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node_set.rb:237:in `upto'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node_set.rb:237:in `each'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogumbo-2.0.2/lib/nokogumbo/html5/node.rb:28:in `map'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogumbo-2.0.2/lib/nokogumbo/html5/node.rb:28:in `inner_html'",
"/var/www/discourse/lib/html_to_markdown.rb:74:in `block (2 levels) in hoist_line_breaks!'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node_set.rb:238:in `block in each'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node_set.rb:237:in `upto'",
"/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/node_set.rb:237:in `each'",
"/var/www/discourse/lib/html_to_markdown.rb:57:in `block in hoist_line_breaks!'",
"/var/www/discourse/lib/html_to_markdown.rb:54:in `loop'",
"/var/www/discourse/lib/html_to_markdown.rb:54:in `hoist_line_breaks!'",
"/var/www/discourse/lib/html_to_markdown.rb:16:in `initialize'",
"/var/www/discourse/lib/email/receiver.rb:387:in `new'",
"/var/www/discourse/lib/email/receiver.rb:387:in `select_body'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:74:in `block (2 levels) in index_emails'",
"/usr/local/lib/ruby/2.6.0/timeout.rb:108:in `timeout'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:70:in `block in index_emails'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:139:in `block (2 levels) in all_messages'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:171:in `block in each_mail'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:190:in `block in each_line'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:189:in `each_line'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:189:in `each_line'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:166:in `each_mail'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:132:in `block in all_messages'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:125:in `foreach'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:125:in `all_messages'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:66:in `index_emails'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:25:in `block in execute'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:22:in `each'",
"/var/www/discourse/script/import_scripts/mbox/support/indexer.rb:22:in `execute'",
"/var/www/discourse/script/import_scripts/mbox/importer.rb:43:in `index_messages'",
"/var/www/discourse/script/import_scripts/mbox/importer.rb:27:in `execute'",
"/var/www/discourse/script/import_scripts/base.rb:47:in `perform'",
"script/import_scripts/mbox.rb:12:in `<module:Mbox>'",
"script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'",
"script/import_scripts/mbox.rb:9:in `<main>'"]
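Reading the trace, "execution expired" is the default message of Ruby's `Timeout::Error`, and the `timeout.rb` frame sits between `indexer.rb:70` and the `select_body` call, so my understanding is that the indexer gives each message a time budget and moves on when it runs out. Below is a minimal sketch of that skip-on-timeout pattern as I understand it. It is not the actual indexer code: `index_with_timeout` and its `seconds:` parameter are names I made up for illustration.

```ruby
require "timeout"

# Hypothetical helper (not part of the Discourse importer): yields each
# message to the block with a per-message time budget. Messages whose
# processing exceeds the budget are skipped and returned to the caller.
def index_with_timeout(messages, seconds:)
  skipped = []

  messages.each do |message|
    begin
      # Timeout.timeout raises Timeout::Error ("execution expired")
      # if the block runs longer than `seconds`.
      Timeout.timeout(seconds) { yield message }
    rescue Timeout::Error => e
      skipped << message
      warn "Skipping message: #{e.message}"
    end
  end

  skipped
end
```

Returning the skipped messages makes it possible to report or retry them later, rather than just dropping them silently.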
As before, I can share the specific message if that would help. This time the error output includes the exact line numbers, so we can be confident that we’ve identified the correct message.
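For reference, this is roughly how I would pull the message out of the chunk file using the reported line range. It is only a sketch: `extract_lines` is a helper name I made up, I am assuming the reported line numbers are 1-indexed and inclusive, and the output filename in the usage comment is a placeholder.

```ruby
# Hypothetical helper: returns the text of an inclusive, 1-indexed range of
# lines, reading the file line by line so a large mbox is never fully loaded
# into memory.
def extract_lines(path, first_line, last_line)
  lines = []

  File.foreach(path).with_index(1) do |line, number|
    next if number < first_line   # not yet at the start of the range
    break if number > last_line   # past the end of the range, stop reading
    lines << line
  end

  lines.join
end

# Example usage with the range from the error output above
# (the output filename is a placeholder):
#
#   message = extract_lines(
#     "/shared/import/data/lammps-users/chunk_10.mbox", 726814, 729353
#   )
#   File.write("bad_message.mbox", message)
```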