How to log bad data in an import script


(Jay Pfaffman) #1

I’ve made considerable improvements to mbox.rb, the mbox email importer. I’m importing 230K messages, many of which have email addresses that are broken in one way or another. The script used to just crap out if something went wrong; I’ve added code to let the user know what file is being processed and to print error messages: the bad address and the message ID when an email address is bad, and the message ID and the whole message when something else goes wrong.

print_status does a lovely job of, well, printing the status, but it’s less pretty when I print error messages. Also, when there are hundreds of lines of such messages, having them in a terminal can be a bit unwieldy (especially if you switch sessions in tmux and lose the scroll-back buffer). Maybe what I should do is send stuff to stderr? Or should I just not worry?


(Matt Palmer) #2

Yes, problems and errors should be sent to stderr, rather than stdout. The user can then capture that output to a file for later analysis, with 2>/tmp/import_errors.log, without interrupting the progress and status information.


(Jay Pfaffman) #3

Is there a way we like? Otherwise, I’ll just Google “ruby output stderr” and see what comes up. :slight_smile:


(Matt Palmer) #4

$stderr.puts "I'm in a stderr!"

Probably worth wrapping it up in some sort of print_error method, for clarity and overridability.
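
A minimal sketch of what such a wrapper might look like. Only the name print_error comes from the suggestion above; the message_id and out keyword arguments and the output format are assumptions made for illustration, not code from mbox.rb.

```ruby
# Hypothetical helper: writes one error line to stderr by default.
# The `out:` keyword is an assumption that makes the destination overridable
# (for example, a log file opened by the script).
def print_error(message, message_id: nil, out: $stderr)
  prefix = message_id ? "[#{message_id}] " : ""
  out.puts "#{prefix}#{message}"
end

# Example call with made-up values; this line goes to stderr, not stdout.
print_error("bad address: jay@@example", message_id: "<hypothetical-id@example.com>")
```

With errors routed this way, redirecting with 2>/tmp/import_errors.log collects them in the file while the status output stays on the terminal.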