Ignore BOM on CSV when sending bulk invitations

When you save a CSV from Excel, it by default adds (what I think is) a BOM at the start of the file, to ensure that it’s parsed as UTF-8. Unfortunately, Discourse doesn’t handle this (and doesn’t read the file as UTF-8!), so the first email address always fails with an error like this:

(Note that the “j” is the first letter of the actual email address.)

I think Discourse should strip/ignore the BOM on CSVs – it’s probably pretty common that people would put these together in Excel, after all. :wink:
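
To illustrate what "strip/ignore the BOM" could look like, here is a minimal sketch in Python. It is not Discourse's code (Discourse is written in Ruby). The function name `read_invite_rows` and the sample addresses are hypothetical, chosen only for the example.

```python
import csv
import io
from typing import List


def read_invite_rows(raw: bytes) -> List[List[str]]:
    """Decode CSV bytes as UTF-8 and drop a leading BOM if one is present.

    Hypothetical helper for illustration only.
    """
    # The "utf-8-sig" codec decodes like "utf-8" but skips a leading
    # EF BB BF sequence, so files with and without a BOM both work.
    text = raw.decode("utf-8-sig")
    reader = csv.reader(io.StringIO(text, newline=""))
    # Skip completely empty lines.
    return [row for row in reader if row]


# Assumed sample upload: BOM followed by two invitation rows.
upload = b"\xef\xbb\xbfjane@example.com,group1\r\nbob@example.com,group2\r\n"
rows = read_invite_rows(upload)
print(rows[0][0])
```

With this approach the first field comes back as a plain address, with no invisible character in front of the "j".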


That might make sense, @techapj. I can confirm via hex editor that when an Excel 2016 file is saved as CSV

a,b,c,d
e,f,g,h

there is the BOM marker by default:

UTF-8 EF BB BF
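
For anyone without a hex editor, the same check can be sketched in Python. The sample bytes below are an assumption modelled on the two-row file above (the line endings in particular are a guess), and `has_utf8_bom` is a hypothetical helper name.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Assumed file contents: BOM followed by the two rows from the example.
sample = codecs.BOM_UTF8 + b"a,b,c,d\r\ne,f,g,h\r\n"

print(has_utf8_bom(sample))           # True
print(sample[:3].hex(" ").upper())    # EF BB BF

# Decoding as plain UTF-8 keeps the BOM as an invisible U+FEFF character
# glued to the first field, which is why the first value fails to match.
first_field = sample.decode("utf-8").split(",")[0]
print(repr(first_field))              # '\ufeffa'
```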


I went to review the W3C page to double-check my understanding that BOMs were only needed for UTF-16 and UTF-32, because UTF-8 does not have big/little-endian variants.
With HTML5, things have changed a bit since I last visited the page.
Older browsers could use …BE and …LE HTTP headers, but now the only way is by reading the BOM.
tl;dr: for UTF-8, yes, strip the BOM.

https://www.w3.org/International/questions/qa-byte-order-mark

Fixed via:

https://github.com/discourse/discourse/commit/e6e00253267008b17c760befbd852b5fdf998060
