Knowing whether you think this is a $50, $500, or $5,000 job would help people decide whether it’s worth their time to figure it out.
Are there 200 messages, 2,000, 200,000, 2,000,000?
A few rows of sample (or actual) data, plus the total number of rows, would be a good start to understanding the problem.
Does the content already exist in a spreadsheet? That’s not an ideal format for these data. If they exist in some other forum, it’s almost certainly best to start with that.
Does the message content exist in Markdown or some other format that would need to be converted? Did whoever generated this text think that starting a line with 4 spaces was a great way to start a paragraph, or do they expect those lines to be displayed as verbatim text?
Are there special characters in the text that will mess with the formatting? What do you expect to happen with messages that have bad data?
How can you tell from the spreadsheet whether it’s a reply? The same title, and then the next messages are in order?
Is there threading, or are all replies a reply to the original message (“topic” in Discourse parlance)?
What about time stamps for the messages?
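To make the reply and timestamp questions concrete, here is a rough sketch of one way rows could be grouped into topics, assuming that rows sharing a title belong to the same topic and that the earliest row is the original post. The column names (`title`, `author`, `posted_at`, `body`) and the sample rows are made up for illustration; the real spreadsheet may look nothing like this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the column names are assumptions, not the real export.
SAMPLE = """title,author,posted_at,body
Welcome,alice,2020-01-02 10:00,First post
Welcome,bob,2020-01-02 11:30,Thanks!
Rules,alice,2020-01-03 09:00,Please read
"""

def group_into_topics(rows):
    """Group rows by title; the earliest row is the topic, the rest are replies."""
    by_title = defaultdict(list)
    for row in rows:
        by_title[row["title"]].append(row)
    topics = []
    for title, posts in by_title.items():
        # Zero-padded "YYYY-MM-DD HH:MM" strings sort chronologically as text.
        posts.sort(key=lambda r: r["posted_at"])
        topics.append({"title": title, "first_post": posts[0], "replies": posts[1:]})
    return topics

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
topics = group_into_topics(rows)
for topic in topics:
    print(topic["title"], len(topic["replies"]))
```

If two unrelated topics share a title, or if there are no usable timestamps, this grouping breaks down, which is why the answers to those questions matter.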
Does “ASAP” mean Friday, next Friday, or the end of April?
You don’t care about users’ real names or passwords, right?
Do the usernames conform to Discourse’s requirements?
Are the email addresses all valid? If they aren’t, what do you want to happen to the data connected to the invalid email addresses?
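A quick audit along these lines would answer the last two questions before any import is attempted. This is only a sketch: the username pattern is my approximation of common Discourse defaults (3 to 20 characters; letters, digits, underscores, dashes, and periods; no doubled punctuation), and the actual limits depend on site settings. The email check is a loose shape test, not proof that an address works. The sample users and the `audit_users` helper are hypothetical.

```python
import re

# Approximation of typical Discourse username rules; real limits are site settings.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{1,18}[A-Za-z0-9]$")
DOUBLED_PUNCTUATION_RE = re.compile(r"[._-]{2}")
# Loose shape check only: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def audit_users(users):
    """Return (row_number, field) pairs for every value that fails a check."""
    problems = []
    for row_number, user in enumerate(users, start=1):
        name = user["username"]
        if not USERNAME_RE.match(name) or DOUBLED_PUNCTUATION_RE.search(name):
            problems.append((row_number, "username"))
        if not EMAIL_RE.match(user["email"]):
            problems.append((row_number, "email"))
    return problems

# Made-up rows for illustration.
users = [
    {"username": "alice", "email": "alice@example.com"},
    {"username": "b", "email": "bob@example"},
    {"username": "carol smith", "email": "carol@example.com"},
]
problems = audit_users(users)
print(problems)
```

Knowing how many rows land in that list, and what should happen to their posts, is a big part of sizing the job.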