Experience importing users and topics from CSV files?

I have a few (82) users and (133) posts I’d like to bring over from an abortive attempt on a former platform. Tiny, but just enough to make copy & paste onerous. I’m tentatively hoping to try the CSV importer script. It would be my first time working with a Ruby script.

I’ve prepared CSV files per script/import_scripts/csv_importer.rb on the main branch of the discourse/discourse repository on GitHub.

But unless I’m missing something, the CSV import script doesn’t seem complete. I see no provision in the CSV requirements or the script to associate posts with parent topics. Not sure what this would end up looking like.

I’d also like the original post dates preserved, but there’s no field here for that.

The Zendesk importer does deal with “topic_id” and “created_at” fields. I don’t know if the Zendesk importer would be a better bet, or if it would bomb out on fields my CSV data lacks. (I’m not coming from Zendesk.)

Wondering if anyone has experience with the CSV importer as-is, or perhaps has modified it to add functionality.

I used the CSV importer before it was incorporated into core (and sponsored its development by @pfaffman).

It seemed to preserve posts in topics just fine, as well as the original post dates. Why not give it a go on a staging instance (or simply after a backup) and see what happens?

If there are issues, it would be good to get the script fixed up - but I suspect it will do the job for you just fine.


Finally coming back to this after tabling it a while. I’m willing to run a backup and try something if I can grok a chance of success, but I crave a bit more confidence here. I lack much scripting experience, but I’d really like to understand how the CSV importer would preserve posts (replies) and dates, as @nathank suggests, since the script doesn’t seem to define any handling of them.

It imports limited fields for: users, emails, custom user fields, categories, and topics.

I don’t need custom user fields or new categories, so the relevant CSVs and their specified fields are:

 == CSV files format
File name: users
headers: id,username

File name: emails
headers: user_id,email

File name: topics_new_users
headers: id,user_id,title,category_id,raw

File name: topics_existing_users
headers: id,user_id,title,category_id,raw
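
Before trying an import, the prepared files can be checked against the headers listed above. This is only a minimal sketch: the `EXPECTED_HEADERS` table and the `missing_headers` helper are hypothetical names invented for this check, not part of csv_importer.rb, and the sample row is made-up data.

```ruby
require "csv"

# Expected headers, copied from the format listing quoted above.
EXPECTED_HEADERS = {
  "users" => %w[id username],
  "emails" => %w[user_id email],
  "topics_new_users" => %w[id user_id title category_id raw],
  "topics_existing_users" => %w[id user_id title category_id raw],
}.freeze

# Hypothetical helper: returns the expected headers that are absent
# from the given CSV text (an empty array means nothing is missing).
def missing_headers(name, csv_text)
  actual = CSV.parse(csv_text, headers: true).headers.compact.map(&:strip)
  EXPECTED_HEADERS.fetch(name) - actual
end

# Made-up sample that forgot the category_id column.
sample = "id,user_id,title,raw\n1,7,Hello,First post\n"
p missing_headers("topics_new_users", sample)
```

To check a real file, the same helper could be called with `File.read` on that file's path in place of the sample string.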

From a squint at this data model, Discourse Topics and Posts are two different creatures with some differentiating fields:

[Image: Screenshot 2024-07-21 151320]

I don’t see anything in the script to handle Posts — or dates.

Maybe I’m supposed to lump incoming Topic and Post data together, but if so, how would Discourse infer the topic/reply relationship – is it just the sequence of the input? Are replies related to a Topic having the first appearance of a shared ID? All it says about ids is:
“except for the topics_existing_users, the IDs in the data can be anything as long as they are consistent among the files.”
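
To make the “shared ID” idea concrete, here is a purely hypothetical sketch of what such an inference could look like. It is not taken from csv_importer.rb: the `topic_id` and `created_at` columns are assumptions that do not appear in the documented format, and `split_topics_and_replies` is an invented name. Rows sharing a `topic_id` are grouped, and the earliest row in each group is treated as the opening post.

```ruby
require "csv"
require "time"

# Hypothetical input: topic_id and created_at are assumed columns,
# not part of the CSV format documented in csv_importer.rb.
ROWS = <<~CSV
  id,topic_id,user_id,created_at,raw
  11,1,7,2023-05-02T09:00:00Z,A reply
  10,1,7,2023-05-01T09:00:00Z,Opening post
  12,2,8,2023-05-03T09:00:00Z,Another opening post
CSV

# Group rows by topic_id; within each group the earliest row becomes
# the first post and the remaining rows become replies, in date order.
def split_topics_and_replies(csv_text)
  rows = CSV.parse(csv_text, headers: true).map(&:to_h)
  rows.group_by { |r| r["topic_id"] }.transform_values do |group|
    first, *replies = group.sort_by { |r| Time.iso8601(r["created_at"]) }
    { first_post: first["id"], replies: replies.map { |r| r["id"] } }
  end
end

p split_topics_and_replies(ROWS)
```

Sorting by date rather than relying on input order means the result does not depend on how the rows happen to be arranged in the file, which is one of the ambiguities raised above.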

If the script isn’t missing something, then I must be. I appreciate any clarifying thoughts!