Import HTML site to Discourse?

Hey gang.

Is there any way to import a “wget” scrape of an old forum? I would like to move it to Discourse, but I do not have the SQL file. The old forum runs on XenForo.

This thread gives me hope

If you can parse the HTML, the answer is yes.
I recommend using wget and looking through the HTML for the structure.
You might end up scraping the site with something like Puppeteer.
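
To make "looking through the HTML for the structure" a bit more concrete, here is a minimal sketch that tallies tag/class pairs in a page so the repeated containers stand out. It uses only the Python standard library. The sample markup and the class names in it (`message`, `bbWrapper`, `footer`) are made up for illustration; a real scrape will have its own.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Tally (tag, class) pairs so repeated containers stand out."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # An element can carry several classes; count each one separately.
        for name in (dict(attrs).get("class") or "").split():
            self.counts[(tag, name)] += 1


# Hypothetical snippet standing in for one scraped page.
page = """
<article class="message"><div class="bbWrapper">one</div></article>
<article class="message"><div class="bbWrapper">two</div></article>
<div class="footer">site footer</div>
"""

counter = ClassCounter()
counter.feed(page)

# Pairs that repeat once per post are good candidates for post containers.
for (tag, name), n in counter.counts.most_common():
    print(f"{n:3d}  <{tag} class={name!r}>")
```

Anything that shows up once per post on a thread page is probably the wrapper you want to parse; things that show up once per page are usually chrome.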

Hi there.

I already have a full scrape of the site. I am looking for a way to turn the HTML files into content that Discourse can use. I know it’s a long shot, but it’s worth at least asking about.

So this is a site that you don’t control/own and you scraped? Why don’t you have access to the database?

It’s hard to say without knowing more about what the data look like. Is the site online somewhere?

Anything is possible. Do you want users imported too? Do you have email addresses? Are there user profile files?

How many topics and posts are there? Is there one topic per html file? One post per file?

For a frame of reference, I’d not consider such a job for less than $5000. And it’s not likely to be pretty when it’s done.

If you know … or are willing to learn … how to code in Ruby, and the HTML structure is elegant enough … it’s not a long shot.

It’s entirely doable.

The mailing list importer is probably a good place to start. You could pull the data into SQLite and go from there.
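
As a rough illustration of the "pull the data into SQLite" step, here is a sketch in Python using only the standard library. It is not the mailing list importer and not anything Discourse ships. It assumes each post sits in an `<article data-author="...">` element with its body in a `<div class="bbWrapper">`; those selectors, the `posts` table layout, and the sample page are all assumptions for the example, so check them against the actual scrape.

```python
import sqlite3
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect (author, text) pairs from one saved thread page."""

    def __init__(self):
        super().__init__()
        self.posts = []        # finished (author, text) tuples
        self._author = None    # author of the <article> being read
        self._depth = 0        # <div> nesting depth inside the body div
        self._chunks = []      # text fragments of the current post body

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article" and "data-author" in attrs:
            self._author = attrs["data-author"]
        elif tag == "div":
            if self._depth:
                self._depth += 1  # nested div inside the post body
            elif self._author and "bbWrapper" in (attrs.get("class") or "").split():
                self._depth = 1   # assumed body container
                self._chunks = []
        elif tag == "br" and self._depth:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                text = " ".join("".join(self._chunks).split())
                self.posts.append((self._author, text))
        elif tag == "article":
            self._author = None

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def store_posts(conn, topic, posts):
    """Write extracted posts into a hypothetical intermediate table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, topic TEXT, author TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts (topic, author, body) VALUES (?, ?, ?)",
        [(topic, author, body) for author, body in posts],
    )
    conn.commit()


# Made-up page standing in for one scraped thread file.
SAMPLE = """
<article class="message" data-author="alice">
  <div class="bbWrapper">First post<br>with a line break</div>
</article>
<article class="message" data-author="bob">
  <div class="bbWrapper">A reply <div class="quote">quoted</div> here</div>
</article>
"""

parser = PostExtractor()
parser.feed(SAMPLE)
conn = sqlite3.connect(":memory:")
store_posts(conn, "sample-thread", parser.posts)
rows = conn.execute("SELECT author, body FROM posts ORDER BY id").fetchall()
print(rows)
```

Once the posts are in a table like this, an import script can read from it much as it would from a real forum database. Note that the sketch flattens the body to plain text, so formatting, quotes, and attachments would need extra handling.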