I’m sorry to report this, but the XenForo importer doesn’t support attachments or Gallery images. You’ll need to modify the importer yourself to add support, or hire a developer. If you are not afraid of a little Ruby (I wrote a couple of importers before I really knew Ruby), you might check out the vanilla_mysql, vbulletin, or answerhub importers, which replace an [attach] tag with an attachment. If you’re a programmer, once you’ve done the [attach] tags it should be reasonably clear how you’d handle the [gallery] tags in the same way.
If you’d like to hire someone to do the work, you can post in marketplace and/or have a look at Redirecting…
Thanks, Jay – I’ll have a look at the importers you mentioned and see how I can extend the XF importer.
On a slight tangent, it seems strange to have each importer as a completely standalone script, even though each one is broadly doing the same thing. Has anyone considered consolidating them into a single importer that reads configuration files for each specific platform? For example, instead of writing or copying new code to handle XF attachments, one could extend the XF config file with the regex patterns for XF attachments, and the import script would automatically pick them up.
In principle it would make it much easier to support additional platforms.
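To make the idea concrete, here is a minimal sketch of what a config-driven rewrite step might look like. Everything in it is hypothetical: `PLATFORM_RULES`, `apply_rules` and the placeholder output format are invented for illustration and are not part of any existing importer.

```ruby
# Hypothetical per-platform config: each rule pairs a tag pattern with a
# replacement template. These names do not exist in Discourse itself.
PLATFORM_RULES = {
  "xenforo" => [
    { pattern: /\[ATTACH\](\d+)\[\/ATTACH\]/i, template: "attachment:%{id}" },
    { pattern: /\[GALLERY=media,\s*(\d+)\].*?\[\/GALLERY\]/im, template: "gallery:%{id}" },
  ],
}.freeze

# Apply every rule configured for the platform to a post body, in order.
def apply_rules(platform, raw)
  PLATFORM_RULES.fetch(platform).reduce(raw) do |text, rule|
    text.gsub(rule[:pattern]) { format(rule[:template], id: Regexp.last_match(1)) }
  end
end

puts apply_rules("xenforo", "See [ATTACH]12[/ATTACH] for details")
```

Supporting a new tag would then mean adding a rule to the hash rather than editing the script, though a real importer would still need code to turn each placeholder into an actual upload.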
What started this conversation is that at least two people who are importing from exactly the same forum software needed something completely different. Presumably a bunch of people before you have used the script and it’s all but worthless to you. Having done dozens of imports, it’s striking how each one is a unique snowflake… As a rule, importers are developed by someone whose job it is to solve one problem, the current import, and not imagined problems of future imports. It’s hard enough to clean things up enough to submit a PR.
That said, the phpbb importer does a beautiful job abstracting pulling data from the database and pushing it into Discourse. It’s beautiful code, but it counts on the programmer really understanding Ruby. Whenever I start a new importer, I want to base it on that importer, but end up using one of the others as a starting point, as it’ll take me an extra day to finish understanding how everything magically fits together.
Ok, so it looks like the vbulletin and vanilla_mysql importers just strip the attachments out, but the answerbase importer actually supports them. As far as I can tell, it captures the file path from the post body, creates a new Discourse upload and replaces the tag in the imported post with the HTML tags to reference the new upload.
This gives us a good starting point for XF uploads, but we have two additional hurdles:
XF’s media tags don’t include the file’s path; they are basically [ATTACH]{{id}}[/ATTACH] or [GALLERY=media, {{id}}]{{title|kebab-case}} by {{username}} posted {{date}}[/GALLERY]. This means that we must first map the ids to filenames before we can create Discourse uploads.
The XF uploads are renamed to {{id}}-{{hash}}.{{ext}}. Under the hood XF uses a database table to track the mappings between the original filename and the hash.
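As a rough illustration of the first step, pulling the ids out of a post body could look something like the sketch below. The patterns are based only on the two tag shapes described above, and `media_ids` and `stored_name` are hypothetical helper names, so real posts may well need looser matching.

```ruby
# Patterns for the two tag shapes described above (assumed, not exhaustive).
ATTACH_TAG  = /\[ATTACH\](\d+)\[\/ATTACH\]/i
GALLERY_TAG = /\[GALLERY=media,\s*(\d+)\].*?\[\/GALLERY\]/im

# Collect the numeric ids referenced in a post body.
def media_ids(raw)
  {
    attachments: raw.scan(ATTACH_TAG).flatten.map(&:to_i),
    gallery: raw.scan(GALLERY_TAG).flatten.map(&:to_i),
  }
end

# Build the on-disk name in the {{id}}-{{hash}}.{{ext}} form.
# The hash and extension would have to come from the source database.
def stored_name(id, file_hash, ext)
  "#{id}-#{file_hash}.#{ext}"
end
```

The ids returned here are what you would look up in the source database to find each file’s hash and original filename.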
I can think of a few different approaches:
1. use regex to rewrite the media tags to links of the form {{original XF domain}}/media/{{id}}/full and enable the setting in Discourse which makes it download external images #hacky
2. query the source database for the filename & rename the file before creating the Discourse upload
3. import the files as they are, and lose the original filenames & get meaningless alt tags
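For comparison, the first (hacky) approach is only a few lines. This is a sketch under the assumption that gallery tags always have the shape shown earlier; `rewrite_gallery_tags` and the example domain are made up for illustration.

```ruby
# Replace each gallery tag with a link back to the original forum,
# using the {{original XF domain}}/media/{{id}}/full form.
def rewrite_gallery_tags(raw, base_url)
  raw.gsub(/\[GALLERY=media,\s*(\d+)\].*?\[\/GALLERY\]/im) do
    "#{base_url}/media/#{Regexp.last_match(1)}/full"
  end
end

puts rewrite_gallery_tags(
  "Look: [GALLERY=media, 15]my-pic by bob posted today[/GALLERY]",
  "https://forum.example.com"
)
```

This only works for as long as the old forum stays online, which is a large part of why it feels hacky.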
In principle, we could also supply URLs instead of local filenames when creating the uploads, but I haven’t checked whether it would know to download the files in this case. My guess is it wouldn’t.
Personally, I’m leaning towards option 2, but I’m keen to get input from anyone with more Discourse experience, as I’d like to contribute any improvements back to the repo.
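A minimal sketch of the file-handling side of option 2, assuming the id, hash, extension and original filename have already been fetched from the source database. `AttachmentRow` and `restore_original_name` are hypothetical names, and the query itself is left out because I haven’t confirmed the XF table layout.

```ruby
require "fileutils"

# Hypothetical row shape: what a lookup in the source database might yield.
AttachmentRow = Struct.new(:id, :file_hash, :ext, :original_filename)

# Copy the stored {{id}}-{{hash}}.{{ext}} file to a working directory under
# its original filename. Returns the new path, or nil if the file is missing.
def restore_original_name(row, source_dir, work_dir)
  stored = File.join(source_dir, "#{row.id}-#{row.file_hash}.#{row.ext}")
  return nil unless File.exist?(stored)

  # One subdirectory per id, so identical original filenames don't collide.
  target_dir = File.join(work_dir, row.id.to_s)
  FileUtils.mkdir_p(target_dir)
  target = File.join(target_dir, row.original_filename)
  FileUtils.cp(stored, target)
  target
end
```

The returned path is what you would then hand to the importer’s upload-creation step. Copying instead of renaming in place leaves the source files untouched, so the import can be re-run.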
I don’t really need something ‘completely different’, just something a bit richer. The script as it is works well and provides a good starting point – it’s certainly not worthless. I just figured tweaking a config file would make more sense than modifying the script.
I generally use method two. Method one has some appeal, and I often consider it, but I think I’ve always done it the other way.
Well, “worthless” may be hyperbole, but only if you’re a programmer.
And often issues can’t become apparent until you’ve run the importer against the data set. In your case, it is obvious that it doesn’t support attachments, but the media tags would likely still be a surprise.