Redo Bad Import?

I screwed up an mbox import. What is the best way to delete the category and restart the import from scratch?

There are 6500 topics in the category I need to re-import. Deleting 30 topics at a time is a bit tedious. Is there a better way to redo the import?

Best is to delete the whole database and start from scratch. Next best would be to restore a backup done before you did the import, I guess.

Is there anything else I have to delete besides the categories before doing the re-import? Perhaps this file?

/var/discourse/shared/standalone/import/data/index.db

I have some other Categories I’d rather not delete. I did work thru deleting all the Topics and deleted the Category. But when I reran the import it created the category with no posts in it. Clearly there is more to this than I know.

Right. You need to delete the custom post fields that store the imported post ids. If you want to import everything again you could do

PostCustomField.where(name: "import_id").destroy_all

I think. You should make a backup first.

But what you really want is to revert to the database as it was before you imported the bad data.

The data is not bad. The bad import was my fault. I have two lists and I put one mbox in the wrong directory which imported both lists into the same category. :poop:

So two questions. Where/how do I run the PostCustomField command? Also, if I do choose to delete the entire db, how do I do that?

You’d run the rails command after you do this

cd /var/discourse
./launcher enter app
rails c

How to delete the database depends on how you created it.

Thank you for the rails command.

I did a standard Discourse install.

Thanks for walking me thru this. I figured out the ruby console with help from Google U. I’ve never done a thing with Ruby so this is all new to me (including Docker). That has given me pause to use Discourse. But I will persevere.

I did this to install Discourse:

https://github.com/discourse/discourse/blob/master/docs/INSTALL-cloud.md

I found this which may or may not be the correct way to delete the db in my case.

Your Ruby command seems to be working. Topics and posts are loading. :slight_smile:

Here are the steps I did to redo the imported categories.


I could delete up to 150 topics at a time with a few appropriate key strokes.

For an imported category additional work is required:

cd /var/discourse/
./launcher enter app
ruby c
PostCustomField.where(name: "import_id").destroy_all
q
quit
exit
rm /var/discourse/shared/standalone/import/data/index.db

The commands on lines 3 and 4 can take a while to execute, so be patient.
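The last step above removes index.db outright. If you would rather keep a copy in case you need to look at it later, here is a minimal sketch that moves the file aside instead. The function name `archive_index_db` and the `.bak.<timestamp>` suffix are my own invention, not part of Discourse; the default path is the one from this thread.

```shell
# Hypothetical helper (not part of Discourse): move the importer's
# index.db aside instead of deleting it, so it can be inspected later.
archive_index_db() {
  # Default path is the one mentioned in this thread; pass another as $1.
  db="${1:-/var/discourse/shared/standalone/import/data/index.db}"
  if [ ! -f "$db" ]; then
    echo "no index.db at $db; nothing to do"
    return 0
  fi
  backup="$db.bak.$(date +%Y%m%d%H%M%S)"
  mv -- "$db" "$backup" && echo "moved $db to $backup"
}
```

Call it with no arguments on a standard install, or pass the path to index.db if yours lives somewhere else. You may need sudo, depending on who owns the shared directory.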

Now you can rerun the mbox import.

cd /var/discourse
./launcher stop app
./launcher start import
./launcher enter import
import_mbox.sh
exit
./launcher stop import
./launcher start app
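If you rerun the import a lot, the sequence above can be wrapped so that the app container is brought back up even when the import session ends badly. This is only a sketch: `rerun_import` and the `LAUNCHER` variable are names I made up, and it assumes you call it from /var/discourse. The `enter import` step is still interactive, so you run import_mbox.sh and exit by hand.

```shell
# Hypothetical wrapper around the launcher steps from this thread.
# LAUNCHER can be overridden; by default it is ./launcher in the current directory.
rerun_import() {
  launcher="${LAUNCHER:-./launcher}"

  "$launcher" stop app || return 1

  # If the import container fails to start, bring the app back and give up.
  "$launcher" start import || { "$launcher" start app; return 1; }

  # Interactive step: run import_mbox.sh inside the container, then exit.
  "$launcher" enter import
  rc=$?

  # Always switch back to the app container, whatever happened above.
  "$launcher" stop import
  "$launcher" start app
  return "$rc"
}
```

Usage would be `cd /var/discourse && rerun_import`. The return code is whatever the interactive session exited with.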

For details on importing an mbox:

Thank you @pfaffman for all your help with this.

To delete the database

cd /var/discourse
./launcher stop import
rm -r shared/standalone/postgres_data
./launcher rebuild import

Or something very much like that.

cd /var/discourse/
./launcher enter app
ruby c
PostCustomField.where(name: "import_id").destroy_all
q
quit
exit
rm /var/discourse/shared/standalone/import/data/index.db

Should the categories be deleted after this? I can still see mine.

Btw, should it be “rails c” rather than “ruby c”?

No. All that does is let posts that were imported before be imported again (given that duplicate titles are allowed). You almost certainly don’t want to do that.

Yes. But what you probably want to do is delete the database? Though since you say only what you did and not what you’re trying to do, it’s hard to say.

Well, in this moment I wanted to purge and re-import posts as I am testing out different import settings and different categories.

This code did the job, allowing me to re-import the same posts.

cd /var/discourse/
./launcher enter app
rails c
PostCustomField.where(name: "import_id").destroy_all
quit
exit
rm /var/discourse/shared/standalone/import/data/index.db

The “q” line seemed to be erroneous.
First, though, I removed the topics and categories manually.

The overall project is to import 28 years of posts from a mailing list, which is held in a Eudora archive. The import uncovers some things which I may want to change with some pre-processing, such as empty subjects and generating an annual category rather than all in one, which would end up with a very large single page for them.

Wouldn’t that delete existing posts? Although I am currently running the import on a test server, eventually I want to run the import on a live, active server.

Yes.

So you want to just start with a fresh database.

That’s always scary, but sometimes unavoidable. You’ll want to take a backup immediately before the import, then run the import and then check, and restore the backup if something went wrong. Trying to delete the posts and associated records is fairly messy, but something like

Topic.where(category_id: 123).destroy_all

should do it if they (and nothing else) are all in the same category.

For testing and re-testing an import script, I usually either wipe the database each time, or save a backup immediately after a wipe (and after adding an admin account) and restore that.
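For the backup-then-restore approach you need the name of the backup file you just made. Here is a small sketch for finding the newest one from the host. It assumes backups are stored locally in the default location for a standard install (shared/standalone/backups/default) and end in .tar.gz; `latest_backup` is a made-up helper name, and the `discourse` commands in the comments are meant to be run inside the container.

```shell
# Inside the container (./launcher enter app), a backup can be taken with:
#   discourse backup
# and, as far as I know, restored with:
#   discourse enable_restore
#   discourse restore <backup file name>

# Hypothetical helper, run on the host: print the newest .tar.gz backup.
# The default directory is an assumption about a standard install.
latest_backup() {
  dir="${1:-/var/discourse/shared/standalone/backups/default}"
  # ls -t sorts newest first; prints nothing if there are no backups.
  ls -t "$dir"/*.tar.gz 2>/dev/null | head -n 1
}
```

Run `latest_backup` right after taking the pre-import backup and note the file name, so you know exactly which file to restore if the import goes wrong.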

I tried this, with the ID I got from site.json, but it didn’t delete any of the topics.

Here’s what I did. Am I doing something wrong?

sudo ~/var/discourse/discourse_docker/launcher enter app
rails c
Topic.where(category_id: 9).destroy_all
Topic.where(category_id: 10).destroy_all
quit
exit