How to mass merge topics into parents for a Yahoo Groups import?

After importing many tens of thousands of old posts from Yahoo Groups into Discourse, I’m left with a lot of disconnected topics that should be the same thread. It seems that the intuitive solution for this would be to select or highlight a number of these and then drag them into the parent topic.

Or perhaps an even better way would be to select all of the topics you want to merge and let the system pull them together into a single topic. The oldest message would become the parent and supply the topic title, and the existing dates would determine the order of the rest.

For example, my messages are mostly like this:

  • Some Topic Name
  • [mailing list name] Some Topic Name
  • [mailing list name] Some Topic Name
  • [mailing list name] Some Topic Name
  • [mailing list name] Some Topic Name
  • [mailing list name] Some Topic Name
  • [mailing list name] Some Topic Name
  • [mailing list name] Some Topic Name

So basically I should be able to select all of these, open the wrench tool on the right-hand side, and click “merge” to accomplish the goal.

Is this functionality missing or am I overlooking something?

Discourse allows you to move posts to new or existing topics. I’m not sure if this is the solution for your problem, but if it is, there is a guide for moving posts here: Move posts to a new or existing topic.

Thank you for pointing that out. Unfortunately I was already aware of how this functions. The real problem is that what you’re demonstrating in that tutorial is how to work with posts WITHIN a topic.

Imagine for a moment that you have 100 of the SAME topic with the same or slightly different subject/title lines.

The problem is that the mbox / Yahoo Groups import did not work quite correctly. Perhaps it is due to improper IDs in the emails, perhaps something else. Either way, some of us have thousands or tens of thousands of messages that are wrongly disconnected from one another. We need a way to merge these.

Unfortunately, automated merging based on subject lines is probably not optimal. We probably need to eyeball it, select many topics manually, and then tell the system to merge all of them into a common topic.

It seems that the best (really the only) place to do that is the topic list view of a category. Doing it via the method in the guide is incredibly burdensome and simply not feasible: you have to go into a topic that has only one post, select that post, find the parent topic, and then combine them. This becomes further complicated when there are tens or hundreds of topics that already have the same subject line.

So what we need is a “Select topics…” function at the category view that works similarly to how the “Select posts…” function works within a topic view.

Does this make sense?

It sounds to me like you are at a crossroads. You have to decide how important it is for you to have these old Yahoo messages correctly and tidily contained within one topic.

If it is important, my suggestion would be to go back a step and fix the mbox files. And then import them in carefully prepared small batches.

The best way to do this is probably to import them into Mozilla Thunderbird and fiddle with them there. Each Thunderbird folder is its own mbox file, so you could move all related messages into one folder, open that mbox file in a text editor, and do a search and replace to correct the message ID, which is what the importer uses to determine which messages belong together in one topic.
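If hand-editing in a text editor gets tedious, the same idea could be scripted. Here is a minimal sketch using Python's standard `mailbox` module. It assumes the mbox file already contains only messages from one conversation (the "one folder per thread" step above), and it assumes the importer threads on the `In-Reply-To` / `References` headers pointing at the first message's `Message-ID`; I have not checked that against the importer, so treat it as a guess. The function name `rethread_mbox` is made up for illustration. Work on a copy of the file.

```python
import mailbox
from email.utils import mktime_tz, parsedate_tz


def _timestamp(msg):
    """Return the Date header as a Unix timestamp; undated messages sort last."""
    parsed = parsedate_tz(msg.get("Date", ""))
    return mktime_tz(parsed) if parsed else float("inf")


def rethread_mbox(path):
    """Make every message in the mbox a reply to its earliest message.

    Assumption: the file holds exactly one conversation.
    Returns the number of messages whose headers were rewritten.
    """
    box = mailbox.mbox(path, create=False)
    box.lock()
    try:
        # Oldest message first; it becomes the thread root.
        items = sorted(box.items(), key=lambda kv: _timestamp(kv[1]))
        if not items:
            return 0
        root_id = items[0][1]["Message-ID"]
        if root_id is None:
            raise ValueError("earliest message has no Message-ID header")

        changed = 0
        for key, msg in items[1:]:
            # Drop whatever threading headers exist, then point at the root.
            del msg["In-Reply-To"]
            del msg["References"]
            msg["In-Reply-To"] = root_id
            msg["References"] = root_id
            box[key] = msg
            changed += 1
        return changed
    finally:
        # close() flushes pending changes, releases the lock, closes the file.
        box.close()
```

You would run `rethread_mbox("one-thread.mbox")` once per folder file and then import the result in a small batch, so a wrong guess about the headers only costs you one thread.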

Really the best solution is to go back in time and convince the programmers who created Outlook to try a bit harder to follow prevailing email standards.

One thing that I wonder: @pfaffman, are you aware of any way to do something like this programmatically at this point? I MIGHT be willing to use the shotgun approach where I just say “merge all topics with the same subject line, with or without this prefix”.

What I mean is what I described in my original post above, where some of the “Some Topic Name” topics are prefixed with “[mailing list name]”. Basically I could use a “nuclear” merge option to pull together everything that is LIKELY related, and then split topics back out where users have inadvertently used the same topic name / subject…
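For what it's worth, the "likely related" matching could be sketched like this in Python. It covers only the subject-normalization step and does not touch Discourse at all. The function names are made up for illustration, and the assumption is that list tags and Re:/Fwd: markers only ever appear at the start of a subject.

```python
import re
from collections import defaultdict

# One leading "[list name]" tag or one reply/forward marker.
_PREFIX = re.compile(r"^\s*(?:\[[^\]]*\]|re:|fwd?:)\s*", re.IGNORECASE)


def normalize_subject(subject):
    """Strip leading list tags and Re:/Fwd: markers, collapse spaces, casefold."""
    previous = None
    # Prefixes can be stacked ("Re: [list] RE: ..."), so strip until stable.
    while previous != subject:
        previous = subject
        subject = _PREFIX.sub("", subject, count=1)
    return " ".join(subject.split()).casefold()


def group_by_subject(titles):
    """Group raw topic titles under their normalized subject."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_subject(title)].append(title)
    return dict(groups)
```

With this, “Some Topic Name” and “[mailing list name] Some Topic Name” land in the same group. Unrelated threads that happen to share a title land there too, which is exactly the part that would need splitting out by hand afterwards.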

Lol, yeah, not a bad suggestion @tobiaseigen, but there’s no way in creation I’m going to be able to find the time to take this arduous step with so many tens of thousands of messages. The mboxes alone are over 500 MB. I’m going to try my luck at just fixing things over time from within Discourse since I can have other mods also help this way. I’m hoping either we find some other suggestions as per my last question or we’ll work on implementing a plugin if there are no other options.

I’d recommend starting over and getting the import to do it right to begin with; what @tobiaseigen recommended sounds promising. It’s been a long while since I did a big mbox import, and I did a bunch of work up front fixing stuff like you describe. If that’s not an option, you can try something on the Rails side: somehow collect all the posts you think belong together, sort them by date, create a new topic (or choose the first one), and move all the posts to that topic.
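To make the "collect, sort by date, choose the first one" part concrete, here is a rough sketch of just the planning logic, written in Python rather than Rails. It is not Discourse code: `Topic`, `subject_key`, and `build_merge_plan` are hypothetical names, and it assumes you have already exported topic ids, creation dates, and some grouping key (for example a cleaned-up subject line). The actual moving of posts would still have to happen inside Discourse.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Topic:
    id: int               # hypothetical exported topic id
    subject_key: str      # grouping key, e.g. a normalized subject line
    created_at: datetime  # creation date of the topic's first post


def build_merge_plan(topics):
    """Map each parent topic id to the topic ids to merge into it.

    The oldest topic in each group becomes the parent; the remaining
    topics are listed oldest first. Groups of one are left out.
    """
    groups = defaultdict(list)
    for topic in topics:
        groups[topic.subject_key].append(topic)

    plan = {}
    for members in groups.values():
        if len(members) < 2:
            continue  # nothing to merge
        # Sort by date; the id breaks ties so the result is deterministic.
        members.sort(key=lambda t: (t.created_at, t.id))
        parent, *children = members
        plan[parent.id] = [child.id for child in children]
    return plan
```

Producing a plan first, instead of merging directly, leaves room to eyeball each group and drop the ones that only share a title by coincidence before anything is changed.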

It’d take me a while to come up with actual code.

Well, there has been a lot of participation on the forum since the import, so at this point I don’t believe a “start over” is within the realm of possibility. I’d be a little too scared to do much in the way of deletion, and mods have already started massaging the imported data as well as merging accounts, etc.

I’ll look into a Rails method myself at some point here and then share what I come up with unless someone has a suggestion before I get a chance to do it. We’ll just leave things “messy” over the holidays here if necessary and until we figure this little screwball out.
