I’m importing an old, large forum about unicycling.
The old categories weren’t great, and a lot of different things were mixed together.
So I’m reorganizing the categories.
At first, I was thinking of manually re-categorizing the few hundred most recent topics and leaving the old ones as they are.
The idea would be to aim at the future, not at the past. It doesn’t matter that much if old topics are badly categorized; what matters most is that they are still available.
But I’m wondering whether re-categorizing topics automatically, by targeting keywords, could in fact do a good job.
Currently, most of our topics (more than half of the total!) are in a single category ().
I could target these keywords in the titles: “learn”, “learning”, “train”, “training”, “posture”, etc., and put all these topics in a #riding-advice category.
The same goes for “frame”, “wheel”, “tire”, “saddle”, etc., which would go in #unicycles-and-equipments.
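As a minimal sketch, assuming the topic titles can be exported as plain strings, the keyword lists above could live in one dictionary and be matched against the lowercased words of each title. `KEYWORDS` and `categorize` are hypothetical names, and the lists only contain the examples from this post. Splitting the title into words also catches cases like “saddle?” at the end of a title, which matching on surrounding spaces alone would miss.

```python
import re

# Hypothetical keyword lists, taken from the examples in this post.
KEYWORDS = {
    "riding-advice": {"learn", "learning", "train", "training", "posture"},
    "unicycles-and-equipments": {"frame", "wheel", "tire", "saddle"},
}


def categorize(title):
    """Return the first category whose keywords appear as whole words in the title, or None."""
    # Lowercase the title and split it into words; punctuation acts as a separator.
    words = set(re.findall(r"[a-z0-9']+", title.lower()))
    for category, keywords in KEYWORDS.items():
        if words & keywords:
            return category
    return None  # No keyword found: leave the topic where it is.
```

Topics for which `categorize` returns `None` would simply stay in their current category.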
I’ll target whole words (surrounded by spaces) and try to anticipate multi-word expressions to prevent some “false positives”. For example, “wheelwalking” is a unicycle trick that should probably be in #riding-advice, so if I target “wheel” without much thought, I’ll get false positives that could easily have been avoided (that said, I could move topics containing “wheel” from A to B, and then move the topics containing “wheelwalking” from B to C…).
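Instead of moving topics twice (A to B, then B to C), one possible approach is a single pass over an ordered list of rules where the more specific patterns come first and the first match wins. This is a sketch with hypothetical rules and a hypothetical `first_matching_category` function. In Python’s `re` module, `\b` marks a word boundary, so `\bwheel\b` does not match inside “wheelwalking”, and the trick’s pattern also accepts the two-word spelling “wheel walk”.

```python
import re

# Hypothetical rules, ordered from most specific to most generic.
RULES = [
    (r"\bwheel ?walk(s|ing)?\b", "riding-advice"),  # the trick, checked before "wheel"
    (r"\b(learn|learning|train|training|posture)\b", "riding-advice"),
    (r"\b(frame|wheel|wheels|tire|tires|saddle)\b", "unicycles-and-equipments"),
]
COMPILED = [(re.compile(pattern, re.IGNORECASE), category) for pattern, category in RULES]


def first_matching_category(title):
    """Return the category of the first rule that matches the title, or None."""
    for regex, category in COMPILED:
        if regex.search(title):
            return category
    return None
```

Because the order of `RULES` decides the outcome, adding a new trick or expression is a matter of inserting its pattern above the generic equipment words.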
Has anyone here done something like this? Do you have suggestions or ideas for minimizing the risk of “false positives”? Is there anything, obvious or not, that I should know before doing this?
About 70,000 topics need to be looked at.
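With that many topics, one way to limit false positives might be a dry run: compute the proposed category for every title without moving anything, send titles that match several categories to manual review, and save the result to a spreadsheet for spot-checking. The sketch below assumes the topics can be exported as `(topic_id, title)` pairs; `PATTERNS`, `propose`, `dry_run` and `write_report` are hypothetical names, and how the topics are exported from the old forum is left open.

```python
import csv
import re
from collections import Counter

# Hypothetical whole-word patterns, one per target category.
PATTERNS = {
    "riding-advice": re.compile(r"\b(learn|learning|train|training|posture)\b", re.IGNORECASE),
    "unicycles-and-equipments": re.compile(r"\b(frame|wheel|tire|saddle)\b", re.IGNORECASE),
}


def propose(title):
    """Return ('move', category), ('review', None) or ('keep', None) for one title."""
    matches = [category for category, regex in PATTERNS.items() if regex.search(title)]
    if len(matches) == 1:
        return "move", matches[0]
    if len(matches) > 1:
        return "review", None  # Matches several categories: let a human decide.
    return "keep", None


def dry_run(topics):
    """Tally proposals for (topic_id, title) pairs without moving anything.

    Returns the per-action counts and the rows worth writing to a report.
    """
    counts = Counter()
    rows = []
    for topic_id, title in topics:
        action, category = propose(title)
        counts[action] += 1
        if action != "keep":
            rows.append((topic_id, title, action, category or ""))
    return counts, rows


def write_report(rows, path):
    """Save the proposed moves to a CSV file for spot-checking in a spreadsheet."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["topic_id", "title", "action", "proposed_category"])
        writer.writerows(rows)
```

For example, `counts, rows = dry_run(topics)` followed by `write_report(rows, "proposed_moves.csv")`. The counts give a quick idea of how many topics each rule would affect, and skimming a random sample of the CSV shows whether a keyword produces too many false positives before anything is actually moved.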