My file has 2805+ bad words, but only 2000 are allowed. How can I add more words? And if I want to add 10,000 bad words from a text file, how do I do that? Right now it only lets me add a maximum of 2000 entries.
There are no plans to increase this limit at the moment. If this is a deal breaker you should look into writing, or commissioning, a plugin for it.
I can see myself running into this limit too, since I use watched words to combat repetitive spam, so I had some thoughts about what might be useful in the future to others, if not to the OP.
A way to deal with this without any code change is to switch to Using Regex with Watched Words and combine many words into a single regex. It is easy to get wrong if you aren't familiar with regular expressions, but it's technically feasible. (This is the direction I am likely to go, because I know regular expressions.)
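To illustrate the idea, here is a minimal Ruby sketch (the word list is made up). `Regexp.union` escapes each word and joins them with `|`, so a single regex entry can stand in for many literal entries:

```ruby
# Collapse several literal watched words into one alternation regex.
# The word list here is illustrative.
words = %w[spamword1 spamword2 spamword3]

# Regexp.union escapes each word and joins them with "|".
pattern = Regexp.union(words)

puts pattern.source                          # => "spamword1|spamword2|spamword3"
puts "matched" if "a spamword2 here" =~ pattern
```

The `source` string is what you would paste into a single watched-words regex entry.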
Additionally, I would expect that there are two ways to write a plugin here.
The reason for the 2000 limit is that the algorithm doesn’t scale very well and is run synchronously, but it’s an arbitrary limit. I would expect that a simple plugin could monkey-patch the 2000-word limit to accept the performance hit. But I wouldn’t do that for 10000 entries, myself!
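For the first approach, here is a sketch of what such a monkey-patch might look like in a plugin's `plugin.rb`. The constant name `WatchedWord::MAX_WORDS_PER_ACTION` is my assumption about where the cap lives, so verify it against the Discourse source you are running:

```ruby
# plugin.rb — raise the watched-word cap and accept the performance hit.
# WatchedWord::MAX_WORDS_PER_ACTION is an assumed constant name; check
# it against your Discourse version before relying on this.
after_initialize do
  if defined?(WatchedWord::MAX_WORDS_PER_ACTION)
    WatchedWord.send(:remove_const, :MAX_WORDS_PER_ACTION)
    WatchedWord.const_set(:MAX_WORDS_PER_ACTION, 10_000)
  end
end
```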
The other, possibly complementary, approach would be to have a separate list specifically for flagging, and to do that asynchronously from a sidekiq job that is fired off for each post create/edit.
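A sketch of that second approach, using Discourse's plugin event hooks and job system; the job name, the pattern list, and the logging stand-in for an actual flag are all illustrative:

```ruby
# plugin.rb — scan posts against a separate flag list off the request
# path. FLAG_PATTERNS and the "flagging" step are placeholders.
after_initialize do
  module ::Jobs
    class ScanPostForFlagWords < ::Jobs::Base
      FLAG_PATTERNS = [/badword/i, /worseword/i] # illustrative

      def execute(args)
        post = Post.find_by(id: args[:post_id])
        return if post.blank?

        if FLAG_PATTERNS.any? { |re| post.raw =~ re }
          # Real code would create a flag/reviewable here; the exact
          # API varies by Discourse version, so just log for now.
          Rails.logger.warn("flag-word hit on post #{post.id}")
        end
      end
    end
  end

  on(:post_created) { |post| Jobs.enqueue(:scan_post_for_flag_words, post_id: post.id) }
  on(:post_edited)  { |post| Jobs.enqueue(:scan_post_for_flag_words, post_id: post.id) }
end
```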
Like others, I’ve gone down this path:
- Start with a list, maybe downloaded from a current GitHub repo.
- Immediately hit the 2000 entry limit.
- Oh, I can use Regex - awesome!
- Complex Regexps easily go over 100 characters.
- Break those up into multiple entries (see the sketch after this list).
- Refine a Regexp; oops, it went over 100 characters too.
- Break it up even further.
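To make that "break those up" step less tedious, here is a small helper sketch that packs a word list into alternation entries, each under the 100-character limit. The file name is a placeholder, and the packing is greedy rather than optimal:

```ruby
# Pack escaped words into "a|b|c" entries, keeping each entry under
# the 100-character watched-word limit. Greedy, not optimal; a single
# word longer than max_len still becomes one oversized entry.
MAX_ENTRY_LEN = 100

def pack_into_entries(words, max_len = MAX_ENTRY_LEN)
  entries = []
  current = []
  words.each do |w|
    if current.any? && (current + [w]).join("|").length > max_len
      entries << current.join("|")
      current = []
    end
    current << w
  end
  entries << current.join("|") if current.any?
  entries
end

# badwords.txt is a placeholder: one word per line.
raw = File.readlines("badwords.txt", chomp: true)
pack_into_entries(raw.map { |w| Regexp.escape(w) }).each { |e| puts e }
```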
Dancing with limits isn't prohibitive; it's just annoying, especially when the limits are artificial. That said, I understand this filtering is synchronous and that extended processing can create performance issues, and I appreciate the difficulty of establishing limits that work for the largest possible audience. So while I struggle with the limits, I can't reasonably disagree with them.
I see the code for filtering here in word_watcher.rb. As a developer I would be happy to take a shot at writing a plugin, but that code doesn't look extensible. I would have no idea how to hook JavaScript in a plugin into the Ruby processing … or whether that is even possible with how the word_watcher code is written.
Here is an idea for an enhancement to help relieve some of the burden of processing extensive lists.
Rather than processing the entire list for each type of watch list, consider looping through blocks of filters. For example, put the most common and most abusive filters in block 1 and the rest in blocks 2 through n. The synchronous filter process then handles one block at a time, and only does a complete pass through all filters on the final Save operation. Blocks operate on existing lists, so no one has to change anything: an existing list is simply broken into 1000-entry blocks, so block 1 is entries 1-1000, block 2 is 1001-2000, and so on. Admins who want to optimize can then move their higher-priority filters further up the list.
An advantage of this is that the entire list doesn't need to be processed to catch an issue. The most likely issues will be caught by a smaller first block, so the synchronous process can return sooner. Sure, if the watch-text isn't found in the first block, another block has to be processed; that's slightly more overhead to catch less likely abuse. This becomes a matter of optional tuning - if anyone ever cares to do so.
An additional option here would be for the Admin to choose how large the blocks are. By reducing the size of blocks, maybe to 500 entries per loop cycle, each synchronous process will go faster, but there might be more blocks to process. This depends on the kind of abuse that is present and how well the list is prioritized. Again, this kind of tuning would be optional, and frankly I doubt many Admins would do too much tuning like this.
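Here is a toy sketch of what that loop could look like, with the block size as a parameter; the pattern list and sizes are illustrative:

```ruby
# Check blocks in priority order and stop at the first hit, so common
# abuse placed in block 1 returns early. Returns the 1-based block
# number that matched, or nil if the text is clean.
def first_matching_block(text, patterns, block_size: 1_000)
  patterns.each_slice(block_size).with_index(1) do |block, i|
    return i if text =~ Regexp.union(block)
  end
  nil
end

patterns = ["badword"] + Array.new(3_000) { |i| "rareword#{i}" }
hit = first_matching_block("a post containing badword", patterns)
puts hit ? "matched in block #{hit}" : "clean"
```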
Note that fine-tuning implies we have quantifiable metrics: how much time are we spending in watch-word processing, and how many issues are we actually catching? This nerdy level of detail should be left for a later enhancement, or a plugin, if it's really desirable.
Ultimately, if "watch-word block processing" is implemented as described here, the number of items in the list can be extended beyond 2000. Yes, there will be some overhead in reading longer lists and breaking them up. Once again, if we have metrics about how much time this process consumes, Admins can determine their own threshold for optimization … but I kinda doubt many people would get into this deeply. The published guideline could be something like: "The limit remains 2000 watch words without watch-word block processing. With block processing there is no hard limit, but the practical limit is probably around 5000, with noticeable performance degradation as the number of entries grows."
Any joy here?
At the end of the day, if we do this on the server we can support an "infinite" size: split the post into words, then run a single query against a "block" table, which at worst is one table scan.
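As a rough sketch of that shape (the `blocked_words` table and model are hypothetical, not part of the Discourse schema):

```ruby
# Hypothetical model over a blocked_words(word) table with an index
# on word; not part of the Discourse schema.
class BlockedWord < ActiveRecord::Base; end

def blocked_hits(post_raw)
  # Normalize the post into a unique word set...
  words = post_raw.downcase.scan(/[[:alnum:]]+/).uniq
  # ...then one round trip: WHERE word IN (...), an index lookup
  # per word rather than a pass over the whole block list.
  BlockedWord.where(word: words).pluck(:word)
end
```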
I think that if what you need is GIANT block lists, I would recommend building a custom plugin.
Of the 20+ code languages and dialects that I've learned, Ruby ain't one of them. So a plugin from scratch is a challenge I don't believe I could take on. I would gladly do this in another language … or wait until someone else takes it on.
Thanks.