Great to see this addressed! We run an international forum, and while English is the main language, we have categories dedicated to other languages, so this has been a long-term annoyance.
Now that skipped_locale is only used for seems_unpretentious, I’m wondering if we could also skip ‘ko’, since modern Korean uses spaces? Mind that I don’t speak Korean, so you may want to double-check this.
While I have your attention, there’s one more thing that I think could be an easy improvement to TextSentinel but that I didn’t dare touch (again, not a Ruby developer). If you have a moment, I think it’s fairly simple and could give a free performance gain.
As I understand it, this checks whether any word is longer than the limit by splitting the text into words, calculating the length of each one, scanning all the lengths to find the highest, and only then comparing that with the limit.
Could we perhaps skip all that by just trying to match the text against something like /\p{Alnum}{#{max_word_length + 1},}/ (syntax likely wrong, but hopefully you get the idea)?
Without knowing Ruby’s inner workings, I’d expect this to stop the check as soon as there’s a match. And if no word is too long (the most common case), the text is scanned only once, with no splitting, no per-word length calculation, and no search for the maximum.
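
To make the idea concrete, here is a rough sketch of the two approaches side by side. It is not the actual TextSentinel code: both method names are made up for illustration, and I’m assuming the current check splits on whitespace only, which may not be what it really does.

```ruby
# Hypothetical helpers for illustration only; not the real TextSentinel methods.

# The approach as I understand it: split the text, measure every word,
# find the largest length, and only then compare it with the limit.
# (Assumption: words are separated by whitespace.)
def within_limit_by_split?(text, max_word_length)
  (text.split(/\s+/).map(&:size).max || 0) <= max_word_length
end

# The suggested approach: a single regex scan that looks for any run of
# alphanumeric characters longer than the limit.
def within_limit_by_regex?(text, max_word_length)
  !text.match?(/\p{Alnum}{#{max_word_length + 1},}/)
end
```

One caveat: the two are not strictly equivalent. The regex only counts runs of letters and digits, so a "word" made long by punctuation (a URL, for example) would fail the split version but pass the regex version. Whether that matters depends on what the check is meant to catch.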
Sorry if I’m hijacking the topic here, but as the new PR is already merged, I’m not sure where best to post this. It’s perhaps too small to deserve a new topic, but it seems like an easy win. Feel free to run with it.