Don't allow super long words if there is a word length maximum

helperhaps · May 19, 2016, 9:54am

I got an invalid title alert when I created a topic. Digging into the code, I found it is raised by the TextSentinel class. I have already read https://meta.discourse.org/t/are-very-long-words-not-allowed-in-topic-titles/29683/16. There is a related method in the class:

  def seems_unpretentious?
    # Don't allow super long words if there is a word length maximum
    @opts[:max_word_length].blank? || @text.split(/\s|\/|-|\./).map(&:size).max <= @opts[:max_word_length]
  end

I understand what it does. But languages such as Chinese or Japanese do not separate words with spaces or any of the other characters in that pattern.
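To illustrate the problem, here is a minimal sketch in plain Ruby that reuses the split pattern from the method above. The helper name `longest_word_length` and the sample titles are only for illustration and are not part of Discourse.

```ruby
# Same split pattern as in the quoted seems_unpretentious? method.
WORD_SPLIT = /\s|\/|-|\./

# Hypothetical helper: length of the longest piece the pattern produces
# (0 when the text yields no pieces at all).
def longest_word_length(text)
  text.split(WORD_SPLIT).map(&:size).max || 0
end

english = "Are very long words not allowed in topic titles"
chinese = "居然知道我用的是中文为什么要这么长的标题"

# The English title is cut at every space, so its longest piece is short.
puts longest_word_length(english)

# The Chinese title contains none of the separators, so the whole title
# counts as a single "word".
puts longest_word_length(chinese) == chinese.size
```

With a Chinese title, the "longest word" is therefore as long as the title itself, which is why the limit only passes when it is at least as large as the maximum title length.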

Besides, SiteSetting.title_max_word_length is 0 when I leave it blank in the settings panel, so the expression @opts[:max_word_length].blank? is always false (0 is not blank). That part of the check is meaningless, because it can never be used to skip the word length test.

Most of my users write in Chinese, so I have to set SiteSetting.title_max_word_length to the same value as SiteSetting.title_max_topic_title_length to make it work.

Is there another way to solve this?

sam · May 24, 2016, 3:08am

@fantasticfears any idea what to do here?

fantasticfears · May 24, 2016, 5:44am

For CJK, it is meaningless. A word segmentation algorithm would happily chop a sentence into characters and words wherever it can. Without understanding the sentences, I’m afraid it’s not easy to identify a good or bad word.

3 ways to disable it:

1. Add a locale check.
2. Add a comment about how to disable it by setting the value to the same as the topic title length.
3. Add another setting to disable it.

The latter two are much better, although locale-based site settings should be introduced at some point for convenience.
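As a rough sketch of the first option (a locale check), something like the following could work. It is not taken from Discourse: the constant `UNSEGMENTED_LOCALES`, the method names, and the idea of passing the locale in as an argument are all assumptions made for illustration.

```ruby
# Assumption: locale codes whose text is not split into words by spaces.
UNSEGMENTED_LOCALES = %w[zh_CN zh_TW ja].freeze

# Same split pattern as in the quoted seems_unpretentious? method.
WORD_SPLIT = /\s|\/|-|\./

# Hypothetical helper: the limit applies only when one is configured and
# the locale separates words with whitespace.
def word_length_check_applies?(locale, max_word_length)
  !max_word_length.nil? && !UNSEGMENTED_LOCALES.include?(locale.to_s)
end

# Hypothetical standalone variant of the check that takes the locale and
# the limit as explicit arguments.
def word_lengths_acceptable?(text, locale:, max_word_length:)
  return true unless word_length_check_applies?(locale, max_word_length)

  longest = text.split(WORD_SPLIT).map(&:size).max || 0
  longest <= max_word_length
end

title = "居然知道我用的是中文为什么要这么长的标题"
puts word_lengths_acceptable?(title, locale: "zh_CN", max_word_length: 5)
puts word_lengths_acceptable?(title, locale: "en", max_word_length: 5)
```

The downside of this approach is that it depends on the site locale rather than on the language of each title, so a Chinese title posted on a site with an English default locale would still be rejected.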

BTW, if my mentor @tgxworld agrees, I could ask some pointers about how this can be done at some moment in June:

helperhaps · May 27, 2016, 8:11am

How about enabling it only when every character in the title is ASCII?
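A minimal sketch of that idea, using Ruby's built-in `String#ascii_only?`. The method name `word_length_ok?` and its arguments are hypothetical and not part of TextSentinel.

```ruby
# Same split pattern as in the quoted seems_unpretentious? method.
WORD_SPLIT = /\s|\/|-|\./

# Hypothetical helper: enforce the word length limit only for titles made
# up entirely of ASCII characters.
def word_length_ok?(text, max_word_length)
  return true if max_word_length.nil?   # no limit configured
  return true unless text.ascii_only?   # skip the check for non-ASCII titles

  longest = text.split(WORD_SPLIT).map(&:size).max || 0
  longest <= max_word_length
end

puts word_length_ok?("居然知道我用的是中文为什么要这么长的标题", 5)
puts word_length_ok?("a" * 40, 30)
puts word_length_ok?("short words only", 30)
```

Note that in this sketch a single non-ASCII character anywhere in the title turns the check off for the whole title, including any long ASCII runs it contains.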

fantasticfears · May 27, 2016, 9:18am

I think that would have to wait until we introduce a Unicode filtering library instead of regex. I could take this on for 1.6, together with Unicode usernames.

It doesn’t make much sense to check the ASCII parts within Chinese sentences. The only plausible use case is a multilingual forum, which might need such checks based on the character sequences in the title (and even then, it shouldn’t look into ASCII sequences inside Chinese sentences, for example).
