Are very long words not allowed in topic titles?

email

(Michael Downey) #1

We had a user get an email-in topic rejected with the topic:

Adding DropMillisecondsHibernateInterceptor for MySQL 5.6+ compatibility

With the error message: Title is invalid; try to be a little more descriptive.

However, when he changed the title to:

Adding an interceptor to drop milliseconds for compatibility with MySQL 5.6 and above

The topic was accepted & processed.

  1. Does + really cause an email to be rejected? Must it be?
  2. If it must, can we at least get a more accurate error message?

(Régis Hanol) #2

I’m pretty sure that the issue is “DropMillisecondsHibernateInterceptor” (a very long word) and not the “+”.


(Michael Downey) #3

Should disabling the title prettify site setting prevent long words from triggering an error?


(Jeff Atwood) #4

It might. We detect long unbroken sequences of characters in titles as griefing.
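
To make the long-word theory concrete, here is a minimal sketch of such a check applied to the two titles from the first post. The separators are the ones used in the text_sentinel.rb code quoted further down in this thread; the limit of 30 characters and the helper names `longest_word` and `too_long_word?` are assumptions for illustration, not values read from any real site.

```ruby
# Sketch only: MAX_WORD_LENGTH is an assumed limit, not read from a site setting.
MAX_WORD_LENGTH = 30

# Split on whitespace, "/", "-" and "." and return the longest piece.
# Returns "" when the title contains no words at all.
def longest_word(title)
  title.split(/\s|\/|-|\./).max_by(&:size).to_s
end

# True when the longest word exceeds the (assumed) limit.
def too_long_word?(title, max = MAX_WORD_LENGTH)
  longest_word(title).size > max
end

rejected = "Adding DropMillisecondsHibernateInterceptor for MySQL 5.6+ compatibility"
accepted = "Adding an interceptor to drop milliseconds for compatibility with MySQL 5.6 and above"

puts longest_word(rejected)       # DropMillisecondsHibernateInterceptor
puts longest_word(rejected).size  # 36
puts too_long_word?(rejected)     # true
puts too_long_word?(accepted)     # false
```

Under these assumptions the "+" plays no part: the first title fails only because one 36-character word exceeds the limit.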


(Michael Downey) #5

Nevertheless, could the error message be changed to be more specific and better represent what users need to do to post their message? In this case, making it more descriptive would not help.


(Fenec) #6

I think refactoring TextSentinel to make it show meaningful error messages for all kinds of ‘invalid’ titles and posts would be great.


(Michael Downey) #7

Yes, absolutely. Right now the ambiguous error messages just confuse and frustrate users.


(Mittineague) #8

I just tested this by changing the default
max topic title length
from 255 to 25.

I must be missing something; this error message doesn’t seem at all ambiguous to me.


(Stephanie) #9

I think the problem is long words, not long titles.


(Mittineague) #10

Thanks, now I see.

I tried this for a Title
thisisonesuperexceptionallylongwordbeingusedasatitletoseeifitthrowsanerrorandifsowhatthaterrormessageis

And got this

Ironic, isn’t it?


(Mittineague) #11

It seems this does indeed involve text_sentinel.rb.
It runs several different checks, and a FAIL in any one of them results in the same “invalid” message.

  def self.body_sentinel(text, opts={})
    entropy = SiteSetting.body_min_entropy
    if opts[:private_message]
      scale_entropy = SiteSetting.min_private_message_post_length.to_f / SiteSetting.min_post_length.to_f
      entropy = (entropy * scale_entropy).to_i
      entropy = (SiteSetting.min_private_message_post_length.to_f * ENTROPY_SCALE).to_i if entropy > SiteSetting.min_private_message_post_length
    else
      entropy = (SiteSetting.min_post_length.to_f * ENTROPY_SCALE).to_i if entropy > SiteSetting.min_post_length
    end
    TextSentinel.new(text, min_entropy: entropy)
  end

  def self.title_sentinel(text)
    entropy = if SiteSetting.min_topic_title_length > SiteSetting.title_min_entropy
      SiteSetting.title_min_entropy
    else
      (SiteSetting.min_topic_title_length.to_f * ENTROPY_SCALE).to_i
    end
    TextSentinel.new(text, min_entropy: entropy, max_word_length: SiteSetting.title_max_word_length)
  end

  # Entropy is a number of how many unique characters the string needs.
  # Non-ASCII characters are weighted heavier since they contain more "information"
  def entropy
    chars = @text.to_s.strip.split('')
    @entropy ||= chars.pack('M*'*chars.size).gsub("\n",'').split('=').uniq.size
  end

  def valid?
    @text.present? &&
    seems_meaningful? &&
    seems_pronounceable? &&
    seems_unpretentious? &&
    seems_quiet?
  end

  private

  def symbols_regex
    /[\ -\/\[-\`\:-\@\{-\~]/m
  end

  def seems_meaningful?
    # Minimum entropy if entropy check required
    @opts[:min_entropy].blank? || (entropy >= @opts[:min_entropy])
  end

  def seems_pronounceable?
    # At least some non-symbol characters
    # (We don't have a comprehensive list of symbols, but this will eliminate some noise)
    @text.gsub(symbols_regex, '').size > 0
  end

  def seems_unpretentious?
    # Don't allow super long words if there is a word length maximum
    @opts[:max_word_length].blank? || @text.split(/\s|\/|-|\./).map(&:size).max <= @opts[:max_word_length]
  end


  def seems_quiet?
    # We don't allow all upper case content in english
    not((@text =~ /[A-Z]+/) && !(@text =~ /[^[:ascii:]]/) && (@text == @text.upcase))
  end

end

It would require code changes to have more specific error messages.

Maybe easier to just have the message point out the possible FAIL reasons and let users figure out which one was the problem?

Though I guess a better error message alone wouldn’t solve the OP’s problem if the long word really had to be used in the email title.
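
As a rough illustration of what "more specific error messages" could look like, here is a hypothetical sketch. It is not Discourse's TextSentinel and not the fix linked later in the thread; the class name `ExplainingSentinel`, the method `failures`, and the reason symbols are all made up for this example. It reuses the regexes from the quoted code, but the entropy check is simplified to a plain count of distinct characters.

```ruby
# Hypothetical sketch: NOT Discourse's TextSentinel. It re-implements the
# checks from the quoted code so that each failure is reported by name.
class ExplainingSentinel
  # Same symbol ranges as symbols_regex in the quoted code.
  SYMBOLS = /[\ -\/\[-\`\:-\@\{-\~]/m
  # Same word separators as seems_unpretentious? in the quoted code.
  SEPARATORS = /\s|\/|-|\./

  def initialize(text, min_entropy: nil, max_word_length: nil)
    @text = text.to_s
    @min_entropy = min_entropy
    @max_word_length = max_word_length
  end

  # Returns an array of symbols, one per failed check; empty means valid.
  def failures
    return [:blank] if @text.strip.empty?

    reasons = []
    reasons << :not_enough_variety if @min_entropy && unique_chars < @min_entropy
    reasons << :only_symbols if @text.gsub(SYMBOLS, '').empty?
    reasons << :word_too_long if @max_word_length && longest_word > @max_word_length
    reasons << :all_caps if all_caps?
    reasons
  end

  def valid?
    failures.empty?
  end

  private

  # Simplification (assumption): counts distinct characters and does not
  # weight non-ASCII characters the way the quoted entropy method does.
  def unique_chars
    @text.strip.chars.uniq.size
  end

  # Length of the longest word, or 0 if splitting leaves no words.
  def longest_word
    @text.split(SEPARATORS).map(&:size).max.to_i
  end

  # Mirrors seems_quiet?: ASCII-only text that is entirely upper case.
  def all_caps?
    @text.match?(/[A-Z]/) && !@text.match?(/[^[:ascii:]]/) && @text == @text.upcase
  end
end

title = "Adding DropMillisecondsHibernateInterceptor for MySQL 5.6+ compatibility"
p ExplainingSentinel.new(title, min_entropy: 10, max_word_length: 30).failures  # [:word_too_long]
p ExplainingSentinel.new("HELP ME PLEASE").failures                            # [:all_caps]
```

The calling code could then map each symbol to its own message, or list all of them in one generic message.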


(Fenec) #12

I made a fix:
https://github.com/fenec/discourse/commit/849da848c14c5bc913609d45fcfcb0d90895fe99
Would be great to have some better wordings for errors.


(Jeff Atwood) #13

That is not really the intent of this feature. We view it as anti-griefing, and telling griefers how to bypass filters is like Google telling spammers exactly what they didn’t like about their emails.


(Michael Downey) #14

The only grief happening for us is on behalf of our legitimate users who can’t figure out how to successfully post topics. :frowning:


(Mittineague) #15

Have you tried tweaking your
Admin -> Settings -> Posting
settings?


(Michael Downey) #16

Yes, for other issues. But AFAICT none of those settings relate to the issue raised in this bug.


(Kane York) #17

What about this one?


(Jeff Atwood) #18