Auto-tag on string acts on *string*

This bug was discovered when I had an auto-tag specification on a short string, e.g. “art”. The result auto-tagged topics with “artificial”, etc.

FWIW: the “test” feature on the auto-tag spec page works fine (for auto-tag on “art”, “artificial” does not generate a tag in the test).

The bug has probably not been noticed because it is perhaps uncommon to auto-tag on a short watched word.

4 Likes

I think @codinghorror noticed this as well. It is on our list to sort out.

4 Likes

I fixed this bug: replace, link and tag watched words will now act on whole words.

There is an exception when watched_words_regular_expressions is enabled.

4 Likes

Thanks for your response Bianca.

I guess I had watched_words_regular_expressions enabled by default, and didn’t realize this breaks ‘acting on whole words’. Is it necessary that acting on whole words is incompatible with regular expressions?

I.e., should I still think of this as a bug, or a necessary constraint caused by another feature?

So far, I’m still thinking of it as a bug. I don’t see any reason why non-reg-ex parsing of full words should be incompatible with reg-ex parsing when a reg-ex is specified.

Hey Norman,

If you’re using regex for some of your watched words, then it applies to all. As such, if regex is enabled and you have auto-tag configured for art, artificial is expected to be tagged. To look only for the word art, use the word boundary metacharacter \b. In the case of art, that would look like \bart\b.
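To illustrate the difference between a bare pattern and one wrapped in \b, here is a minimal sketch using Python's re module. It is not Discourse's own matching code (Discourse runs on Ruby and JavaScript regex engines); it only shows how \b behaves on plain ASCII text, which is the same across these engines. The sample sentence is made up.

```python
import re

# Made-up sample text containing both "artificial" and the standalone word "art".
text = "An artificial life exhibit opened at the art museum."

# Bare pattern: matches "art" anywhere, including inside "artificial".
substring_starts = [m.start() for m in re.finditer(r"art", text)]

# Pattern with word boundaries: matches only the standalone word "art".
whole_word_starts = [m.start() for m in re.finditer(r"\bart\b", text)]

print(len(substring_starts))   # number of substring matches
print(len(whole_word_starts))  # number of whole-word matches
```

With the bare pattern, the occurrence inside "artificial" counts as a match, which is why a topic about artificial life would pick up the tag.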

3 Likes

We should make sure the UI tells us when this is enabled, since the meaning of the field is quite different when it is enabled.

Sort of like CAPS LOCK IS ON when entering your password, etc.

3 Likes

Thanks very much to all. I must apologize if my rather pedestrian understanding of regular expressions has been deficient and led to my misunderstanding of how they work for watched words. But… a couple of points:

  • I guess I thought the regex context was taken to be something like “string within word boundaries”. What else makes sense? Surely not the entire topic document? In this case for artificial to be tagged, I would need to specify art* (or art.* or some such, as referred to in the title of this topic).

  • Joshua: thanks for your word boundary metacharacter suggestion. Just tried it and it didn’t work. Neither in the Test function nor in actuality. So… currently there seems to be no work-around (or ‘correct way’ to get desired behavior).

  • The Test function is very nice. It seems to behave exactly as I intuitively thought it should. art triggers only when “art” as a word appears (and does not trigger on artificial), art* triggers on “artificial”, as expected. Furthermore, art* life triggers on both “art life” and “artificial life”. I also thought maybe the Test function might not be using regex parsing if I only enter a single word, but no… foo* art triggers on “foobar art”, does not trigger on “foobar artificial”. So… whoever wrote the Test function was thinking the way I’m thinking (I think).
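The behavior described for the Test function can be reproduced with a small sketch. This is an assumption about how such wildcard matching could work, not the actual Discourse implementation: the helper names wildcard_to_regex and triggers are hypothetical, and treating * as "any run of non-space characters" is a guess that happens to fit the observations above.

```python
import re

def wildcard_to_regex(word):
    """Hypothetical helper: turn a non-regex watched word into a regex.

    Assumption: '*' means "zero or more non-space characters" and the
    whole entry must sit between word boundaries.
    """
    # Escape everything first so the entry is treated as literal text,
    # then turn the escaped '*' back into a wildcard.
    body = re.escape(word).replace(r"\*", r"\S*")
    # Lookarounds require that no word character touches the match.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)

def triggers(word, text):
    """Return True if the watched word would fire on the given text."""
    return wildcard_to_regex(word).search(text) is not None

print(triggers("art", "artificial"))             # False
print(triggers("art*", "artificial"))            # True
print(triggers("art* life", "artificial life"))  # True
print(triggers("foo* art", "foobar artificial")) # False
```

Under these assumptions the sketch agrees with every case reported for the Test function, including foo* art firing on “foobar art” but not on “foobar artificial”.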

Bottom line,

  • Jeff’s suggestion of a reminder that watched_words_regular_expressions is enabled is good.
  • Test function behavior should match actual behavior.
    • and FWIW, my preference is that actual behavior should match current Test function behavior.
  • If one needs more regex knowledge than suggested by the current test function, would be good to have examples somewhere.
  • If there is a work-around or ‘correct way’ (like “use \bart\b to get desired behavior”), I’m happy to use it.

Again, thanks for everyone’s attention to this rather minor issue for a great platform!

2 Likes

Can we make sure this is assigned @zogstrip?

4 Likes

I added a notice that is shown when the watched words regular expressions site setting is enabled, in this PR:

This is what it looks like with regular expressions disabled and then enabled (see the notice and different input placeholder):

4 Likes

But Bianca,
My try with ‘\bart\b’ did not trigger on art (or artificial, as it shouldn’t).

This try was for auto tagging.

Is there a reason why we couldn’t use exactly the existing Test function to parse topics (to do the auto-tagging)?

Hello Norman,

If you have the watched words regular expressions site setting enabled, then you must use \bart\b, where \b represents the word boundary. If the site setting is disabled, then you do not have to use it, as the word boundaries are automatically included.
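The two modes can be sketched roughly as follows. This is an illustration in Python, not the Discourse source: build_pattern and matches are hypothetical names, and the regex_enabled flag merely stands in for the site setting. It assumes that with the setting disabled the entry is escaped and treated as literal text, which would also mean that \bart\b matches nothing useful in that mode.

```python
import re

def build_pattern(entry, regex_enabled):
    """Hypothetical helper standing in for the two site-setting modes."""
    if regex_enabled:
        # Setting enabled: the entry is used verbatim as a regular expression,
        # so word boundaries must be written explicitly by the admin.
        return re.compile(entry, re.IGNORECASE)
    # Setting disabled (assumed behavior): the entry is literal text and
    # whole-word boundaries are added automatically.
    return re.compile(r"(?<!\w)" + re.escape(entry) + r"(?!\w)", re.IGNORECASE)

def matches(entry, text, regex_enabled):
    """Return True if the watched word entry fires on the text."""
    return build_pattern(entry, regex_enabled).search(text) is not None

print(matches("art", "artificial life", regex_enabled=True))    # True
print(matches("art", "artificial life", regex_enabled=False))   # False
print(matches(r"\bart\b", "modern art", regex_enabled=True))    # True
print(matches(r"\bart\b", "modern art", regex_enabled=False))   # False
```

The last case shows why it matters which mode is active: the same entry means something quite different depending on the setting.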

I just tested this and it works fine for me, including the test modal:

I implemented that and it should work on the latest version.

3 Likes

Hi Bianca,
Thanks so much for looking into this.

  1. I was confused about enabling of watched words regular expressions. I thought it became set automatically if I used a * wildcard in my autotrigger spec. I see that is not the case, so no surprise that my \bart\b try failed.
  2. I will check out ‘latest version’ to get your implementation of the test function. For me, Test always worked, as it also does for you.

Thanks again!

2 Likes