Auto-tag on string acts on *string*

nhpackard · June 16, 2021, 1:24am

This bug was discovered when I had an auto-tag specification on a short string, e.g. “art”. The result auto-tagged topics with “artificial”, etc.

FWIW: the “test” feature on the auto-tag spec page works fine (for auto-tag on “art”, “artificial” does not generate a tag in the test).

The bug has probably not been noticed because it is perhaps uncommon to auto-tag on a short watched word.

sam · June 16, 2021, 1:42am

I think @codinghorror noticed this as well. It is on our list to sort out.

nbianca · June 18, 2021, 4:48pm

I fixed this bug. Replace, link, and tag watched words will now act on whole words.

There is an exception when watched_words_regular_expressions is enabled.

nhpackard · June 19, 2021, 4:13pm

Thanks for your response Bianca.

I guess I had watched_words_regular_expressions enabled by default, and didn’t realize this breaks ‘acting on whole words’. Is it necessary that acting on whole words is incompatible with regular expressions?

I.e., should I still think of this as a bug, or a necessary constraint caused by another feature?

So far, I’m still thinking of it as a bug. I don’t see any reason why non-reg-ex parsing of full words should be incompatible with reg-ex parsing when a reg-ex is specified.

jomaxro · June 19, 2021, 6:01pm

Hey Norman,

If you’re using regex for some of your watched words, then it applies to all of them. As such, if regex is enabled and you have auto-tag configured for art, artificial is expected to be tagged. To look only for the word art, use the word boundary metacharacter \b. In the case of art, that would look like \bart\b
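To make the difference concrete, here is a minimal sketch in Python. Discourse itself does not run Python, so this only illustrates how an unanchored pattern and a \b-anchored pattern behave; the helper name `matches` and the sample strings are made up for illustration.

```python
import re

# Hypothetical sample topic snippets, chosen only for illustration.
samples = ["modern art", "artificial life", "a work of art.", "start"]

def matches(pattern: str, text: str) -> bool:
    """Return True if the regular expression is found anywhere in text."""
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# An unanchored pattern matches "art" inside longer words too.
plain = {text: matches(r"art", text) for text in samples}

# \b asserts a word boundary, so only the standalone word "art" matches.
bounded = {text: matches(r"\bart\b", text) for text in samples}

for text in samples:
    print(f"{text!r}: plain={plain[text]}, bounded={bounded[text]}")
```

With these inputs the unanchored pattern matches every sample, while the anchored one matches only "modern art" and "a work of art."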

codinghorror · June 19, 2021, 7:46pm

We should make sure the UI tells us when this is enabled, since the meaning of the field is quite different when it is enabled.

Sort of like CAPS LOCK IS ON when entering your password, etc.

nhpackard · June 20, 2021, 4:10pm

Thanks very much to all. I must apologize if my rather pedestrian understanding of regular expressions has been deficient and led to my misunderstanding of how they work for watched words. But… a couple of points:

- I guess I thought the regex context was taken to be something like “string within word boundaries”. What else makes sense? Surely not the entire topic document? In this case for artificial to be tagged, I would need to specify art* (or art.* or some such, as referred to in the title of this topic).
- Joshua: thanks for your word boundary metacharacter suggestion. Just tried it and it didn’t work. Neither in the Test function nor in actuality. So… currently there seems to be no work-around (or ‘correct way’ to get desired behavior).
- The Test function is very nice. It seems to behave exactly as I intuitively thought it should. art triggers only when “art” as a word appears (and does not trigger on artificial), art* triggers on “artificial”, as expected. Furthermore, art* life triggers on both “art life” and “artificial life”. I also thought maybe the Test function might not be using regex parsing if I only enter a single word, but no… foo* art triggers on “foobar art”, does not trigger on “foobar artificial”. So… whoever wrote the Test function was thinking the way I’m thinking (I think).
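The Test function behavior described above is consistent with a whole-word match in which * stands for “any run of non-space characters”. The following Python sketch reproduces those observations under that assumption; `wildcard_to_regex` is a hypothetical helper, not Discourse’s actual code.

```python
import re

def wildcard_to_regex(spec: str) -> re.Pattern:
    """Translate a watched-word spec with * wildcards into a whole-word regex.

    Assumption: * means "zero or more non-space characters" and the whole
    spec must sit between word boundaries.
    """
    # Escape every regex metacharacter, then re-enable the escaped "*".
    escaped = re.escape(spec).replace(r"\*", r"\S*")
    # Lookarounds require that no word character touches either end.
    return re.compile(rf"(?<!\w){escaped}(?!\w)", re.IGNORECASE)

def triggers(spec: str, text: str) -> bool:
    return wildcard_to_regex(spec).search(text) is not None

checks = [
    ("art", "I like art"),
    ("art", "artificial"),
    ("art*", "artificial"),
    ("art* life", "art life"),
    ("art* life", "artificial life"),
    ("foo* art", "foobar art"),
    ("foo* art", "foobar artificial"),
]
for spec, text in checks:
    print(f"{spec!r} on {text!r}: {triggers(spec, text)}")
```

Under this reading, art does not trigger on “artificial”, art* does, and foo* art triggers on “foobar art” but not on “foobar artificial”.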

Bottom line,

- Jeff’s suggestion of a reminder that watched_words_regular_expressions is enabled is good.
- Test function behavior should match actual behavior.
  - And FWIW, my preference is that actual behavior should match current Test function behavior.
- If one needs more regex knowledge than suggested by the current Test function, it would be good to have examples somewhere.
- If there is a work-around or ‘correct way’ (like “use \bart\b to get desired behavior”), I’m happy to use it.

Again, thanks for everyone’s attention to this rather minor issue for a great platform!

codinghorror · June 22, 2021, 4:16am

Can we make sure this is assigned @zogstrip?

nbianca · June 23, 2021, 11:23am

I added a notice that is shown when the watched words regular expressions site setting is enabled, in this PR:

This is what it looks like with regular expressions disabled and then enabled (see the notice and the different input placeholder):

nhpackard · June 25, 2021, 1:06am

But Bianca,
My try with ‘\bart\b’ did not trigger on art (or on artificial, which it shouldn’t).

This try was for auto tagging.

Is there a reason why we couldn’t use exactly the existing Test function to parse topics (to do the auto-tagging)?

nbianca · June 25, 2021, 3:43pm

Hello Norman,

If you have the watched words regular expressions site setting enabled, then you must use \bart\b, where \b represents a word boundary. If the site setting is disabled, then you do not have to use it, as the word boundaries are included automatically.
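As a rough illustration of the two modes, here is a Python sketch. The function `build_matcher` and its `regex_enabled` flag are hypothetical stand-ins for the site setting, and the real implementation may differ. It also shows why \bart\b matches nothing when the setting is off: the backslashes are then treated as literal text.

```python
import re

def build_matcher(word: str, regex_enabled: bool) -> re.Pattern:
    """Build a matcher for one watched word (illustrative assumption only)."""
    if regex_enabled:
        # Setting on: the entry is used as a regular expression as-is,
        # so word boundaries must be written explicitly with \b.
        source = word
    else:
        # Setting off: the entry is literal text, wrapped in word boundaries.
        source = rf"\b{re.escape(word)}\b"
    return re.compile(source, re.IGNORECASE)

def hit(word: str, regex_enabled: bool, text: str) -> bool:
    return build_matcher(word, regex_enabled).search(text) is not None

for enabled in (False, True):
    for word in ("art", r"\bart\b"):
        for text in ("I like art", "artificial life"):
            print(f"enabled={enabled} word={word!r} text={text!r}: "
                  f"{hit(word, enabled, text)}")
```

In this sketch, a plain art entry matches “artificial life” only when the setting is on, and \bart\b matches “I like art” only when the setting is on.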

I just tested this and it works fine for me, including the test modal:

I implemented that and it should work on the latest version.

nhpackard · June 27, 2021, 9:54am

Hi Bianca,
Thanks so much for looking into this.

I was confused about enabling of watched words regular expressions. I thought it became set automatically if I used a * wildcard in my autotrigger spec. I see that is not the case, so no surprise that my \bart\b try failed.
I will check out ‘latest version’ to get your implementation of the test function. For me, Test always worked, as it also does for you.

Thanks again!
