Linkify words in post

Hey, this is a great component! Thanks for making it.

There’s one small problem, though: linkification seems to require a space after the word, so the component doesn’t work well on Chinese and Japanese Discourse instances (maybe Korean too? I don’t know that language).

I’m a native speaker of Chinese, and I speak Japanese too. In Chinese, people often use certain borrowed words written in Latin letters, such as Apple, Office, BB (Blackboard), and DNA, whose translations are rarely used in everyday language. In informal writing, we usually don’t add a space before or after such a word when it’s surrounded by Chinese characters.

This is an example:

你会用Office吗?
Do you know how to use Office suite?

It’s similar in Japanese.

革新に満ちたAppleの世界へようこそ。(copied from Apple Japan official site)
Welcome to the world of Apple, which is full of revolutions.

I’m aware that it might be a lot of work to adapt this component for these two languages, because it might lead to unintended linkifications in languages that use the alphabet. So I’m just pointing out a small imperfection. Thanks again for the great idea behind this component. :smiley_cat:
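To make the boundary problem concrete, here is a minimal JavaScript sketch. Both functions are hypothetical and are not the component's actual code: the first assumes a simplified rule that needs whitespace or punctuation on both sides of the word, and the second only rejects a match when the neighbouring character is an ASCII letter or digit. The `word` argument is assumed to contain no regex metacharacters.

```javascript
// Assumption: a simplified stand-in for a rule that requires whitespace or
// punctuation around the word. The component's real rule may differ.
function matchesWithSpaceBoundary(text, word) {
  const regex = new RegExp(`(^|[\\s.,;:!?])${word}($|[\\s.,;:!?])`);
  return regex.test(text);
}

// Alternative sketch: refuse a match only when the character directly
// before or after the word is an ASCII letter or digit.
function matchesWithLatinBoundary(text, word) {
  const regex = new RegExp(`(?<![A-Za-z0-9])${word}(?![A-Za-z0-9])`);
  return regex.test(text);
}

console.log(matchesWithSpaceBoundary("你会用Office吗?", "Office"));
console.log(matchesWithLatinBoundary("你会用Office吗?", "Office"));
console.log(matchesWithLatinBoundary("Officer Smith", "Office"));
```

With the second rule, a word embedded in Chinese or Japanese text can still match, while a longer Latin word such as "Officer" is left alone.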


This works great. Is it possible to run it retroactively, or do you need to search all the links manually and change them?

You should be able to rebake those posts. You can click the wrench and choose rebuild HTML. Once you see that it works, you can search for the rake task that rebakes matching posts.

Actually, this theme linkifies posts dynamically when they are loaded in the browser so no need to do anything. :slight_smile:


Oh. Very cool. I hadn’t realized, but that makes sense.


Our FOSS genealogy software glossary was too extensive to fit in a post or to linkify term by term.

If we had just Linkified each Glossary term, then every posting would drown in a sea of blue links. Users would be likely to stop clicking links. And we wanted an audit trail for the Glossary.

So I linkified “Gramps Glossary” to that article in our MediaWiki-powered wiki, and now use an annotation like “(see [glossary term] in the Gramps Glossary)”.

(We have a 2nd Genealogy Glossary for terminology that is generic to the Genealogy subject matter rather than specific to our software.)


Has something been done to allow pipes?

I’m currently using this as a workaround:
image


Also, is there a way to exclude a word from being linkified when it is inside a grand-grand-[…]child of an excluded class?
I’m building a forum with a Documentation category, and I don’t want linkified words in it because it would be redundant: words in other categories will be linkified and will link to topics in this documentation category.
Plus, linkified words open in a new window.

So, this didn’t work:

Here’s an example of the issue I’m facing. This is the part of a text inside a documentation topic.

If I click 22° halo, it will open a page that links to… The same page, at the same place.
I can exclude words in titles, especially because the topic contains a table of contents, but the following paragraphs don’t have any specific class. They are regular paragraphs.


Maybe the excluded class setting could accept CSS selectors?
For example, d-toc-cooked > *


edit: Also I fail to understand why this doesn’t work since my word is a direct child of an excluded class:

image

The “22° Halo” is still linkified:

<h3 id="toc-h3-22-halo" data-d-toc="toc-h3-22-halo" class="d-toc-post-heading">
    <a name="h-22-halo-7" class="anchor" href="#h-22-halo-7"></a>
    <a href="https://discourse.canapin.com/t/ice-halos-information-and-list/28#h-22-halo-7" rel="nofollow" target="_blank" class="linkify-word no-track-link">22° Halo</a>
</h3>

Is that because the table of contents HTML is generated after the “linkification”?


Yeah, that’s probably the issue here. Both of these are JavaScript components, so you have a race condition, and the result depends on which one runs first.


I worked around the issue by adding a class to the HTML allowlist.

However, an “excluded attributes” setting would give more flexibility for preventing words from being linkified, using the built-in data-(anything) HTML attributes, since they are allowed by default in Discourse.

Example:
<span data-nolinkify> text </span> text

Would a pull request be accepted if I manage to add this to the theme component?
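A rough JavaScript sketch of what such a check could look like. The function name is made up for illustration, `data-nolinkify` is the attribute proposed above, and none of this is the component's existing code. It walks up through all ancestors, so it would also cover deeply nested children:

```javascript
// Hypothetical check: reports whether the node itself or any of its
// ancestors carries the opt-out attribute.
function isLinkifyDisabled(node, attribute = "data-nolinkify") {
  for (let el = node; el; el = el.parentElement) {
    // Text nodes have no hasAttribute method, so guard before calling it.
    if (typeof el.hasAttribute === "function" && el.hasAttribute(attribute)) {
      return true;
    }
  }
  return false;
}
```

The linkifying step would then skip any text node for which this returns true.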


This unfortunately prevents linkifying words that explicitly include any of these (especially ., which is handy in linking abbreviations like ID.1 or id2.5).

This is perhaps best fixed in core, but in the meantime here’s a PR that fixes the above issues:

Sample regex rule (will match id1.1 to id10.100, including the variations id m.n, IDm.n and ID m.n):

/(ID¦id)\s?(([1-9]¦10)\.([1-9]¦[1-9][0-9]¦100))/, https://example.com/id$2

@md-misko thanks for the PR!

Are you sure? I am a little surprised by this. I would think that if your regex is greedy enough you should be able to match it. Note that removing the dot from the boundary characters breaks linkifying words at the end of a sentence.

I did a little test with your regex below on regex101.com and it seems to work with the current boundary characters, see regex101: build, test, and debug regex
Note that if I understand your purpose correctly, you may need to turn some of your capturing groups into non-capturing groups with (?:)

The inability to use | is very annoying, agreed. Note that for the ID|id part you can just use the i modifier to make the regex case insensitive. For the numbers, if you really need the exact ranges 1-10 and 1-100, then it’s tricky; relaxing them to 1-19 and 1-199 would make it easier. :slight_smile:

Here’s your regex with non-capturing groups that I think should work

/id\s?((?:[1-9]|10)\.(?:[1-9]|[1-9][0-9]|100))/i, https://example.com/id$2
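For anyone comparing the two forms, here is a small JavaScript sketch of how capture-group numbering changes in plain JavaScript once groups become non-capturing. `expandRule` is a hypothetical helper, not the component's code, and whether the component numbers groups the same way is an assumption. The trailing `\b` stands in for the component's own boundary handling; without it, the unanchored regex would stop at `10.1`.

```javascript
// Hypothetical helper: matches `pattern` in `text` and fills $1, $2, ...
// in `urlTemplate` with the corresponding capture groups.
function expandRule(text, pattern, flags, urlTemplate) {
  // Assumption: a trailing \b replaces the component's boundary characters.
  const regex = new RegExp(pattern + "\\b", flags);
  const match = regex.exec(text);
  if (match === null) return null;
  return urlTemplate.replace(/\$(\d+)/g, (_, n) => match[Number(n)] ?? "");
}

// Four capturing groups: the full number is group 2.
const capturing = "(ID|id)\\s?(([1-9]|10)\\.([1-9]|[1-9][0-9]|100))";
// One capturing group: the full number is group 1.
const nonCapturing = "id\\s?((?:[1-9]|10)\\.(?:[1-9]|[1-9][0-9]|100))";

console.log(expandRule("see ID 10.100", capturing, "", "https://example.com/id$2"));
console.log(expandRule("see ID 10.100", nonCapturing, "i", "https://example.com/id$1"));
```

Both calls produce the same URL here, which is worth checking against the component before relying on a particular `$n`.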

@danekhollas thanks for the feedback and the regex! I’ve changed the code based on your comments, PR is ready for review.


Nice! Somebody from the Discourse team will have to review this though, CC @sam

Note that you can simply install the extension from your forked repository (you can even specify a branch).


I would like to clarify first: why are users using the component instead of the built-in watched words?


I found two main issues with the built-in watched words

  • cannot add complex regexes: An error occurred: Word is too long (maximum is 100 characters)
  • cannot use arbitrary character as word boundary: namely underscore
  • also, the inability to edit rules or change the order of execution is less than ideal

The PR for the component exposes word boundaries to the user, and there are no issues with long regexes (apart from the inability to use |, which is also addressed in the PR).

Otherwise watched words work perfectly, and if these can be addressed in core I’m all for it.

Made this a separate post following Linkify words in post - #216 by md-misko, not sure if those qualify as bugs:

Watched words don’t respect Unicode: they treat all Unicode characters as word boundaries (when using \b, but this is to be expected, I guess).

And more (running some test cases through watched words and found these two):

  • The watched word "\bid\(d+)\b" is an invalid regular expression. (true, but it still adds the rule)
  • \bid\s?(\d+)\b → https://example.com/id$1 linkifies to https://example.com/id%241 (adds urlencoded $1 instead of doing the substitution)

Is substitution not supported or is this a bug?
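For reference, `%241` is exactly what `$1` becomes when it is percent-encoded. The JavaScript sketch below only reproduces that symptom; it is not taken from the watched-words code, so treating "the placeholder is encoded before substitution" as the cause is an assumption. The function names are made up for illustration.

```javascript
const rule = /\bid\s?(\d+)\b/;
const template = "https://example.com/id$1";

// Substituting into the raw template fills in the captured number.
function linkFromRawTemplate(text) {
  const match = rule.exec(text);
  return match ? match[0].replace(rule, template) : null;
}

// Percent-encoding the placeholder first turns "$1" into "%241",
// so nothing is left for the substitution to replace.
function linkFromEncodedTemplate(text) {
  const encoded = template.replace("$1", encodeURIComponent("$1"));
  const match = rule.exec(text);
  return match ? match[0].replace(rule, encoded) : null;
}

console.log(linkFromRawTemplate("post id 42 here"));
console.log(linkFromEncodedTemplate("post id 42 here"));
```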

Has anyone found a workaround for using vertical bars | at this point? I have some regexes where they are crucial.

Hi, thanks for developing this amazing theme, I love it! Could you make this available to all users, and not only admins?

GitHub - renato/discourse-imgify-words: theme to auto imgify urls in discourse, almost the same as discourse-linkify-works

Hi,

This theme component you linked is an adaptation of Sam’s that I did as a workaround for a need you described in another topic.

They are simple theme components which only change how these words are rendered in a post (Sam’s converts specific words to links, mine converts specific words to images) based on the theme component settings, which are only managed by admins.

The feature you describe can’t be done in a theme component, it would need a plugin to store a per-user set of (word, image url) in the database and the word to image conversion should be done server-side, when building the cooked post (as HTML) content. These can’t be done in a theme component, which is only frontend/client-side code (if you hire someone to do this work it’s critical that they understand these details).

This is out of scope for this theme component and would need much more involved work. My suggestion is to post the details of how you want this feature to behave in #marketplace, where you can hire someone with prior experience with the Discourse internals to help you.

Thanks for letting me know, I’ll go to that category and post about it.