Unformatted Code Detector

:discourse2: Summary: Unformatted Code Detector detects unformatted code and gives a warning before posting.
:eyeglasses: Preview: Preview on theme-creator.discourse.org
:hammer_and_wrench: Repository Link: https://github.com/discourse/unformatted-code-detector
:open_book: New to Discourse Themes? Beginner’s guide to using Discourse Themes

Install this theme component

Usage

After installation, users posting unformatted code will see a warning message instructing them how to format it correctly.

Sensitivity and whether it detects HTML are configurable via theme settings.

Debugging

If you receive a warning for a post which doesn’t include any code, you can print debug information by opening the browser JS console, typing debugUnformattedCodeDetector(), and pressing Enter. This will print some information about which lines were considered ‘code’, and what the sensitivity settings are.

Issues

  • “Do not show this message again” only works per device, not per user. This is a known issue and will be fixed once Discourse gains the functionality to attach user info from themes.
57 Likes

This is quite essential to our forum because many new users ignore our topic about code formatting and most of our topics are about assistance with code. It makes our life much easier by informing them of unformatted code. Thanks @lionel-rowe

13 Likes

v0.5 is now out. I’m gonna start documenting changes properly on this topic. I expect most updates from now will be dealing with edge cases.

Changelogs

v0.5

Fix false positive on URLs containing snake case (e.g. https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Client-side_web_APIs/Client-side_storage)

v0.6

Fix two more types of URL-related false positives:

  • Parens in path (e.g. https://en.wikipedia.org/wiki/LAN_(disambiguation))
  • Array query params (e.g. https://www.example.com?foo[0]=bar&foo[1]=baz)

v0.7

Ignore strings like “O(n)”.

v0.8

  • Ignore emojis (e.g. :slightly_smiling_face: => 🙂).
  • Ignore special characters for copyright/TM/registered (e.g. Trademarked(TM) => Trademarked™).
  • Ignore anything formatted like a URL.

v0.9

  • Ignore anything within BBCode tags ([quote]s were triggering a false positive).
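The exclusions listed in these changelogs can be pictured as a preprocessing pass that blanks out ignorable spans before any detection runs. The following is only a rough sketch of that idea: the function name `stripIgnorable` and every regex in it are assumptions made for illustration, not the component's actual code.

```javascript
// Hypothetical preprocessing pass: blank out spans that the changelogs say
// should be ignored, so that later heuristics never see them.
// All patterns below are illustrative assumptions.
const IGNORABLE_PATTERNS = [
  /\[(\w+)[^\]]*\][\s\S]*?\[\/\1\]/g, // BBCode blocks such as [quote]...[/quote]
  /\bhttps?:\/\/\S+/g,                // anything formatted like a URL
  /:[a-z0-9_+-]+:/g,                  // emoji shortcodes such as :wink:
  /\((?:TM|C|R)\)/gi,                 // (TM), (C), (R) typographic replacements
  /\bO\([^)\s]*\)/g,                  // big-O notation such as O(n)
];

function stripIgnorable(text) {
  // Replace each ignorable span with a space so surrounding words stay apart.
  return IGNORABLE_PATTERNS.reduce(
    (result, pattern) => result.replace(pattern, " "),
    text
  );
}
```

Whatever remains after this pass would then be handed to the actual detection heuristics.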
12 Likes

This is great, thanks for sharing!

3 Likes

Great!

Will this recognize logs as “code”? If not, consider this a feature request.

How do you identify an extract from a log-file? I guess it would suffice to look for a date/time at the beginning of two consecutive lines.

Another question: will it work when people are using quote formatting (>) instead of code fences?

1 Like

It will recognize logs as code if they happen to “look like” code (i.e. contain any of the patterns that would flag a post as containing code). You can test it out on your content at the demo link.

There are lots of ways to format a datetime, and not all log formats start each line with a datetime in any case. I guess detecting full ISO8601 representations (1970-01-01T00:00:00+00:00 etc.) would work, as these are very unlikely to appear outside of code or logs.
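As a minimal sketch of that ISO 8601 idea (not something the component is confirmed to do), a check along these lines would flag posts in which several lines carry a full timestamp. The helper name `looksLikeLog` and the `minLines` threshold are hypothetical, and the check counts matching lines anywhere in the post rather than requiring them to be consecutive.

```javascript
// Matches full ISO 8601 date-times with a time zone designator, e.g.
// 1970-01-01T00:00:00+00:00 or 2020-05-17T08:30:00.123Z.
const ISO_8601_DATETIME =
  /\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})/;

// Hypothetical helper: treat a post as log-like when at least `minLines`
// of its lines contain such a timestamp.
function looksLikeLog(text, minLines = 2) {
  const hits = text
    .split("\n")
    .filter((line) => ISO_8601_DATETIME.test(line));
  return hits.length >= minLines;
}
```

A bare date such as 1970-01-01 does not match, which keeps ordinary prose that mentions dates from being flagged.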

Anything in code blocks (fenced or indented) is ignored. Quoted blocks don’t receive any special treatment. Quoted blocks aren’t a correct way to format code, and may lead to unexpected results.

Example: the string <xml />

Block quoted (gets parsed away into oblivion):

Code fences (displays as intended):

<xml />
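For illustration, ignoring fenced and indented blocks while leaving quoted blocks alone could look roughly like the sketch below. It is a simplification under stated assumptions: `stripCodeBlocks` is a hypothetical name, and every line indented by four spaces or a tab is treated as code, which a real Markdown parser would not always do (for example inside lists).

```javascript
// Hypothetical helper: drop fenced and indented code blocks so that only
// unformatted text is left for inspection. Quoted lines (>) are kept as-is.
function stripCodeBlocks(markdown) {
  const kept = [];
  let fence = null; // the opening fence marker while inside a fenced block
  for (const line of markdown.split("\n")) {
    const open = line.match(/^ {0,3}(`{3,}|~{3,})/);
    const close = line.match(/^ {0,3}(`{3,}|~{3,})\s*$/);
    if (fence) {
      // A closing fence uses the same character and is at least as long.
      if (close && close[1][0] === fence[0] && close[1].length >= fence.length) {
        fence = null;
      }
      continue; // skip everything inside the fenced block, fences included
    }
    if (open) {
      fence = open[1];
      continue;
    }
    if (/^(?: {4}|\t)/.test(line)) continue; // indented code line
    kept.push(line);
  }
  return kept.join("\n");
}
```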
4 Likes

That’s exactly why I think it shouldn’t be ignored. People keep using quotes to “format” their code…

5 Likes

We at the Home Assistant forums think that this is the best thing invented since sliced bread. Or maybe Home Assistant. Thank you so so much @lionel-rowe!!!

Two minor requests:

  • I don’t want to allow users to skip formatting or disable the dialog in the future. I want them to feel pain until they change their ways. I’m sadistic like that. Can you make the “Don’t show this message again” and “Post anyway” buttons optional? For now I’ve hidden them with some CSS, but it would be better to just not include the HTML at all.

  • Unsure if you are doing language detection or not, but if you are, can you add the estimated language name after the first code fence so that users will properly syntax highlight too?

Thanks so much!!

6 Likes

I wouldn’t recommend hiding them, especially if you leave the setting to include HTML detection on. Power users may still want to have their (sanitized) HTML parsed as such, not formatted as code. Two examples where this can be useful are kbd and abbr tags.

If you exclude HTML tags from detection (which may be viable depending on the scope of your forum), hiding the “don’t show this again” would probably be OK. I still wouldn’t recommend hiding the “post anyway”, though, because there are bound to still be some edge cases of false positives (I hit one the other day where I’d omitted a space before an opening parenthesis — poor typesetting, but not unformatted code). Then you’ll have a situation where users can’t post their question at all, and, unless they report the issue to you directly, you won’t even know about it.

Language detection is beyond the scope of this component, I’m afraid. Where possible, it looks for syntactical features shared by many languages, such as lines ending in semicolons, certain configurations of curly braces, and so on.
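To give a feel for what language-agnostic signals of this kind might look like, here is a small sketch. The patterns, the function names, and the `minLines` threshold are all assumptions for illustration and are not taken from the component's source.

```javascript
// Illustrative, language-agnostic signals. These regexes are assumptions
// made for this sketch, not the component's real patterns.
const LINE_SIGNALS = [
  /;\s*$/,        // line ends in a semicolon
  /^\s*[{}]\s*$/, // line consists of a lone curly brace
  /\)\s*\{\s*$/,  // line ends in ") {", as in many function or if headers
];

// Returns the lines that match at least one signal.
function codeLikeLines(text) {
  return text
    .split("\n")
    .filter((line) => LINE_SIGNALS.some((signal) => signal.test(line)));
}

// Hypothetical sensitivity knob: how many matching lines trigger a warning.
function looksLikeCode(text, minLines = 1) {
  return codeLikeLines(text).length >= minLines;
}
```

A sentence of ordinary prose that happens to end in a semicolon would also match, which is the kind of false positive that makes a “post anyway” escape hatch worth keeping.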

I am thinking about ways to enhance the UX, though. One big improvement would be to make it much more interactive. For example, instead of a simple modal, the user would be presented with a wizard that first asks them which language their post concerns (select from a dropdown), then a screen which asks them to select which ranges of their post are code (defaulting to lines that contain strings flagged by the plugin), then generating the appropriate markdown. This would still include a “skip and post anyway” option, though, for the reasons I mentioned.

I don’t have a timeline for this change, but it’d be good to know if it’s something people would be interested in.

7 Likes

Already hit the edge cases issue within 30 minutes or so of hiding the elements, so they have been re-added.

I would be super interested in the modal being more interactive!

1 Like

Quick note, we will be official-izing this component soon and working closely with @lionel-rowe and @david to get there. Any ideas or feedback, now is the time to share it!

17 Likes

I tried posting the following:

That’s Japanese. :wink:

Funny bug though.

Then I got a popup saying the post might contain code. I was curious, so I clicked the “Fix Code” button, but nothing actually happened. It’s not like the post even has anything resembling code anyway. So something seems a bit off on this.

(In fact, trying to post this topic made the same popup appear.)

2 Likes

Thanks @seanblue, we made a couple of tweaks based on this feedback. Emojis are now ignored:

And we also changed the “Fix Code” button to say “Edit Post”. The idea is that you should go back and manually fix your post - it will never be fixed automatically.

7 Likes

Awesome news!

Not sure if it makes sense, but it would be great if this could also alert on the most common Markdown mistakes that break the formatting.

4 Likes

It would also be great if there was a hint where the suspected unformatted code is.

I was just writing another reply and got the alert, although I haven’t pasted any code. After a while I realized it’s because I used the word topic_id… But it’s not obvious that the detector thinks this word is code (and most people wouldn’t think that) IMO.

I think that when a word has an underscore in it that doesn’t necessarily mean it’s code.

2 Likes

Thanks for all your feedback so far folks! We’ll be adding and tweaking a few settings to reduce the oversensitivity of detection.

@tpetrov one other thing — does the wording of the popup make it clear that you can choose to ignore it and post anyway if you don’t think your post contains code? Or does it make it seem like you’re forced to find and “fix” the perceived problem?

5 Likes

My concern is that a lot of people will not read through it…
You know, when people see a popup with more than one sentence of text nowadays, they seem to ignore the text and look for the OK button (I accept cookies, terms, etc.).

Still, maybe “It looks like your post may contain unformatted code” could be changed to “Do you use code in your post?”, as sometimes questions draw more attention.

I’m not a UX expert, but this button seems a bit nuclear:
[image] Something I wouldn’t like to click. Which of course is the idea: that people will not simply skip it instead of trying to format their post better.

1 Like

Oooh, I like this idea… but I just got a false positive:

It might have been the broken links that tripped it up? They’re just taken from the templating engine and look like: [keep things civilized](%{guidelines_url}). Or maybe the HTML img tag?

2 Likes

Not a bad idea. @david, perhaps we can try changing the modal title to “Are you posting a code snippet?”

I think you’re probably right. Next version will have that as a standard gray button.

Turns out it was both! Next version with default settings will no longer trigger for this post.

6 Likes

We’re rolling out new copy, and building a corpus of positive and negative test sample posts for the component. Bear with us, as this is shaping up nicely!

8 Likes