Catch and educate users posting code without properly formatting it

Sujan · August 13, 2017, 9:29am

In this post @codinghorror mentioned that on StackOverflow there is some logic that catches users trying to post unformatted code (“there are a lot of unusual [ and { characters per line, etc”) and then blocks them from posting until they format it.

He also mentioned that there already was some discussion on writing such a plugin, but I couldn’t find it - so here is a new topic.

Does anyone remember the former posts about this?

If not, I will try to refine the idea a bit, look at how SO does it, etc., to write a spec for how this plugin could/should work. (Unfortunately, I have no idea how RoR works, so I can’t code any of it myself.)

Sujan · August 13, 2017, 9:30am

Here are my Canned Replies to handle this manually, that could be used as starters for the text people are shown:

And some elaborate explanations of how to post (code) on SO:
https://meta.stackexchange.com/a/22189/142000
https://meta.stackexchange.com/help/formatting
https://meta.stackexchange.com/editing-help

Anyone know where to find anything about the feature @codinghorror mentioned?

Sujan · August 14, 2017, 8:32am

@naveedahmada036 posted an example for a Slack plugin that does something related:

Sujan · August 14, 2017, 10:38am

Step 1

So having thought about it a bit, I think these are the possible basic ways to implement it:

1. Catch the user pasting code and somehow make it formatted (see Step 2).
2. Catch the user posting a topic or reply that contains unformatted code and handle it.
3. Bot-reply or PM a user after they have posted something with unformatted code.

I think option 3) is out of scope here and more of a moderation or community management thing that is probably better handled manually by moderators or with another plugin. That leaves 1) and 2).

#1 would require monitoring paste events, analyzing their content, and then providing some interface for the next step. Alternatively, the code could be wrapped in the correct formatting automatically, but this adds a lot of complexity (what if someone pastes code into an already existing code block?). On the other hand, it would be the most newbie-friendly way.

#2 would require analyzing the text to be posted. Again, some form of interface would be needed to inform the user of their options or, again, some logic could try to add formatting to the post automatically. This would be very convenient for the user, but quite complex to implement (Is this 1 or 2 code samples? A code block or just code in text?).

Step 2

The “somehow make it formatted”, “handle it” and “interface” parts refer to possible variants of the second step of the process. This would probably have to include some “error” or “notice” to the user:

Seems you are trying to post code.
Please apply ``` around it to format as code.
Or select the code and hit the </> button to format it automatically.

(Better wording of course).

Then this could maybe highlight the toolbar button to format as code.

If we decided to handle this at “pasting” time of code, the pasted text could also still be selected in the textarea so clicking the button would be enough.
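
To make the “select the code and hit the button” idea concrete, here is a minimal sketch of what the formatting step itself could do. It is not Discourse's actual toolbar code; `wrapSelectionInFence` is a hypothetical helper, and its `value`, `start` and `end` arguments are assumed to come from a textarea's `value`, `selectionStart` and `selectionEnd`.

```javascript
// Hypothetical helper, not part of Discourse: wraps the selected range of a
// textarea's value in a fenced code block and returns the new value.
const FENCE = '`'.repeat(3);

function wrapSelectionInFence(value, start, end) {
  const before = value.slice(0, start);
  const selected = value.slice(start, end);
  const after = value.slice(end);

  // A fence has to sit on its own line, so add newlines where they are missing.
  const lead = before === '' || before.endsWith('\n') ? '' : '\n';
  const trail = after === '' || after.startsWith('\n') ? '' : '\n';
  const body = selected.endsWith('\n') ? selected : selected + '\n';

  return before + lead + FENCE + '\n' + body + FENCE + trail + after;
}
```

If the pasted text stays selected after the paste, a single call with the current selection bounds would be enough. The question raised above (what if the paste lands inside an existing code block?) is deliberately not handled here.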

Anything else to consider?

zogstrip · August 14, 2017, 10:48am

I would start much easier by adding a new ComposerMessagesFinder which would check for a large number of special characters mostly used in code and sending back to the user a JIT message teaching them how to properly format code.

Trying to automatically format code is going to end is

Sujan · August 14, 2017, 10:51am

Yeah, my conclusion as well.
Just wanted to brainstorm the options.

Is this ComposerMessagesFinder at the time of pasting or posting?
Is this “JIT message” already a thing? Is this a private message or some UI element shown to the user?

david · August 14, 2017, 10:56am

The concept already exists - try creating a brand new user on try.discourse.org, make a topic, and you’ll see it.

So you could have something that looks like this:

zogstrip · August 14, 2017, 11:18am

It’s whenever we save a draft.

Sujan · August 14, 2017, 12:45pm

That looks perfect. It doesn’t block the actual typing and editing, users can click it away and ignore it, and there is enough space to really explain the task.
The only drawback is that it covers the preview, so the actual difference the code formatting makes is hidden. Any workaround?

Saving a draft also sounds like a reasonable moment to trigger this.

Ok, so the last thing to define would probably be how to recognize code.

Some suggestions:

more than 2 { or } in one paragraph
more than 2 [ or ] in one paragraph
more than 2 ; in one paragraph
more than 5 \t (tab characters) in one paragraph

More?
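
As a rough sketch, the suggestions above could be checked per paragraph like this. The thresholds are taken straight from the list and are guesses, not tuned values; `RULES` and `findSuspiciousParagraphs` are illustrative names, and paragraphs are assumed to be separated by blank lines.

```javascript
// Thresholds copied from the suggestions above; the numbers are untested guesses.
const RULES = [
  { name: 'curly braces', pattern: /[{}]/g, moreThan: 2 },
  { name: 'square brackets', pattern: /[\[\]]/g, moreThan: 2 },
  { name: 'semicolons', pattern: /;/g, moreThan: 2 },
  { name: 'tabs', pattern: /\t/g, moreThan: 5 },
];

// Splits the text on blank lines and returns, for each paragraph that trips
// at least one rule, its index and the names of the rules it tripped.
function findSuspiciousParagraphs(text) {
  return text
    .split(/\n\s*\n/)
    .map((paragraph, index) => {
      const tripped = RULES
        .filter(rule => (paragraph.match(rule.pattern) || []).length > rule.moreThan)
        .map(rule => rule.name);
      return { index, tripped };
    })
    .filter(result => result.tripped.length > 0);
}
```

Simple counts like these will probably misfire on prose that uses many semicolons or bracketed asides, so they are only a starting point.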

zogstrip · August 14, 2017, 1:04pm

Any chance you could “steal” what StackOverflow is doing? They probably put a lot of thinking into this. No point in re-inventing the wheel.

Sujan · August 14, 2017, 1:06pm

Would make sense, but my Google mojo didn’t turn anything up. It’s somehow pretty hard to google for helper functionality that formats code on StackOverflow. @codinghorror, can you help out with where to look?

tm2017 · September 11, 2017, 7:06pm

This looks great, but could you please (pretty please with sugar on top) share some more details on how to implement ComposerMessagesFinder for catching users pasting code?

Let us say that I would like to cover just C/C++, so it would only detect semicolons and { } curly braces. Could you please offer some documentation on that? Should I just modify composer_messages_finder.rb? (I am not familiar with Ruby, so please excuse my ignorance: how do I catch a “Paste” event in Ruby?)

Thanks a lot in advance for any help you can offer.

Sujan · September 12, 2017, 11:42am

What was posted here actually were just ideas how this could all go together and form a plugin for this functionality.

What is missing is exactly what you are asking for:
How to find out if a user posts code?
What regex or other technique can or should be used?
How does this work in other places? (StackOverflow was mentioned, but I couldn’t find any additional information about it)

codinghorror · October 13, 2017, 10:13pm

I think you need a score which resets every (x) paragraphs that don’t contain enough characters to meet the threshold. So

Loop through entire body
Look at the current paragraph
Check our current “is this code?” score
Does this paragraph have a higher than expected special character score? Also, is this paragraph shorter than we’d expect?
If we are already in “is this code?” mode, add the number of lines in sequence so far to the current score, to make it greater
If the last (x) paragraphs have zero code, set “is this code?” to zero and record the ending line number
If this is the first paragraph to trigger code mode set “is this code?” to the value of the paragraph score, and record the starting line number
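
One possible reading of the loop above, as a sketch only. The scoring function, the `threshold` and the `resetAfter` value (the “x” above) are all assumptions, and the “is this paragraph shorter than we’d expect?” check is left out for brevity.

```javascript
// Assumed scoring: count characters that are common in code but rare in prose.
function paragraphScore(paragraph) {
  return (paragraph.match(/[{}\[\]();=<>]/g) || []).length;
}

// Returns [{ startLine, endLine, score, lineCount }] for runs of paragraphs
// that look like code. Line numbers are 1-based.
function findCodeRuns(text, { threshold = 3, resetAfter = 2 } = {}) {
  const lines = text.split(/\r?\n/);
  const runs = [];
  let run = null;            // the current "is this code?" state
  let quietParagraphs = 0;   // consecutive paragraphs below the threshold
  let i = 0;

  while (i < lines.length) {
    if (lines[i].trim() === '') { i += 1; continue; } // skip blank lines
    const first = i;
    while (i < lines.length && lines[i].trim() !== '') i += 1;
    const paragraph = lines.slice(first, i);
    const score = paragraphScore(paragraph.join('\n'));

    if (score >= threshold) {
      if (run === null) {
        // First paragraph to trigger code mode: record where it starts.
        run = { startLine: first + 1, endLine: i, score, lineCount: paragraph.length };
      } else {
        // Already in code mode: add the lines seen so far as a bonus.
        run.score += score + run.lineCount;
        run.lineCount += paragraph.length;
        run.endLine = i;
      }
      quietParagraphs = 0;
    } else if (run !== null) {
      quietParagraphs += 1;
      if (quietParagraphs >= resetAfter) {
        // Enough code-free paragraphs in a row: close the run and reset.
        runs.push(run);
        run = null;
        quietParagraphs = 0;
      }
    }
  }
  if (run !== null) runs.push(run);
  return runs;
}
```

The recorded start and end lines could later be used to point the user at the exact span that should be wrapped in a code block.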

You’d need a sizable test corpus for this to work. I know @zogstrip has one because I’ve seen it

Also

we can realistically limit our check to, say, the “big ten” languages. Per the tags page that would be C#, Java, PHP, JavaScript, Objective-C, C, C++, Python, Ruby.

This is also interesting as an approach

Run your syntax highlighter on the text. If it ends up highlighting some high percentage of it, it’s probably code.

However Random Syntax Highlighting tends to love highlighting numbers in text for no reason. So you’d need to play with that, but it’s a very interesting idea.

What happens when you randomly type text and have it in a syntax highlighting block? 

Let's find out. Is this code? I dunno, is it code? You tell me.

codinghorror · August 4, 2018, 8:29am

FYI @erlend_sh if you are looking for encouragement projects this one is extremely strong.

gdpelican · August 4, 2018, 2:47pm

For reference, here’s a kitchen sink dump of answers from Jeff asking this same question about implementation for SO back in 2011.

codinghorror · February 14, 2019, 11:04pm

@Jose_C_Gomez this isn’t actually a plugin. It’s just a theory with some proposed implementation logic.

lionel-rowe · March 19, 2019, 6:15am

I’m in the process of creating a plugin for this, using a naïve (but hopefully effective) pattern matching approach.

My main doubt is actually about the UI, though. It seems Discourse currently has at least 3 (maybe more?) idioms for alerting users before posting:

Yellow overlay on top of rendered markdown. Dismissible, doesn’t prevent posting. Example: welcome message for first couple of posts. Least disruptive.
Orange/red tooltip-like prompt. Prevents posting and jiggles if you try to post before fixing the problem. Example: “post too short” message.
Modal dialog box. Prevents posting. Example: “your post looks like gibberish” message. Most disruptive.

I don’t think the modal is the right tool, as the message will need to be at least a couple of paragraphs long. Then again, the yellow overlay seems too easy to ignore. Possibly a combination of yellow overlay and orange jiggly tooltip could work?

Ideally, it’d also be permanently dismissible (“don’t show me this again”) to avoid annoying power users who intentionally do funky things with their markup that might otherwise trigger the pattern matcher.

Anyone have any thoughts? I’m open to other UI idioms as well, as long as they’re consistent with the general Discourse look and feel.

codinghorror · March 19, 2019, 7:48am

I would try to get detection down reliably first before agonizing too much over UI.

lionel-rowe · March 20, 2019, 3:59pm

Here’s the basis of what I have for detection so far:

Code

const codeTypes = [
  /(^`{3,}).*\r?\n[\s\S]*\r?\n\1/gm, // backtick-fenced block
  /(^~{3,}).*\r?\n[\s\S]*\r?\n\1/gm, // tilde-fenced block
  /(?:^|(?:\r?\n{2,}))\s*(?:(?: {4}|\t).*(?:\r?\n|$))/g, // indented block
  // lack of `m` flag is intentional (`^` must match beginning of input, not line)

  /\[code.*\][\s\S]*\[\/code\]/gm, // BBCode tags

  /`.+`/g, // inline backticks (must come last)
];

const varNameStart = '[$_a-zA-Z]';
const varNameEnd = '[$_a-zA-Z0-9]*';
const varName = `${varNameStart}${varNameEnd}`;
const xmlLikeName = '[a-zA-Z-]+';

const nonHtmlIndicators = [
  `[$_]${varName}`, // almost certain to be var name
  `${varName}(?:_${varName})+`, // snake_case
  // camelCase and spinal-case omitted due to too many false positives
  '(?:^|\\s+)(?:\\/\\/|[;])', // single-line comment
  // ignore python-style `#` single-line comments due to conflict with md headings
  `\\/\\*[\\s\\S]+\\*\\/`, // C-like multiline comment
  `('''|""")[\\s\\S]+\\1`, // python-like multiline string/comment
  ';\\s*$', // trailing semicolon
  `${varName}\\((?:${varName})?\\)`, // function call
  `${varName}\\[(${varName}?)\\]`, // array index
  `${varName}\\.${varName}`, // object property
  '^\\s*[{}]\\s*$', // curly brace and nothing else on a line
  '\\{\\{.+\\}\\}', // templating languages e.g. handlebars
  '[$#]\\{.+\\}', // template string
  '&&|\\|\\||==|!=|>=|<=|=>|->|>>|<<|::'
  + '|__|!!|\\+\\+|\\+=|-=|\\*=|\\/=|\\|=|&=', // various operators
  '\\\\[\'"ntr0\\\\]', // common escape sequences
];

const htmlIndicators = [
  '<!--[\\s\\S]*?-->', // xml-like comment
  `<${xmlLikeName}.*\\/?>`, // xml-like start/empty tag
  `</${xmlLikeName}>`, // xml-like end tag
  '&([0-9a-zA-Z]+);$', // html entity - human-readable
  '&#([0-9]{1,7});$', // html entity - decimal
  '&#x([0-9a-fA-F]{1,6});$', // html entity - hex
];

const indicators = nonHtmlIndicators
  .concat(INCLUDE_HTML ? htmlIndicators : [])
  .map(str => new RegExp(str, 'gm'));

Strip out everything under codeTypes, then check for anything under indicators in the remaining content, and warn if the number of matches is above a preconfigured threshold (defaulting to 0).
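
The strip-then-count step described above could look roughly like this. It is a sketch rather than the plugin's actual code: `sampleCodeTypes` and `sampleIndicators` are a reduced subset of the patterns listed earlier, written as regex literals, and `countUnformattedIndicators` and `shouldWarn` are hypothetical names.

```javascript
// A reduced subset of the patterns above, only to keep the sketch short.
const sampleCodeTypes = [
  /(^`{3,}).*\r?\n[\s\S]*\r?\n\1/gm, // backtick-fenced block
  /`.+`/g, // inline backticks (must come last)
];

const sampleIndicators = [
  /;\s*$/gm, // trailing semicolon
  /^\s*[{}]\s*$/gm, // curly brace and nothing else on a line
  /&&|\|\||==|!=|>=|<=|=>/gm, // a few operators
];

// Removes everything that is already formatted as code, then counts how many
// indicator matches remain in the rest of the text.
function countUnformattedIndicators(text, codeTypes, indicators) {
  const stripped = codeTypes.reduce((rest, re) => rest.replace(re, ''), text);
  return indicators.reduce(
    (total, re) => total + (stripped.match(re) || []).length,
    0
  );
}

// Warn when the number of matches is above the threshold (defaulting to 0).
function shouldWarn(text, threshold = 0) {
  return countUnformattedIndicators(text, sampleCodeTypes, sampleIndicators) > threshold;
}
```

Stripping first matters: without it, every correctly fenced snippet would trip the indicators and the warning would fire on exactly the posts that need no help.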

I guess I’ll just try to get a working version up and running using native browser alert, then work from there.
