[Plugin] Catch and educate users posting code without properly formatting it

(Jan P.) #1

In this post, @codinghorror mentioned that StackOverflow has some logic that catches users trying to post unformatted code (“there are a lot of unusual [ and { characters per line, etc”) and then blocks them from posting until they format it.

He also mentioned that there already was some discussion on writing such a plugin, but I couldn’t find it - so here is a new topic.

Does anyone remember the former posts about this?

If not, I will try to refine the idea a bit and look at how SO does it, so I can write a spec for how this plugin could/should work. (Unfortunately, I have no idea how RoR works, so I can’t code any of it myself.)

(Jan P.) #2

Here are my Canned Replies for handling this manually, which could be used as a starting point for the text people are shown:

And some elaborate explanations of how to post (code) on SO:

Anyone know where to find anything about the feature @codinghorror mentioned?

(Jan P.) #3

@naveedahmada036 posted an example for a Slack plugin that does something related:

(Jan P.) #4

Step 1

So having thought about it a bit, I think these are the possible basic ways to implement it:

  1. Catch the user pasting code and somehow make it formatted (see step #2).
  2. Catch the user posting a topic or reply that contains unformatted code and handle it.
  3. Bot-reply to or PM a user after they have posted something with unformatted code.

I think option 3) is out of scope here. It is more of a moderation or community management matter that is probably better handled manually by moderators or with another plugin. That leaves 1) and 2).

#1 would require monitoring paste events, analyzing their content, and then providing some interface for the next step. Alternatively, the code could be wrapped in the correct formatting automatically, but this adds a lot of complexity (what if someone pastes code into an already existing code block?). On the other hand, this would be the most newbie-friendly way.

#2 would require analyzing the text that is about to be posted. Again, some form of interface would be needed to inform users of their options, or some logic could try to add formatting to the post automatically. This would be very convenient for the user, but quite complex to implement (Is this 1 or 2 code samples? A code block or just code in text?)

Step 2

The phrases “somehow make it formatted”, “handle it” and “interface” above all refer to possible variants of the second step of the process. This step would probably have to include some “error” or “notice” shown to the user:

Seems you are trying to post code.
Please apply ``` around it to format as code.
Or select the code and hit the </> button to format it automatically.

(Better wording of course).

Then this could maybe highlight the toolbar button to format as code.

If we decided to handle this at paste time, the pasted text could also still be selected in the textarea, so clicking the button would be enough.

Anything else to consider?

(Régis Hanol) #5

I would start much simpler by adding a new ComposerMessagesFinder which would check for a large number of special characters mostly used in code and send back to the user a JIT message teaching them how to properly format code.

Trying to automatically format code is going to end in :cry:

(Jan P.) #6

Yeah, my conclusion as well.
Just wanted to brainstorm the options.

Does this ComposerMessagesFinder run at the time of pasting or posting?
Is this “JIT message” already a thing? Is this a private message or some UI element shown to the user?

(David Taylor) #7

The concept already exists - try creating a brand new user on try.discourse.org, make a topic, and you’ll see it.

So you could have something that looks like this:

(Régis Hanol) #8

It’s whenever we save a draft.

(Jan P.) #9

That looks perfect. It doesn’t block the actual typing and editing, users can click it away and ignore it, and there is enough space to really explain the task.
The only drawback is that it covers the preview, so the actual difference that code formatting makes is hidden. Any workaround?

Saving a draft also sounds like a reasonable moment to trigger this.

Ok, so the last thing to define would probably be how to recognize code.

Some suggestions:

  • more than 2 { or } in one paragraph
  • more than 2 [ or ] in one paragraph
  • more than 2 ; in one paragraph
  • more than 5 \t (tab characters) in one paragraph
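
To make the suggestions above concrete, here is a minimal sketch in Python (not Discourse code). It assumes a “paragraph” is a run of text separated by blank lines and that “more than N” means strictly greater than N. The function names are made up for illustration, and the sketch does not skip text that is already inside a code block.

```python
import re

# Thresholds from the list above; a paragraph is flagged when the combined
# count of a character group is strictly greater than its limit.
THRESHOLDS = {
    "{}": 2,   # curly braces
    "[]": 2,   # square brackets
    ";": 2,    # semicolons
    "\t": 5,   # tab characters
}


def paragraph_looks_like_code(paragraph):
    """Return True if any character group exceeds its threshold."""
    for chars, limit in THRESHOLDS.items():
        count = sum(paragraph.count(c) for c in chars)
        if count > limit:
            return True
    return False


def split_paragraphs(text):
    """Split on blank lines (an assumption about what 'paragraph' means)."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def post_contains_unformatted_code(text):
    """Return True if at least one paragraph of the post looks like code."""
    return any(paragraph_looks_like_code(p) for p in split_paragraphs(text))
```

Note that a short snippet such as `int main() { return 0; }` stays below every threshold, so these numbers would certainly need tuning against real posts.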


(Régis Hanol) #10

Any chance you could “steal” what StackOverflow is doing? They probably put a lot of thought into this. No point in re-inventing the wheel.

(Jan P.) #11

Would make sense, but my Google mojo didn’t turn anything up - it’s somehow pretty hard to google about some helper functionality to format code on StackOverflow :wink: @codinghorror, can you help out where to look?


This looks great, but could you please (pretty please with sugar on top) share some more details on how to implement ComposerMessagesFinder for catching users pasting code?

Let us say that I would like to cover just C/C++, so it would only detect semicolons and { } curly braces. Could you please offer some documentation on that? Should I just modify composer_messages_finder.rb? (I am not familiar with Ruby, so please excuse my ignorance.) How do I catch a “Paste” event in Ruby?

Thanks a lot in advance for any help you can offer.

(Jan P.) #13

What was posted here so far were just ideas about how this could all fit together and form a plugin for this functionality.

What is missing is exactly what you are asking for:
How to find out if a user posts code?
What regex or other technique can or should be used?
How does this work in other places? (StackOverflow was mentioned, but I couldn’t find any additional information about it)

(Jeff Atwood) #14

I think you need a score which resets every (x) paragraphs that don’t contain enough characters to meet the threshold. So

  1. Loop through entire body
  2. Look at the current paragraph
  3. Check our current “is this code?” score
  4. Does this paragraph have a higher than expected special character score? Also, is this paragraph shorter than we’d expect?
  5. If we are already in “is this code?” mode, add the number of lines in sequence so far to the current score, to make it greater
  6. If the last (x) paragraphs have zero code, set “is this code?” to zero and record the ending line number
  7. If this is the first paragraph to trigger code mode set “is this code?” to the value of the paragraph score, and record the starting line number

You’d need a sizable test corpus for this to work. I know @zogstrip has one because I’ve seen it :wink:
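
One possible reading of the steps above, as a rough Python sketch. Everything concrete in it is an assumption made for illustration: the set of “special” characters, the thresholds, the bonus for short lines, the value of (x), and the choice to add each later paragraph's own score on top of the line count. None of these numbers come from StackOverflow or from a test corpus.

```python
# All constants below are illustrative assumptions, not tuned values.
SPECIAL = set("{}[]();=<>")    # characters counted as "special"
PARAGRAPH_THRESHOLD = 3        # score a paragraph needs to count as code
SHORT_LINE_LENGTH = 60         # average line length regarded as "short"
RESET_AFTER = 2                # the "(x)" quiet paragraphs that end code mode


def paragraphs_with_lines(text):
    """Yield (first_line, last_line, paragraph); line numbers are 1-indexed."""
    current, first = [], None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            if first is None:
                first = number
            current.append(line)
        elif current:
            yield first, number - 1, "\n".join(current)
            current, first = [], None
    if current:
        yield first, first + len(current) - 1, "\n".join(current)


def paragraph_score(paragraph):
    """Step 4: count special characters, plus a bonus for short lines."""
    score = sum(1 for ch in paragraph if ch in SPECIAL)
    lines = paragraph.splitlines()
    average = sum(len(line) for line in lines) / len(lines)
    if score and average < SHORT_LINE_LENGTH:
        score += 1
    return score


def find_code_regions(text):
    """Return a list of (start_line, end_line, score) for suspected code."""
    regions = []
    score, start, end, run_lines, quiet = 0, None, None, 0, 0
    for first, last, paragraph in paragraphs_with_lines(text):  # steps 1-2
        current = paragraph_score(paragraph)
        if current >= PARAGRAPH_THRESHOLD:
            quiet = 0
            if score == 0:
                # Step 7: first paragraph to trigger code mode.
                score, start, run_lines = current, first, 0
            else:
                # Step 5: already in code mode, so the length of the run
                # so far boosts the score (own score added by assumption).
                score += current + run_lines
            run_lines += last - first + 1
            end = last
        elif score > 0:
            quiet += 1
            if quiet >= RESET_AFTER:
                # Step 6: enough quiet paragraphs, close the region.
                regions.append((start, end, score))
                score, start, end, run_lines, quiet = 0, None, None, 0, 0
    if score > 0:
        regions.append((start, end, score))
    return regions
```

For a post with a four-line C snippet on lines 3 to 6 surrounded by prose, this reports one region starting at line 3 and ending at line 6. Whether the thresholds hold up on real posts is exactly what the test corpus would have to show.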


we can realistically limit our check to, say, the “big ten” languages. Per the tags page that would be C#, Java, PHP, JavaScript, Objective-C, C, C++, Python, Ruby.

This is also interesting as an approach

Run your syntax highlighter on the text. If it ends up highlighting some high percentage of it, it’s probably code.

However Random Syntax Highlighting tends to love highlighting numbers in text for no reason. So you’d need to play with that, but it’s a very interesting idea.
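
To play with this idea without a real highlighter, here is a toy sketch in Python. The regular expressions are a made-up stand-in for a syntax highlighter (they are not how any real highlighter works), and the token classes and the `ignore_numbers` switch are assumptions for illustration. It measures which fraction of the non-whitespace characters would be “highlighted”.

```python
import re

# Toy stand-in for a real syntax highlighter: each pattern marks a span that
# a highlighter might colour. The token classes are illustrative assumptions.
TOKEN_PATTERNS = [
    r'"[^"\n]*"',                                                         # string literals
    r"\b(?:if|else|for|while|return|def|class|function|var|int|void)\b",  # keywords
    r"\b\d+(?:\.\d+)?\b",                                                 # numbers
    r"[{}\[\]();=<>+\-*/]",                                               # punctuation and operators
]
HIGHLIGHT_RE = re.compile("|".join(TOKEN_PATTERNS))


def highlighted_fraction(text, ignore_numbers=False):
    """Fraction of non-whitespace characters matched by the toy tokenizer."""
    total = sum(1 for ch in text if not ch.isspace())
    if total == 0:
        return 0.0
    highlighted = 0
    for match in HIGHLIGHT_RE.finditer(text):
        token = match.group(0)
        # Only number tokens start with a digit in this tokenizer.
        if ignore_numbers and token[0].isdigit():
            continue
        highlighted += sum(1 for ch in token if not ch.isspace())
    return highlighted / total
```

With this sketch, a line like `for (int i = 0; i < 10; i++) { total = total + i; }` comes out well above one half, while a prose sentence containing a year or a version number only scores because of its numbers. Setting `ignore_numbers=True` is one way to experiment with the number problem mentioned above.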

What happens when you randomly type text and have it in a syntax highlighting block? 

Let's find out. Is this code? I dunno, is it code? You tell me.

(Jeff Atwood) #15

FYI @erlend_sh if you are looking for encouragement projects this one is extremely strong.

(James Kiesel) #16

For reference, here’s a kitchen sink dump of answers from Jeff asking this same question about implementation for SO back in 2011.