In this post, @codinghorror mentioned that StackOverflow has some logic that catches users trying to post unformatted code (“there are a lot of unusual [ and { characters per line, etc”) and then blocks them from posting until they format it.
He also mentioned that there already was some discussion on writing such a plugin, but I couldn’t find it - so here is a new topic.
Does anyone remember the former posts about this?
If not, I will try to refine the idea a bit and look at how SO does it, in order to write a spec for how this plugin could/should work. (Unfortunately, I have no idea how RoR works, so I can’t code any of it myself.)
So having thought about it a bit, I think these are the possible basic ways to implement it:
1. Catch the user pasting code and somehow make it formatted (see step #2).
2. Catch the user posting a topic or reply that contains unformatted code and handle it.
3. Bot-reply or -PM a user after he posted something with unformatted code.
I think option 3) is out of scope here. It is more of a moderation or community management thing that is probably better handled manually by moderators or with another plugin. That leaves 1) and 2).
#1 would require monitoring paste events, analyzing their content, and then providing some interface for the next step. Alternatively, the code could be wrapped in the correct formatting automatically. That would be the most newbie-friendly way, but it adds a lot of complexity (what if someone pastes code into an already existing code block?).
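To make the "pasting into an existing code block" problem concrete, here is a minimal sketch of the check an auto-wrapper would need before adding fences. `isInsideFence` and `wrapPaste` are hypothetical helper names, not Discourse APIs, and the fence handling is simplified (it ignores fence length and indented code blocks).

```javascript
const FENCE = "`".repeat(3);

// Returns true when `cursor` (a character offset into `text`) sits inside
// a fenced block that was opened but not yet closed before the cursor.
function isInsideFence(text, cursor) {
  let openMarker = null; // "`" or "~" while a fence is open
  for (const line of text.slice(0, cursor).split(/\r?\n/)) {
    const match = /^\s{0,3}(`{3,}|~{3,})/.exec(line);
    if (!match) continue;
    const marker = match[1][0];
    if (openMarker === null) openMarker = marker; // fence opens
    else if (openMarker === marker) openMarker = null; // fence closes
  }
  return openMarker !== null;
}

// Hypothetical helper: wrap pasted text in a fence unless the cursor
// is already inside one.
function wrapPaste(text, cursor, pasted) {
  const insert = isInsideFence(text, cursor)
    ? pasted
    : FENCE + "\n" + pasted + "\n" + FENCE;
  return text.slice(0, cursor) + insert + text.slice(cursor);
}
```

Even this small check shows why automatic wrapping gets complicated quickly: it only answers the fence question and still says nothing about whether the pasted text is code at all.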
#2 would require analysis of the text to be posted. Again some form of interface would have to be applied to inform the user of his options or, again, some logic could try to add formatting to the post automatically. This would be very convenient for the user, but quite complex to implement (Is this 1 or 2 code samples? Code block or just code in text?)
Step 2
The “somehow make it formatted”, “handle it” and “interface” parts above all refer to possible variants of the second step of the process. This would probably have to include some “error” or “notice” to the user:
Seems you are trying to post code.
Please apply ``` around it to format as code.
Or select the code and hit the </> button to format it automatically.
(Better wording of course).
Then this could maybe highlight the toolbar button to format as code.
If we decided to handle this at “pasting” time of code, the pasted text could also still be selected in the textarea so clicking the button would be enough.
I would start with something much easier: add a new ComposerMessagesFinder that checks for a large number of special characters mostly used in code and sends the user a JIT message teaching them how to properly format code.
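A rough sketch of what "a large number of special characters" could mean in practice. This is plain JavaScript for illustration only, not the actual ComposerMessagesFinder (which lives on the Ruby side); the character set, the ratio and the line count are all assumptions that would need tuning.

```javascript
// Characters that are common in code but rare in prose (an assumed set).
const CODE_CHARS = /[{}\[\]();=<>]/g;

// Counts non-empty lines whose share of code-like characters exceeds `minRatio`.
function countCodeLikeLines(text, minRatio = 0.1) {
  return text.split(/\r?\n/).filter((line) => {
    const trimmed = line.trim();
    if (trimmed.length === 0) return false;
    const hits = (trimmed.match(CODE_CHARS) || []).length;
    return hits / trimmed.length > minRatio;
  }).length;
}

// Flags the text once at least `minLines` lines look like code.
function looksLikeUnformattedCode(text, minLines = 3) {
  return countCodeLikeLines(text) >= minLines;
}
```

Requiring several suspicious lines rather than one keeps a single smiley or a parenthetical remark from triggering the message.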
Trying to automatically format code is going to end is
Yeah, my conclusion as well.
Just wanted to brainstorm the options.
Is this ComposerMessagesFinder at the time of pasting or posting?
Is this “JIT message” already a thing? Is this a private message or some UI element shown to the user?
That looks perfect. It doesn’t block the actual typing and editing, users can click it away and ignore it, there is actually enough space to really explain the task.
The only drawback is that it covers the preview, so the actual difference the code formatting makes is hidden. Any workaround?
Saving a draft also sounds like a reasonable moment to trigger this.
Ok, so the last thing to define would probably be how to recognize code.
Would make sense, but my Google mojo didn’t turn anything up. It’s somehow pretty hard to google for helper functionality that formats code on StackOverflow. @codinghorror, can you help out with where to look?
This looks great, but could you please (pretty please with sugar on top) share some more details on how to implement ComposerMessagesFinder for catching users pasting code?
Let us say that I would like to cover just C/C++, so it would only detect semicolons and { } curly braces. Could you please offer some documentation on that? Should I just modify composer_messages_finder.rb? (I am not familiar with Ruby, so please excuse my ignorance.) How do I catch the “Paste” event in Ruby?
Thanks a lot in advance for any help you can offer.
What was posted here actually were just ideas how this could all go together and form a plugin for this functionality.
What is missing is exactly what you are asking for:
How to find out if a user posts code?
What regex or other technique can or should be used?
How does this work in other places? (StackOverflow was mentioned, but I couldn’t find any additional information about it)
I think you need a score which resets every (x) paragraphs that don’t contain enough characters to meet the threshold. So:
Loop through entire body
Look at the current paragraph
Check our current “is this code?” score
Does this paragraph have a higher than expected special character score? Also, is this paragraph shorter than we’d expect?
If we are already in “is this code?” mode, add the number of lines in sequence so far to the current score, to make it greater
If the last (x) paragraphs have zero code, set “is this code?” to zero and record the ending line number
If this is the first paragraph to trigger code mode set “is this code?” to the value of the paragraph score, and record the starting line number
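The steps above could be sketched roughly like this. Everything here is an assumption made for illustration: the special-character set, the `threshold` and `maxGap` defaults, and the way the line count is added to the score. The "is this paragraph shorter than we'd expect?" check is left out to keep the sketch short.

```javascript
const SPECIAL = /[{}\[\]();=<>]/g; // assumed set of code-like characters

// Share of non-whitespace characters in a paragraph that are "special".
function paragraphScore(paragraph) {
  const chars = paragraph.replace(/\s/g, "").length;
  if (chars === 0) return 0;
  return (paragraph.match(SPECIAL) || []).length / chars;
}

// Splits the body on blank lines, remembering 1-based line numbers.
function splitParagraphs(body) {
  const paragraphs = [];
  let current = null;
  body.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") {
      current = null;
    } else if (current === null) {
      current = { startLine: index + 1, endLine: index + 1, text: line };
      paragraphs.push(current);
    } else {
      current.endLine = index + 1;
      current.text += "\n" + line;
    }
  });
  return paragraphs;
}

// Returns the regions that look like code, with start/end line numbers.
function findCodeRegions(body, { threshold = 0.1, maxGap = 2 } = {}) {
  const regions = [];
  let region = null; // currently open "is this code?" region
  let gap = 0; // consecutive paragraphs without code
  for (const p of splitParagraphs(body)) {
    const score = paragraphScore(p.text);
    const lineCount = p.endLine - p.startLine + 1;
    if (score > threshold) {
      if (region === null) {
        // first paragraph to trigger code mode
        region = { startLine: p.startLine, endLine: p.endLine, score, codeLines: lineCount };
      } else {
        // already in code mode: boost by the lines seen in sequence so far
        region.score += score + region.codeLines;
        region.codeLines += lineCount;
        region.endLine = p.endLine;
      }
      gap = 0;
    } else if (region !== null && ++gap >= maxGap) {
      // the last `maxGap` paragraphs had no code: close the region
      regions.push(region);
      region = null;
      gap = 0;
    }
  }
  if (region !== null) regions.push(region);
  return regions;
}
```

The recorded `startLine` and `endLine` are what a composer message could later use to point the user at the lines that should be fenced.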
You’d need a sizable test corpus for this to work. I know @zogstrip has one because I’ve seen it
Also, we can realistically limit our check to, say, the “big ten” languages. Per the tags page that would be C#, Java, PHP, JavaScript, Objective-C, C, C++, Python, Ruby.
This is also interesting as an approach
Run your syntax highlighter on the text. If it ends up highlighting some high percentage of it, it’s probably code.
However, a syntax highlighter run on random text tends to love highlighting numbers for no reason. So you’d need to play with that, but it’s a very interesting idea.
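As a small sketch of the "what percentage gets highlighted" idea: the function below measures the share of non-whitespace characters that a highlighter colours. `toyHighlight` is only a stand-in written for this example (a handful of keywords, double-quoted strings and punctuation); a real setup would plug in the actual highlighter. It skips numbers on purpose, given the problem mentioned above.

```javascript
// Toy stand-in for a real highlighter: returns the substrings it would colour.
// Keywords, double-quoted strings and punctuation only; numbers are skipped.
const TOY_TOKENS =
  /\b(?:function|return|const|let|var|if|else|for|while|class)\b|"[^"\n]*"|[{}()\[\];=<>]/g;

function toyHighlight(text) {
  return text.match(TOY_TOKENS) || [];
}

// Share of non-whitespace characters that ended up highlighted.
function highlightedFraction(text, highlight = toyHighlight) {
  const total = text.replace(/\s/g, "").length;
  if (total === 0) return 0;
  const highlighted = highlight(text).reduce(
    (sum, token) => sum + token.replace(/\s/g, "").length,
    0
  );
  return highlighted / total;
}
```

The cut-off between "probably code" and "probably prose" would have to be found by experiment, since ordinary words such as "if" and "for" also count as keywords.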
What happens when you randomly type text and have it in a syntax highlighting block?
Let's find out. Is this code? I dunno, is it code? You tell me.
I’m in the process of creating a plugin for this, using a naïve (but hopefully effective) pattern matching approach.
My main doubt is actually about the UI, though. It seems Discourse currently has at least 3 (maybe more?) idioms for alerting users before posting:
Yellow overlay on top of rendered markdown. Dismissible, doesn’t prevent posting. Example: welcome message for first couple of posts. Least disruptive.
Orange/red tooltip-like prompt. Prevents posting and jiggles if you try to post before fixing the problem. Example: “post too short” message.
Modal dialog box. Prevents posting. Example: “your post looks like gibberish” message. Most disruptive.
I don’t think the modal is the right tool, as the message will need to be at least a couple of paragraphs long. Then again, the yellow overlay seems too easy to ignore. Possibly a combination of yellow overlay and orange jiggly tooltip could work?
Ideally, it’d also be permanently dismissible (“don’t show me this again”) to avoid annoying power users who intentionally do funky things with their markup that might otherwise trigger the pattern matcher.
Anyone have any thoughts? I’m open to other UI idioms as well, as long as they’re consistent with the general Discourse look and feel.
Here’s the basis of what I have for detection so far:
Code
const codeTypes = [
/(^`{3,}).*\r?\n[\s\S]*\r?\n\1/gm, // backtick-fenced block
/(^~{3,}).*\r?\n[\s\S]*\r?\n\1/gm, // tilde-fenced block
/(?:^|(?:\r?\n{2,}))\s*(?:(?: {4}|\t).*(?:\r?\n|$))/g, // indented block
// lack of `m` flag is intentional (`^` must match beginning of input, not line)
/\[code.*\][\s\S]*\[\/code\]/gm, // BBCode tags
/`.+?`/g, // inline backticks, non-greedy so text between two spans survives (must come last)
];
const varNameStart = '[$_a-zA-Z]';
const varNameEnd = '[$_a-zA-Z0-9]*';
const varName = `${varNameStart}${varNameEnd}`;
const xmlLikeName = '[a-zA-Z-]+';
const nonHtmlIndicators = [
`[$_]${varName}`, // almost certain to be var name
`${varName}(?:_${varName})+`, // snake_case
// camelCase and spinal-case omitted due to too many false positives
'(?:^|\\s+)(?:\\/\\/|[;])', // single-line comment
// ignore python-style `#` single-line comments due to conflict with md headings
`\\/\\*[\\s\\S]+\\*\\/`, // C-like multiline comment
`('''|""")[\\s\\S]+\\1`, // python-like multiline string/comment
';\\s*$', // trailing semicolon
`${varName}\\((?:${varName})?\\)`, // function call
`${varName}\\[(?:${varName})?\\]`, // array index (index optional, as with function call)
`${varName}\\.${varName}`, // object property
'^\\s*[{}]\\s*$', // curly brace and nothing else on a line
'\\{\\{.+\\}\\}', // templating languages e.g. handlebars
'[$#]\\{.+\\}', // template string
'&&|\\|\\||==|!=|>=|<=|=>|->|>>|<<|::'
+ '|__|!!|\\+\\+|\\+=|-=|\\*=|\\/=|\\|=|&=', // various operators
'\\\\[\'"ntr0\\\\]', // common escape sequences
];
const htmlIndicators = [
'<!--[\\s\\S]*?-->', // xml-like comment
`<${xmlLikeName}.*\\/?>`, // xml-like start/empty tag
`</${xmlLikeName}>`, // xml-like end tag
'&([0-9a-zA-Z]+);', // html entity - human-readable
'&#([0-9]{1,7});', // html entity - decimal
'&#x([0-9a-fA-F]{1,6});', // html entity - hex
];
const indicators = nonHtmlIndicators.concat(INCLUDE_HTML ? htmlIndicators : [])
.map(str => new RegExp(str, 'gm'));
Strip out everything under codeTypes, then check for anything under indicators in the remaining content, and warn if the number of matches is above a preconfigured threshold (defaulting to 0).
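Put together, the strip-then-count step could look roughly like this. The two pattern lists are cut down to a few entries so the sketch stays self-contained, and `countIndicators` and `shouldWarn` are names made up for this example.

```javascript
// Reduced versions of the pattern lists above, for illustration only.
const codeBlocks = [
  /^(`{3,}).*\r?\n[\s\S]*?\r?\n\1/gm, // backtick-fenced block
  /`[^`\n]+`/g, // inline backticks
];
const codeIndicators = [
  /;\s*$/gm, // trailing semicolon
  /^\s*[{}]\s*$/gm, // curly brace and nothing else on a line
  /&&|\|\||==|!=|=>/g, // a few operators
];

// Removes everything that is already formatted as code, then counts
// indicator matches in what is left.
function countIndicators(text) {
  const stripped = codeBlocks.reduce((rest, re) => rest.replace(re, ""), text);
  return codeIndicators.reduce(
    (count, re) => count + (stripped.match(re) || []).length,
    0
  );
}

// Warn when the number of matches is above the threshold (default 0).
function shouldWarn(text, threshold = 0) {
  return countIndicators(text) > threshold;
}
```

With the default threshold of 0 a single match is enough to warn, so a site would probably want to raise it once false positives show up.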
I guess I’ll just try to get a working version up and running using native browser alert, then work from there.