HTML/RTF pasting

This is really really nice, an excellent improvement, @vinothkannans great work :heart_eyes:

6 Likes

Oh, wow, @vinothkannans, this is amazing work! :clap::heart_eyes::clap::heart_eyes::clap:

I got better results out of regular webpages than out of Google Docs, which would be a likely source of content for me. I don’t know what witchcraft they use over there.

Here is my test document: Testing Markdown paste - Google Docs

The following are converted:

  • headings
  • paragraphs
  • links
  • lists
  • images

The following are not converted:

  • inline formatting (italics, bold)
  • nested lists
  • tables

Feel free to mangle the document I linked to above to test other cases.

7 Likes

No. Way. This seemed like a pipe dream not very long ago. Having a few more hands on deck is pretty great!

Nice work, @vinothkannans!

7 Likes

amazing. nice job. I can’t wait to share the good news with my community. what a gift. :gift_heart:

7 Likes

There is a problem pasting from Word:

Document:

Pasted:

My Header

This is the table I’m talking about:

Header 1

Header 2

Bold

⓭

Italics

Yellow BG

Underline

RED

Here’s a list:

Potato

Potato

Potato

Plain Text in clipboard

My Header
This is the table I’m talking about:
Header 1	Header 2
Bold	⓭
Italics	Yellow BG
Underline	RED

Here’s a list:
1.	Potato
2.	Potato
3.	Potato

HTML in clipboard

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
...... many lines deleted .......
</head>

<body lang=EN-US style='tab-interval:.5in'>
<!--StartFragment-->

<h1>My Header<o:p></o:p></h1>

<p class=MsoNormal>This is the table I’m talking about:<o:p></o:p></p>

<table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 width=147
 style='width:110.0pt;border-collapse:collapse;mso-yfti-tbllook:1184;
 mso-padding-alt:0in 5.4pt 0in 5.4pt'>
 <tr style='mso-yfti-irow:0;mso-yfti-firstrow:yes;height:14.25pt'>
  <td width=73 nowrap valign=bottom style='width:55.0pt;border:solid black 1.0pt;
  mso-border-top-alt:1.0pt;mso-border-left-alt:.5pt;mso-border-bottom-alt:1.0pt;
  mso-border-right-alt:.5pt;mso-border-color-alt:black;mso-border-style-alt:
  solid;background:black;padding:0in 5.4pt 0in 5.4pt;height:14.25pt'>
  <p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
  normal'><b><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
  "Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri;
  color:white'>Header 1<o:p></o:p></span></b></p>
  </td>
  <td width=73 nowrap valign=bottom style='width:55.0pt;border:solid black 1.0pt;
  border-left:none;mso-border-left-alt:solid black .5pt;mso-border-top-alt:
  1.0pt;mso-border-left-alt:.5pt;mso-border-bottom-alt:1.0pt;mso-border-right-alt:
  .5pt;mso-border-color-alt:black;mso-border-style-alt:solid;background:black;
  padding:0in 5.4pt 0in 5.4pt;height:14.25pt'>
  <p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
  normal'><b><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
  "Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri;
  color:white'>Header 2<o:p></o:p></span></b></p>
  </td>
 </tr>
 <tr style='mso-yfti-irow:1;height:14.25pt'>
  <td width=73 nowrap valign=bottom style='width:55.0pt;border:solid black 1.0pt;
  border-top:none;mso-border-top-alt:solid black .5pt;mso-border-alt:solid black .5pt;
  background:#D9D9D9;padding:0in 5.4pt 0in 5.4pt;height:14.25pt'>
  <p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
  normal'><b><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
  "Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri;
  color:black'>Bold<o:p></o:p></span></b></p>
  </td>
  <td width=73 nowrap valign=bottom style='width:55.0pt;border-top:none;
  border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  mso-border-top-alt:solid black .5pt;mso-border-left-alt:solid black .5pt;
  mso-border-alt:solid black .5pt;background:#D9D9D9;padding:0in 5.4pt 0in 5.4pt;
  height:14.25pt'>
  <p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
  normal'><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
  "Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri;
  color:black'>⓭<o:p></o:p></span></p>
  </td>
 </tr>
 <tr style='mso-yfti-irow:2;height:14.25pt'>
  <td width=73 nowrap valign=bottom style='width:55.0pt;border:solid black 1.0pt;
  border-top:none;mso-border-top-alt:solid black .5pt;mso-border-alt:solid black .5pt;
  padding:0in 5.4pt 0in 5.4pt;height:14.25pt'>
  <p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
  normal'><i><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
  "Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri;
  color:black'>Italics<o:p></o:p></span></i></p>
  </td>
  <td width=73 nowrap valign=bottom style='width:55.0pt;border-top:none;
  border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  mso-border-top-alt:solid black .5pt;mso-border-left-alt:solid black .5pt;
  mso-border-alt:solid black .5pt;background:yellow;padding:0in 5.4pt 0in 5.4pt;
  height:14.25pt'>
  <p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
  normal'><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
  "Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri;
  color:black'>Yellow BG<o:p></o:p></span></p>
  </td>
 </tr>
 <tr style='mso-yfti-irow:3;mso-yfti-lastrow:yes;height:14.25pt'>
  <td width=73 nowrap valign=bottom style='width:55.0pt;border:solid black 1.0pt;
  border-top:none;mso-border-top-alt:solid black .5pt;mso-border-alt:solid black .5pt;
  mso-border-bottom-alt:solid black 1.0pt;background:#D9D9D9;padding:0in 5.4pt 0in 5.4pt;
  height:14.25pt'>
  <p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
  normal'><u><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
  "Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri;
  color:black'>Underline<o:p></o:p></span></u></p>
  </td>
  <td width=73 nowrap valign=bottom style='width:55.0pt;border-top:none;
  border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  mso-border-top-alt:solid black .5pt;mso-border-left-alt:solid black .5pt;
  mso-border-alt:solid black .5pt;mso-border-bottom-alt:solid black 1.0pt;
  background:#D9D9D9;padding:0in 5.4pt 0in 5.4pt;height:14.25pt'>
  <p class=MsoNormal style='margin-bottom:0in;margin-bottom:.0001pt;line-height:
  normal'><span style='mso-ascii-font-family:Calibri;mso-fareast-font-family:
  "Times New Roman";mso-hansi-font-family:Calibri;mso-bidi-font-family:Calibri;
  color:red'>RED<o:p></o:p></span></p>
  </td>
 </tr>
</table>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

<p class=MsoNormal><b style='mso-bidi-font-weight:normal'><i style='mso-bidi-font-style:
normal'>Here’s a list:<o:p></o:p></i></b></p>

<p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span
style='mso-fareast-font-family:Calibri;mso-fareast-theme-font:minor-latin;
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin'><span
style='mso-list:Ignore'>1.<span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span></span></span><![endif]>Potato<o:p></o:p></p>

<p class=MsoListParagraphCxSpMiddle style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span
style='mso-fareast-font-family:Calibri;mso-fareast-theme-font:minor-latin;
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin'><span
style='mso-list:Ignore'>2.<span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span></span></span><![endif]>Potato<o:p></o:p></p>

<p class=MsoListParagraphCxSpLast style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span
style='mso-fareast-font-family:Calibri;mso-fareast-theme-font:minor-latin;
mso-bidi-font-family:Calibri;mso-bidi-theme-font:minor-latin'><span
style='mso-list:Ignore'>3.<span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span></span></span><![endif]>Potato<o:p></o:p></p>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

<!--EndFragment-->
</body>

</html>

So it doesn’t seem to be working because Microsoft Word doesn’t convert bullet/numbered lists to <ol> or <ul> tags.
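For anyone wanting to poke at this, here is a minimal sketch of how those Word list paragraphs could be turned into Markdown list lines. It is not what Discourse does; `wordListsToMarkdown` and `stripTags` are hypothetical names, and it assumes the clipboard HTML looks like the sample above (items are `<p class=MsoListParagraph...>` with the marker inside `<![if !supportLists]> ... <![endif]>`).

```javascript
// Hypothetical helper, not Discourse code: converts Word's list paragraphs
// (as seen in the clipboard sample above) into Markdown list lines.
function wordListsToMarkdown(html) {
  const paragraph = /<p\s+class=["']?MsoListParagraph[^>]*>([\s\S]*?)<\/p>/gi;
  const conditional = /<!\[if !supportLists\]>([\s\S]*?)<!\[endif\]>/i;
  const lines = [];
  let match;
  while ((match = paragraph.exec(html)) !== null) {
    const inner = match[1];
    // The visible marker ("1." or a bullet glyph) sits in the conditional block.
    const markerBlock = conditional.exec(inner);
    const marker = markerBlock ? stripTags(markerBlock[1]) : "";
    const text = stripTags(inner.replace(conditional, ""));
    // Numbered markers are kept; anything else is treated as a bullet.
    const ordered = /^\d+[.)]$/.test(marker);
    lines.push((ordered ? marker.replace(")", ".") : "-") + " " + text);
  }
  return lines.join("\n");
}

// Drops tags, turns &nbsp; into spaces, and collapses whitespace.
function stripTags(fragment) {
  return fragment
    .replace(/<[^>]*>/g, "")
    .replace(/&nbsp;/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}
```

Nested lists (the `level1` part of `mso-list`) are ignored here, so this only covers flat lists like the Potato example.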

How is this in scope in any form? Colors, etc, are not Markdown features. If you want that, screenshot and paste an image.

Just threw them in for testing. I believe bold and italics are Markdown.

There is a plugin that will give font colors with [color] codes. Not official, I know, but would be nice.

Also, I’m just pointing out the fact that lists in Word don’t come through, which is definitely in scope. It’s not a fault of the system but of Word (it doesn’t translate lists to the appropriate HTML tags), but it still needs handling just because it is Office, you know.

EDIT: And the table embedded in Word is not converted correctly either. But this is already addressed:

Yeah, I can repro: a list in Word doesn’t come through as a list, neither bulleted nor numbered.

Paragraph

1.      This

2.      Is a

3.      Numbered list

Paragraph

-         This

-         Is a

-         Bulleted list

Paragraph

Would be good to preserve lists from Word pastes, I can agree with that @vinothkannans. The issue is the hidden tab character that Word is inserting here:

image

4 Likes

Okay I will look at these issues :+1:

5 Likes

How did you get the plain text to paste?

When I paste, I get this:

This is a list

1.      
Potato

2.      
Potato

3.      
Potato

because it is using the HTML version instead of the plain-text version, and Word’s HTML version converts lists to <p> tags.

Pasting the plain-text version in the clipboard will get this:

This is a list

  1. Potato
  2. Potato
  3. Potato

which works fine even with the tab characters.
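A rough sketch of that idea, in case it helps: fall back to the plain-text flavour when the HTML looks like a Word list. The function names are made up for illustration, and the detection relies only on the markers visible in the clipboard dump earlier in the thread (the Word XML namespace and the `MsoListParagraph` class).

```javascript
// Hypothetical sketch, not Discourse code.
// True when the clipboard HTML carries the Word namespace or Mso* classes.
function looksLikeWordHtml(html) {
  return (
    /urn:schemas-microsoft-com:office:word/i.test(html) ||
    /class=["']?Mso/i.test(html)
  );
}

// Returns the text to insert when the plain-text flavour should win,
// or null when the caller should carry on with its HTML conversion.
function choosePasteText(html, plainText) {
  if (html && looksLikeWordHtml(html) && /MsoListParagraph/i.test(html)) {
    // Word separates the marker from the item with a tab ("1.\tPotato");
    // replace that with a single space so the line reads as a Markdown list item.
    return plainText.replace(/^(\d+\.|-)[ \t]+/gm, "$1 ");
  }
  return null;
}
```

The obvious trade-off is that anything else in the same paste (headings, tables) loses its formatting too, so this is only reasonable as a stopgap.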

1 Like

Test 1 (select 3 paragraph and delete 1 + 3 in reply):

This is a list 1. Potato 2. Potato 3. Potato

Test 2 (select only text in code):

This is a list 1. Potato 2. Potato 3. Potato

Can’t tell if it was always like this, but pasting CSS straight from browser devtools seems to be affected.

What it looks like

.select-box-kit.is-expanded .select-kit-body, .select-kit.is-expanded .select-kit-body {

  1. display: -webkit-box;
  2. display: -ms-flexbox;
  3. display: flex;
  4. -webkit-box-orient: vertical;
  5. -webkit-box-direction: normal;
  6. -ms-flex-direction: column;
  7. flex-direction: column;
  8. left: 0;
  9. position: absolute;
  10. top: 0;

}

What it's supposed to look like

.select-box-kit.is-expanded .select-kit-body, .select-kit.is-expanded .select-kit-body {
display: -webkit-box;
display: -ms-flexbox;
display: flex;
-webkit-box-orient: vertical;
-webkit-box-direction: normal;
-ms-flex-direction: column;
flex-direction: column;
left: 0;
position: absolute;
top: 0;
}

Copying code from StackOverflow (which you should never do :upside_down_face: ) is affected.

Steps to reproduce:

1- Go to this answer
2- Copy the script part.
3- Paste in composer.

Result:

function makeUnselectable(node) { if (node.nodeType == 1) { node.setAttribute("unselectable", "on"); } var child = node.firstChild; while (child) { makeUnselectable(child); child = child.nextSibling; }
} makeUnselectable(document.getElementById("foo"));

Expected result (works if you use paste as plain text or ctrl + shift + v ):

function makeUnselectable(node) {
    if (node.nodeType == 1) {
        node.setAttribute("unselectable", "on");
    }
    var child = node.firstChild;
    while (child) {
        makeUnselectable(child);
        child = child.nextSibling;
    }
}

makeUnselectable(document.getElementById("foo"));

Chrome Version 63.0.3239.84 (Official Build) (64-bit) - latest
Win 7
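One possible way to keep the line breaks, sketched under the assumption that the HTML on the clipboard still wraps the snippet in `<pre><code>` (I have not checked what Chrome actually puts there): pull those blocks out before any whitespace collapsing and emit them as fenced code. `preBlocksToFences` and `decodeEntities` are hypothetical names.

```javascript
// Hypothetical sketch, not Discourse code: turn <pre><code>...</code></pre>
// into a fenced block so indentation and newlines inside it survive.
const FENCE = "`".repeat(3);

function preBlocksToFences(html) {
  const preBlock = /<pre\b[^>]*>(?:<code[^>]*>)?([\s\S]*?)(?:<\/code>)?<\/pre>/gi;
  return html.replace(preBlock, (whole, code) => {
    // Drop syntax-highlighting spans, then decode the common entities.
    const plain = decodeEntities(code.replace(/<[^>]*>/g, "")).replace(/\n+$/, "");
    return "\n" + FENCE + "\n" + plain + "\n" + FENCE + "\n";
  });
}

// Decodes only the handful of entities that usually appear in code samples.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}
```

Everything outside the `<pre>` blocks is passed through untouched, so the rest of the HTML-to-Markdown conversion can run afterwards.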

4 Likes

We are going to put HTML and Excel table rich paste behind a feature flag for one more week at least while we refine it. Default off. It will remain enabled on meta though.

Helps us relieve some of the pressure. Then in a week or so we can decide if we want to include this in 1.9 or not.

9 Likes

I don’t agree with part of this: the Excel table paste should be 100% safe and should make the 1.9 release. There is just no way that Excel data could be interpreted as anything else.

I do tend to agree the HTML part is way too risky to take on at this time, though.

So let’s make the feature flag about the HTML paste, and push the Excel table paste through… since that’s what this was originally about before it got all scope-creeped up :wink:
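To illustrate why the table case is low-risk, here is a minimal sketch that assumes the plain-text flavour Excel puts on the clipboard is tab-separated rows. `tsvToMarkdownTable` is a hypothetical name, not the function Discourse ships.

```javascript
// Hypothetical sketch, not Discourse code: tab-separated rows -> Markdown table.
// Returns null when the text is not a rectangular grid of at least 2x2 cells,
// so ordinary text pastes are left alone.
function tsvToMarkdownTable(text) {
  const rows = text
    .replace(/\r?\n$/, "") // Excel-style text usually ends with a newline
    .split(/\r?\n/)
    .map((row) => row.split("\t"));

  const width = rows[0].length;
  if (rows.length < 2 || width < 2 || rows.some((r) => r.length !== width)) {
    return null;
  }

  // Escape pipes so cell content cannot break the table syntax.
  const render = (cells) =>
    "| " + cells.map((c) => c.trim().replace(/\|/g, "\\|")).join(" | ") + " |";
  const divider = "| " + rows[0].map(() => "---").join(" | ") + " |";

  return [render(rows[0]), divider, ...rows.slice(1).map(render)].join("\n");
}
```

The strict shape check is the point: text only becomes a table when every row has the same number of tab-separated cells.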

13 Likes

I’d like to add another pasting feature request:
Automatically remove line breaks

Sometimes I have to copy & paste text from PDF docs. Maybe there is some way to detect the unwanted "- " characters and repair the words :slight_smile: Various JS-based online tools didn’t work for me.
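Something along these lines might do it. This is a sketch only: `unwrapPdfText` is a made-up name, and it assumes the pasted text still has its line breaks, with words split as a hyphen at the end of one line and a lowercase letter at the start of the next.

```javascript
// Hypothetical sketch: re-join words hyphenated across line breaks and
// unwrap hard-wrapped lines, keeping blank lines as paragraph breaks.
function unwrapPdfText(text) {
  return text
    .replace(/\r\n/g, "\n")
    .split(/\n{2,}/) // blank lines separate paragraphs
    .map((paragraph) =>
      paragraph
        // "auto-\nmatically" -> "automatically"
        .replace(/(\p{L})-[ \t]*\n[ \t]*(\p{Ll})/gu, "$1$2")
        // remaining single line breaks become spaces
        .replace(/[ \t]*\n[ \t]*/g, " ")
        .trim()
    )
    .join("\n\n");
}
```

It cannot tell a line-wrap hyphen from a real one, so a compound like "plain-text" that happens to break at the hyphen would be joined into "plaintext".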

Speaking of feature creep, it would be nice if you could paste without magical formatting. Could shift-paste just paste like the good old days?

1 Like

You can disable rich text pasting with the site setting enable_rich_text_paste. And yes, you can always use shift-paste to paste as plain text.

4 Likes

CTRL+SHIFT+V works for me.

2 Likes

On one hand, well, of course. On the other, this continues to be just awesome. I haven’t used this enough to have tried.