CSS `white-space` property of clipboard data is not respected when pasting into the WYSIWYG editor

Priority/Severity:

Medium

Platform:

Operating system

  • Windows 11

Browser

  • Google Chrome 139.0.7258.128

Discourse

028c90dd5e7a2799ea5b6e963f71fc0222681943

Description:

Text copied from some sources may be stored in the clipboard in a formatted form (text/html type) in addition to plain text (text/plain type).

When text is pasted into the composer, if the clipboard contains the formatted data type, that type is used instead of the plain text.
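The preference described above can be sketched as follows. This is only an illustration under stated assumptions, not the Discourse or ProseMirror implementation: the `ClipboardFlavors` type and the `pickClipboardFlavor` function are hypothetical names, and the clipboard is modeled as a plain object keyed by MIME type.

```typescript
// Hypothetical model of clipboard content: MIME type -> data.
type ClipboardFlavors = Record<string, string>;

// Hypothetical helper: prefer the formatted flavor when it is present,
// otherwise fall back to plain text.
function pickClipboardFlavor(
  data: ClipboardFlavors
): { type: string; content: string } | undefined {
  for (const type of ["text/html", "text/plain"]) {
    if (Object.prototype.hasOwnProperty.call(data, type)) {
      return { type, content: data[type] };
    }
  }
  return undefined; // Neither flavor is available.
}
```

With both flavors present, the sketch selects text/html, which is why the markup in the clipboard (including its inline styles) determines the pasted result.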

By default, whitespace in HTML content is collapsed. This behavior can be controlled via the white-space CSS property.

:bug: When pasting into the composer in “rich text editor” mode, the white-space CSS property of the clipboard data is not respected. As a result, whitespace is always collapsed in the pasted content. When the source content had the white-space property set to pre, the pasted text becomes hard to read, and it is incorrect in cases where the whitespace in the source content was technically significant.
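The difference between the two behaviors can be illustrated with a deliberately simplified sketch. The `renderWhitespace` function is a hypothetical name, and it models only two values of white-space. Real CSS whitespace processing has more rules (for example, trimming at line edges), so this is an approximation rather than a faithful implementation.

```typescript
type WhiteSpaceMode = "normal" | "pre";

// Hypothetical, simplified model of how whitespace in a text node is treated.
function renderWhitespace(text: string, mode: WhiteSpaceMode): string {
  if (mode === "pre") {
    // white-space: pre keeps spaces and line breaks exactly as written.
    return text;
  }
  // white-space: normal collapses each run of spaces, tabs, and newlines
  // into a single space.
  return text.replace(/[ \t\r\n]+/g, " ");
}
```

In this model, the reported fault corresponds to the editor always taking the "normal" path, even when the clipboard markup specifies pre.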

Steps to reproduce:

  1. Create an HTML file with the following content:
    <html>
      <body>
        <span style="white-space: pre">foo
    bar
        </span>
      </body>
    </html>
    
  2. Open the file in your web browser.
    Note that the whitespace in the page content is not collapsed:
    foo
    bar
    
  3. Copy the content of the web page.
  4. Open the post composer.
  5. Put the composer into “rich text editor” mode.
  6. Paste the copied content.

:bug: Instead of keeping the same format as the copied content, the whitespace of the pasted text was collapsed:

foo bar

Additional information:

I see that ProseMirror has support for white-space: pre:


The fault does not occur when using the composer in “Markdown editor” mode.


The fault does not occur if the content is pasted into a code block instead of into the normal editor context. Indeed, in many cases it is most appropriate to place content that uses something like white-space: pre inside a code block. However, it is quite common for users to apply formatting after the fact: they add the content to the composer, select it, and then use the composer toolbar to apply the formatting (as opposed to the alternative approach of creating the code block before adding the content).


I found a useful tool for viewing the raw data of the clipboard content:


I was able to reproduce the fault at try.discourse.org in “safe mode”.

Did you put the post composer into “rich text editor” mode before you pasted the content copied from the web page?

The fault still occurs.

Are you sure you followed the instructions exactly as written?

Please note that you must copy the content that is rendered from that HTML, so that the clipboard content is populated with text/html type data:

<html>
<body>
<!--StartFragment--><span style="color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: pre; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">foo
bar
    </span><!--EndFragment-->
</body>
</html>
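To make the relevant detail in that clipboard markup easier to spot, here is a minimal sketch of reading one declaration out of an inline style string. The `getInlineStyleProperty` function is a hypothetical helper, and it assumes the style attribute value has already been entity-decoded and contains no semicolons inside quoted strings or url() values.

```typescript
// Hypothetical helper: look up a single property in an inline style string.
function getInlineStyleProperty(style: string, property: string): string | undefined {
  for (const declaration of style.split(";")) {
    const separator = declaration.indexOf(":");
    if (separator === -1) {
      continue; // Not a "name: value" pair (e.g. trailing empty piece).
    }
    const name = declaration.slice(0, separator).trim().toLowerCase();
    if (name === property) {
      return declaration.slice(separator + 1).trim();
    }
  }
  return undefined;
}

// Abbreviated version of the style from the clipboard content above.
const clipboardStyle =
  'color: rgb(0, 0, 0); font-family: "Times New Roman"; white-space: pre; display: inline !important;';
const whiteSpace = getInlineStyleProperty(clipboardStyle, "white-space");
```

For the abbreviated style shown, the lookup yields pre, which is the value the rich text editor would need to take into account when parsing the pasted span.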

This is not about composing your post using HTML markup.

Ah good catch. I have a tendency to skim the post a bit too fast :sweat_smile:

Thanks for reporting @per1234, we are having a look at this.

I understand the general issue here: we want to make it as easy as possible for people to paste in code examples.

What would you expect from such an HTML clipboard?

foo
bar

Or, considering it’s a span tag, two inline code lines with a hard break between?

foo
bar

Or just that we respect the line breaks but in a regular paragraph, with a hard break between?

foo
bar

Thank you!

I’m not very knowledgeable in the subject of HTML, but I would expect this rendering:

From what I can tell, this is how Chrome browser renders it.


That said, in the specific use in which I encountered the problem, it is true that the code block rendering would be most appropriate. We get this type of clipboard content by clicking the “Copy Console Output” button in an online IDE named “Arduino Cloud Editor”:

This copies the output produced by the compiler and other tooling to the clipboard. This type of non-prose content is best formatted as a code block.

If the following procedure is used to share that copied output in a forum post:

  1. Put post composer into “rich text editor” mode.
  2. Paste the content into the composer.
  3. Select the pasted content.
  4. Click the </> icon on the composer toolbar.

The post ends up with the following formatting:

/run/arduino/sketches/asdf/asdf.ino:1:2: error: #error foo  #error foo   ^~~~~

(note that all the copied content is on a single line)

whereas we would expect this post formatting:

/run/arduino/sketches/asdf/asdf.ino:1:2: error: #error foo
 #error foo
  ^~~~~

However, this preference for a code block is specific to our particular use case. It might be that in other use cases there are sources of clipboard content with a white-space: pre property for which a code block would not be appropriate. And even for our use case it is reasonable to put the responsibility of manually applying code block formatting on the user.

In this case, does it still use a span tag in its text/html clipboard output, or does it only output text/plain?

If I use the “Clipboard Inspector” tool to check what data is in my clipboard after I click that “Copy Console Output” button in Arduino Cloud Editor, it shows it contains the following “text/plain” type data:

/run/arduino/sketches/asdf/asdf.ino:1:2: error: #error foo
 #error foo
  ^~~~~

as well as the following “text/html” type data:

<span style="color: rgb(0, 0, 0); font-family: &quot;Open Sans&quot;, &quot;Lucida Grande&quot;, lucida, verdana, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: 0.16px; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: pre; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">/run/arduino/sketches/asdf/asdf.ino:1:2: error: #error foo
 #error foo
  ^~~~~</span>

I hope that answers your question. I’m happy to provide additional information if needed.

This should be fixed by pull request #35518 in discourse/discourse on GitHub, “FIX: [rich editor] convert newlines to hard breaks when parsed from HTML” by renato (not merged yet; it is waiting for code review).

My first take was to convert it to a code block, but I think that would be too eager and cause some false positives. Instead, we just respect line breaks, converting them to hard breaks within the context the HTML was pasted into. (Thanks to Marijn’s improvement to prosemirror-model: “When preserving whitespace, replace newlines with line break replacem…”, ProseMirror/prosemirror-model@79e9f2b on GitHub.)
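The general idea of turning newlines into hard breaks can be sketched like this. It is not the code from the pull request or from prosemirror-model: the `InlineNode` type and the `newlinesToHardBreaks` function are hypothetical, and the node names are only illustrative.

```typescript
// Hypothetical, simplified inline node model.
type InlineNode = { type: "text"; text: string } | { type: "hard_break" };

// Split whitespace-preserved text on line breaks and emit a hard break
// between consecutive lines instead of collapsing the newline to a space.
function newlinesToHardBreaks(text: string): InlineNode[] {
  const nodes: InlineNode[] = [];
  const lines = text.split(/\r\n|\r|\n/);
  lines.forEach((line, index) => {
    if (index > 0) {
      nodes.push({ type: "hard_break" });
    }
    if (line.length > 0) {
      nodes.push({ type: "text", text: line });
    }
  });
  return nodes;
}
```

Applied to "foo\nbar", the sketch produces a text node, a hard break, and another text node, so the two lines stay on separate lines without forcing the content into a code block.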

With the recent improvements to the code toolbar button, users should be able to select this pasted section with the hard breaks and convert it to a code block, and the newlines should be carried over.

Thanks so much for the fix @renato, and for taking the time to post an update here!

The recent bug fixes have brought the rich text editor’s functionality to the point where it can serve to make our forum more approachable to less technical users who aren’t already familiar with Markdown and not motivated to learn it.


There still are a couple of conditions under which the results are not expected, but these are things that are not reasonable to mitigate via the Discourse codebase:

Corruption due to incidental markup syntax

Posts may be corrupted in the case where there is content that incidentally resembles markup. This is due to the intentional decision to support markup in the rich text editor.

For our use case where those who wish to use markup are expected to use the Markdown editor, while the rich text editor is intended only for use by those with no interest in using markup, this is a very unfortunate decision. One of the most significant problems we have with non-technical users using the Markdown editor is post corruption due to incidental markup and I had great hopes that the rich text editor would provide a solution for that. However, for the use case where a forum will only provide a rich text editor, this design makes perfect sense as it still allows users fluent in Markdown to efficiently compose posts.

Incorrect formatting due to inappropriate markup in clipboard content

We have a case where the “text/html” type content added to the clipboard when copying from a specific application has inappropriate HTML markup, which results in incorrect formatting when the content is pasted into the rich text editor outside a code block.

This is of course a bug in the application and Discourse is acting 100% correctly by formatting the content as indicated by the markup.

Thanks heaps @per1234

Can you expand a bit on examples where corruption can occur? We still have a few edge cases around nodes that we do not know how to render, but we try to ban switching to the rich editor in cases like this.

Regarding the clipboard, we certainly want to improve. It is a tough problem, so any exact repros here would be very helpful.

Sure. I’m glad if the information can be useful. I would like to reiterate my previous statement:

However, I’d be happy to be wrong about that :slightly_smiling_face:.

  1. Copy the following C++ code:
    #include <iostream>
    int main() {
      std::cout << __FILE__;
    }
    
  2. Open the post composer.
  3. Put the composer into the “rich text editor” mode.
  4. Paste the copied content into the composer.

:slightly_frowning_face: The content is corrupted:

#include
int main() {
std::cout << FILE;
}

(note that the <iostream> has been suppressed due to resembling an unsupported HTML tag, and the __FILE__ has been treated as bold markup)

This could be seen as user error, since it could be avoided by triggering a code block prior to pasting the non-prose content. However, we might expect that the alternative workflow of applying code block formatting retroactively to pasted content would be equally valid (as it is when using the Markdown editor).
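For illustration, one way pasted plain text could be kept literal is to backslash-escape the punctuation that Markdown treats as syntax. This is only a sketch of the concept, not something Discourse is known to do: the `escapeMarkdown` function is hypothetical, and the set of escaped characters is an assumption.

```typescript
// Hypothetical helper: backslash-escape characters that Markdown may
// interpret as markup, so the text is rendered literally.
function escapeMarkdown(text: string): string {
  return text.replace(/[\\`*_{}\[\]<>()#+\-.!|~]/g, "\\$&");
}

const escapedMacro = escapeMarkdown("__FILE__");
const escapedHeader = escapeMarkdown("<iostream>");
```

With escaping of this kind, the underscores around FILE and the angle brackets around iostream would be treated as literal characters rather than as bold markup or an HTML tag.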

Equipment

  • Any Arduino board (whether official or 3rd party)

Instructions

  1. Install Arduino IDE 2.3.6, which can be downloaded from the “Software” page of the Arduino website:
    https://www.arduino.cc/en/software/#ide-download-section
  2. Start Arduino IDE.
  3. Select File > New Sketch from the Arduino IDE menus.
  4. Replace the content of the new sketch with the following code:
    void setup() {
      Serial.begin(9600);
      while (!Serial) {}  // Wait for serial port to be opened.
      delay(500);         // Some boards require a delay after serial port initialization.
      Serial.println("foo");
      Serial.println("bar");
    }
    void loop() {}
    
  5. Select Tools > Serial Monitor from the Arduino IDE menus to open the Serial Monitor view, if it is not already open.
  6. Select “9600” from the baud rate menu in the Serial Monitor view.
  7. Upload the sketch to your Arduino board.
  8. Select the serial output from the field in the Serial Monitor view.
  9. Copy the selected content.
  10. Open the Discourse post composer.
  11. Put the composer into the “rich text editor” mode.
  12. Paste the copied content into the composer.

:slightly_frowning_face: Each line of the copied content is placed in a separate code block:

foo

bar

If you inspect the clipboard content, you will see that, in addition to the expected “text/plain” type content:

foo
bar

It also contains the following “text/html” type content:

<div style="color: rgb(78, 91, 97); font-family: monospace; font-size: 13px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: nowrap; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; position: absolute; left: 0px; top: 0px; height: 18px; width: 1862px;"><pre style="margin: 0px;">foo
</pre></div><div style="color: rgb(78, 91, 97); font-family: monospace; font-size: 13px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: nowrap; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; position: absolute; left: 0px; top: 18px; height: 18px; width: 1862px;"><pre style="margin: 0px;">bar</pre></div>

Since the Arduino IDE 2.x Serial Monitor incorrectly wraps each line of the “text/html” type copied content in <pre> tags, the rendering of each line of the pasted content as a separate code block by the Discourse rich text editor is correct and expected.

As with the other problem I described above, the unexpected formatting can be avoided by proactively triggering code block formatting prior to pasting the content.

We parse pasted text/plain as Markdown, which is expected, and not doing it would be a worse experience IMO, but any practical suggestions are welcome. Maybe supporting the SHIFT modifier as a “paste without parsing as Markdown” could help?

This can be changed; one possibility we have is to escape it to &lt;iostream&gt; instead of stripping.
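A minimal sketch of the escaping option, assuming a hypothetical `escapeHtml` helper (not the actual Discourse code), could look like this:

```typescript
// Hypothetical helper: replace HTML-significant characters with entities
// so that unsupported tags are shown as text instead of being stripped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // Must run first so later entities are not re-escaped.
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const escapedInclude = escapeHtml("#include <iostream>");
```

In this sketch the include line keeps its header name in entity form, so it remains visible when rendered.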


I believe the other point you raised isn’t actionable as you mentioned.

Is there anything else that you still struggle with related to this topic?