Improper HTML escaping in bot view

I noticed this on @y2k’s site, https://army.community/, and have since seen it on some other sites, like https://discuss.codecademy.com/.

I first visited Y2k’s site in lynx to check connectivity (it turned out to be down). When it came back up, I visited again in lynx and saw raw HTML tags on the page. Those tags were not visible when I visited in a browser that runs JavaScript.

Here’s what I’m seeing right now on Codecademy’s top-level page in lynx:

JavaScript

   JavaScript is an essential web technology, adding interactivity to the
   structure and style of <a
   href="https://discuss.codecademy.com/c/web">HTML and CSS</a>. This
   forum category covers the Codecademy courses <a
   href="https://www.codecademy.com/learn/learn-javascript?utm_source=foru
   ms_to_main&amp;utm_medium=category_description">Learn JavaScript</a>
   and <a
   href="https://www.codecademy.com/learn/javascript?utm_source=forums_to_
   main&amp;utm_medium=category_description">JavaScript</a>.

In Firefox, those links render as actual links, not escaped HTML. This view of the page won’t be seen by most humans, but it will be seen by at least some bots. The escaped HTML is visible in Firefox’s ‘view source’ (long lines wrapped):

<span itemprop='description'>JavaScript is an essential web technology, adding
interactivity to the structure and style of &lt;a
href=&quot;https://discuss.codecademy.com/c/web&quot;&gt;HTML and
CSS&lt;/a&gt;.  This forum category covers the Codecademy courses &lt;a
href=&quot;https://www.codecademy.com/learn/learn-javascript?utm_source=forums_to_main&amp;amp;utm_medium=category_description&quot;&gt;Learn
JavaScript&lt;/a&gt; and &lt;a
href=&quot;https://www.codecademy.com/learn/javascript?utm_source=forums_to_main&amp;amp;utm_medium=category_description&quot;&gt;JavaScript&lt;/a&gt;.</span>

(Separately, I’m wondering what the syntax highlighter thinks it is doing here. I’ve specified “html” on the code fence.)
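A minimal Ruby sketch of where that view-source markup could come from. It assumes the category description is stored as already-rendered HTML, so the & in the query string is already written as &amp;, and that the template escapes it once more on output. Ruby's standard-library ERB::Util.html_escape stands in for Rails' automatic escaping of strings that are not marked html_safe.

```ruby
require "erb"

# Assumption: the stored description is already HTML, so the "&" inside
# the URL query string is already the entity "&amp;".
description = %(<a href="https://www.codecademy.com/learn/javascript?utm_source=forums_to_main&amp;utm_medium=category_description">JavaScript</a>)

# Rails escapes any string not marked html_safe when it is output with
# <%= %>. ERB::Util.html_escape performs the same kind of escaping
# (& < > " ' become entities), so it stands in for that step here.
escaped = ERB::Util.html_escape(description)

puts escaped
```

The output matches the last link in the view-source excerpt, including the &amp;amp; that appears when an existing entity is escaped a second time.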

This might be a good one for you to try to fix locally, under our current arrangement?

Yes, I can start looking at it later tonight.

Changing c.description to c.description.html_safe in this template fixes it by letting the HTML through unescaped:

https://github.com/discourse/discourse/blob/master/app/views/categories/index.html.erb#L9

Alternatively, changing c.description to c.description_text fixes it by rendering the description as plain text. I’m guessing the .html_safe version is the preferred one, but I can send you a pull request with either.
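A rough sketch comparing the current output with the two proposed fixes. The CategoryStub struct, its sample values, and the one-line templates are hypothetical simplifications, not the real categories/index.html.erb. Because plain ERB does not auto-escape, ERB::Util.html_escape is called explicitly wherever Rails would escape a string that is not html_safe.

```ruby
require "erb"

# Hypothetical stand-in for a category: only the two attributes discussed
# in the thread, with made-up sample values.
CategoryStub = Struct.new(:description, :description_text)

c = CategoryStub.new(
  %(Covers <a href="https://discuss.codecademy.com/c/web">HTML and CSS</a>.),
  "Covers HTML and CSS."
)

# Simplified template lines. html_escape marks where Rails would escape.
escaped_tpl = %(<span itemprop='description'><%= ERB::Util.html_escape(c.description) %></span>)
raw_tpl     = %(<span itemprop='description'><%= c.description %></span>)
text_tpl    = %(<span itemprop='description'><%= ERB::Util.html_escape(c.description_text) %></span>)

buggy = ERB.new(escaped_tpl).result(binding) # current output: tags show as text
raw   = ERB.new(raw_tpl).result(binding)     # html_safe-style fix: links render
plain = ERB.new(text_tpl).result(binding)    # description_text-style fix: no markup

puts buggy, raw, plain
```

Marking a string html_safe tells Rails to trust it and skip escaping, so that route is only appropriate when the description is already trusted or sanitized HTML. The description_text route avoids that trust at the cost of dropping the links.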

html_safe is fine here :wink:

https://github.com/discourse/discourse/pull/4989
