Post with CDATA block produces invalid RSS feed?

I’m using Zapier to automatically post new topics to a Facebook page, and I noticed that for the last few days the feed items (i.e. /latest.rss) haven’t been posted anymore. Zapier reports this error:

The last error message was "mismatched tag: line 546, column 2".

What’s on line 546? This:

542| <script type="text/javascript">
543| <!--//--><![CDATA[//><!--
544|     !function(a,b){"use strict";function c(){if(!e){e=!0;var a,c,d,f,g=-1!==navigator.appVersion.indexOf("MSIE 10"),h=!!navigator.userAgent.match(/Trident.*rv:11\./),i=b.querySelectorAll("iframe.wp-embedded-content");for(c=0;c<i.length;c++)if(d=i[c],!d.getAttribute("data-secret")){if(f=Math.random().toString(36).substr(2,10),d.src+="#?secret="+f,d.setAttribute("data-secret",f),g||h)a=d.cloneNode(!0),a.removeAttribute("security"),d.parentNode.replaceChild(a,d)}else;}}var d=!1,e=!1;if(b.querySelector)if(a.addEventListener)d=!0;if(a.wp=a.wp||{},!a.wp.receiveEmbedMessage)if(a.wp.receiveEmbedMessage=function(c){var d=c.data;if(d.secret||d.message||d.value)if(!/[^a-zA-Z0-9]/.test(d.secret)){var e,f,g,h,i,j=b.querySelectorAll('iframe[data-secret="'+d.secret+'"]'),k=b.querySelectorAll('blockquote[data-secret="'+d.secret+'"]');for(e=0;e<k.length;e++)k[e].style.display="none";for(e=0;e<j.length;e++)if(f=j[e],c.source===f.contentWindow){if(f.removeAttribute("style"),"height"===d.message){if(g=parseInt(d.value,10),g>1e3)g=1e3;else if(200>~~g)g=200;f.height=g}if("link"===d.message)if(h=b.createElement("a"),i=b.createElement("a"),h.href=f.getAttribute("src"),i.href=d.value,i.host===h.host)if(b.activeElement===f)a.top.location.href=d.value}else;}},d)a.addEventListener("message",a.wp.receiveEmbedMessage,!1),b.addEventListener("DOMContentLoaded",c,!1),a.addEventListener("load",c,!1)}(window,document);
545| //--><!]]>
546| </script><iframe sandbox="allow-scripts" security="restricted" src="https://blog.jetbrains.com/dotnet/2016/11/21/jetbrains-rider-public-preview/embed/" width="600" height="338" title="Embedded WordPress Post" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" class="wp-embedded-content"></iframe>

So I assume things are not playing as nice as they should.

Can we get a stripped down version of the RSS feed?

Thanks!

Hmm, can you check into the RSS feed issue on meta @techapj and see if it repros? Maybe use a feed validator on a few different pages to make sure everything is OK on latest.

I checked the RSS feeds for meta’s latest page and several category pages, and they are all valid (validated via http://www.feedvalidator.org/).


Sounds like this is specific to some post on your site then, @iamntz

Any ideas on how I could debug this?

http://www.feedvalidator.org/check.cgi?url=https%3A%2F%2Fdevforum.ro%2Flatest.rss
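One way to narrow this down locally, as a rough sketch: save the feed to a file and let an XML parser report where it gives up. The example below uses Python's standard `xml.etree.ElementTree`; the helper name `locate_parse_error` and the sample feed string are made up for illustration, and Zapier's own parser may behave differently.

```python
import xml.etree.ElementTree as ET


def locate_parse_error(xml_text):
    """Return (line, column, source_line) for the first parse error, or None.

    Hypothetical helper: it only reports where the parser stopped,
    not why the feed was generated that way.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based
        source_lines = xml_text.splitlines()
        source_line = source_lines[line - 1] if 0 < line <= len(source_lines) else ""
        return line, column, source_line
    return None


# Made-up feed fragment with a deliberately wrong closing tag on line 3.
sample = "<rss>\n<channel>\n<title>x</item>\n</channel>\n</rss>"
print(locate_parse_error(sample))
```

Running this against a saved copy of latest.rss should point at the same line the validator complains about, which makes it easier to find the offending post.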

Yes, it is because the post has <![CDATA[ … ]]> in it.
Use CDATA in RSS Feed to add HTML and links | Amit Tech Lab - PHP, AJAX, CakePHP,WordPress, Symfony, Drupal, codeIgnitier

This results in something like

<description><![CDATA[ … <![CDATA[ … ]]> … ]]></description> 

The validator / parser doesn’t do nested CDATA well.
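To see why a parser trips over this, here is a small sketch using Python's standard `xml.etree.ElementTree`. That choice is an assumption: I don't know which parser Zapier uses. The `nested` string is a shortened, made-up stand-in for the real post, and `parse_error` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed item: the outer CDATA is added by the feed,
# the inner one comes from the embedded script in the post.
nested = (
    "<description><![CDATA[<script><![CDATA[ var x = 1; ]]></script>"
    "<iframe></iframe>]]></description>"
)


def parse_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# The first "]]>" closes the outer CDATA, so "</script>" is read as real markup
# while <description> is still open, and the parser reports a mismatched tag.
print(parse_error(nested))
```

The inner `]]>` ends the outer section early, and everything after it is treated as markup again, which matches the "mismatched tag" error on the `</script>` line.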


Alright, but what can I do about this? Is there anything in the admin area that I didn’t see? Is it a bug?

Thanks!

Is nested CDATA valid? Perhaps it’s a validator bug.

I can think of a few things:

  1. Don’t use nested CDATA.

  2. Determine whether nested CDATA is valid, and if it is, report it as a problem to the validator.

I wouldn’t call it a bug, but more an edge case.

No, nested CDATA is not valid:

https://www.w3.org/TR/REC-xml/#sec-cdata-sect

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using “&lt;” and “&amp;”. CDATA sections cannot nest.

The problem is, after the first <![CDATA[ everything up to the ]]> is considered to be “character data” and not XML tags. Once a ]]> is encountered, it goes back to needing to be well-formed valid XML.

My first thought was to change the < and / or ] to an entity. But after seeing the “and cannot” I guess that won’t work.

Maybe “breaking” the inner CDEnd by inserting zwsp characters would work?

EDIT

The RSS for this topic is valid and uses entities (&lt; and &gt;), so it looks like my first thought should work.
https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fmeta.discourse.org%2Ft%2Fpost-with-cdata-block-produces-invalid-rss-feed%2F53362.rss
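For illustration, here is a rough sketch of two ways to keep the inner `]]>` from ending the outer section. The first uses entities, as in the edit above. The second is not something proposed in this topic: instead of zwsp characters it splits each `]]>` across two CDATA sections, a common workaround that keeps the text byte-for-byte identical after parsing. The helper names `entity_escaped` and `cdata_safe` are hypothetical, and this is not how Discourse builds its feeds.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def entity_escaped(text):
    """Escape &, < and > as entities so no CDATA wrapper is needed."""
    return escape(text)


def cdata_safe(text):
    """Wrap text in CDATA, splitting every "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Made-up post body containing its own CDATA block.
post_html = "<script><![CDATA[ var x = 1; ]]></script><p>hi</p>"

for body in (entity_escaped(post_html), cdata_safe(post_html)):
    element = ET.fromstring("<description>" + body + "</description>")
    # Both variants parse, and the reader gets the original HTML back.
    assert element.text == post_html
```

Either variant leaves the post content unchanged for feed readers, whereas inserting zwsp characters would alter the text they receive.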


Let’s try to embed the faulty url :slight_smile:

PS: not valid anymore :blush:

Well, yes.

It has already been established that it can be broken.

I’m more interested in your ideas about how to fix things so it won’t break.

I can see a few approaches:

  1. Allow only a handful of HTML tags in feeds (i.e. formatting tags, links, images).

  2. Block potentially unsafe tags in feeds (iframe, object).

  3. Show only an excerpt (i.e. only a part of the topic, fully stripped of tags).
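The excerpt idea could look roughly like the sketch below, written with Python's standard `html.parser`. Everything here is an assumption for illustration: the class name `ExcerptExtractor`, the function `plain_excerpt`, the list of skipped tags, and the 200-character limit are all made up, and this is not how Discourse generates its feeds.

```python
from html.parser import HTMLParser


class ExcerptExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of embed-style tags."""

    SKIPPED_TAGS = {"script", "style", "iframe", "object"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if not self.skip_depth:
            self.parts.append(data)


def plain_excerpt(html, limit=200):
    """Return tag-free text, collapsed to single spaces and cut at `limit`."""
    extractor = ExcerptExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join("".join(extractor.parts).split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


print(plain_excerpt('<p>Hello <b>world</b></p><script>var a = "<b>";</script>'))
```

Because the result contains no markup at all, it can be entity-escaped and placed in the description without a CDATA wrapper, at the cost of losing links and images.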

Are we sure this embed is correct @techapj / @zogstrip? It looks like a variable height embed to me which we shouldn’t even allow? The height of it is definitely incorrect onscreen in MS Edge.


Ah, yes, the height is another thing that isn’t as it should be. I once had a link that produced an empty iframe about 1000px tall, but since I wasn’t able to reproduce it, I ignored it.

So probably that’s related somehow.

I remember fixing this issue via:

It was previously reported here:

This is a recent regression.

cc @zogstrip


I’ll fix it. I did try it on a handful of WordPress sites and they worked without it… Guess I was lucky. I’ll add it back :pencil:


Will be fixed soon :strawberry:

https://github.com/discourse/onebox/commit/a4708d78382a206f5ff9d39cf5cb76b812d7fbb0


Yup, this works fine now. Thanks!
