Post with CDATA block produces invalid RSS feed?

I’m using Zapier to automatically post new topics to a Facebook page, and I noticed that for the last few days the feeds (i.e. /latest.rss) haven’t been posted anymore. The error Zapier gives me is this:

The last error message was "mismatched tag: line 546, column 2".

What’s on line 546? This:

542| <script type="text/javascript">
543| <!--//--><![CDATA[//><!--
544|     !function(a,b){"use strict";function c(){if(!e){e=!0;var a,c,d,f,g=-1!==navigator.appVersion.indexOf("MSIE 10"),h=!!navigator.userAgent.match(/Trident.*rv:11\./),i=b.querySelectorAll("iframe.wp-embedded-content");for(c=0;c<i.length;c++)if(d=i[c],!d.getAttribute("data-secret")){if(f=Math.random().toString(36).substr(2,10),d.src+="#?secret="+f,d.setAttribute("data-secret",f),g||h)a=d.cloneNode(!0),a.removeAttribute("security"),d.parentNode.replaceChild(a,d)}else;}}var d=!1,e=!1;if(b.querySelector)if(a.addEventListener)d=!0;if(a.wp=a.wp||{},!a.wp.receiveEmbedMessage)if(a.wp.receiveEmbedMessage=function(c){var d=c.data;if(d.secret||d.message||d.value)if(!/[^a-zA-Z0-9]/.test(d.secret)){var e,f,g,h,i,j=b.querySelectorAll('iframe[data-secret="'+d.secret+'"]'),k=b.querySelectorAll('blockquote[data-secret="'+d.secret+'"]');for(e=0;e<k.length;e++)k[e].style.display="none";for(e=0;e<j.length;e++)if(f=j[e],c.source===f.contentWindow){if(f.removeAttribute("style"),"height"===d.message){if(g=parseInt(d.value,10),g>1e3)g=1e3;else if(200>~~g)g=200;f.height=g}if("link"===d.message)if(h=b.createElement("a"),i=b.createElement("a"),h.href=f.getAttribute("src"),i.href=d.value,i.host===h.host)if(b.activeElement===f)a.top.location.href=d.value}else;}},d)a.addEventListener("message",a.wp.receiveEmbedMessage,!1),b.addEventListener("DOMContentLoaded",c,!1),a.addEventListener("load",c,!1)}(window,document);
545| //--><!]]>
546| </script><iframe sandbox="allow-scripts" security="restricted" src="https://blog.jetbrains.com/dotnet/2016/11/21/jetbrains-rider-public-preview/embed/" width="600" height="338" title="Embedded WordPress Post" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" class="wp-embedded-content"></iframe>

So I assume things are not playing as nice as they should.

Can we get a stripped down version of the RSS feed?

Thanks!

Hmm, can you look into the RSS feed issue on meta @techapj and see if it repros? Maybe run a feed validator on a few different pages to make sure everything is OK on latest.

I checked the RSS feeds for meta’s latest page and several category pages, and they are all valid (validated via http://www.feedvalidator.org/).


Sounds like this is specific to some post on your site then, @iamntz

Any ideas on how I could debug this?

http://www.feedvalidator.org/check.cgi?url=https%3A%2F%2Fdevforum.ro%2Flatest.rss
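One way to debug this locally is to run the feed text through a strict XML parser and print the lines around the reported position. The following is only a minimal sketch: `locate_xml_error` and the tiny sample feed are hypothetical names made up for illustration, and the sample mimics the shape of the failing item rather than reproducing the real feed.

```python
import xml.etree.ElementTree as ET

def locate_xml_error(xml_text, context=1):
    """Return None if xml_text is well-formed, else (message, nearby lines)."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        line, _column = err.position  # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        start = max(line - 1 - context, 0)
        end = min(line + context, len(lines))
        return str(err), lines[start:end]

# Hypothetical sample: a description wrapped in CDATA whose body
# already contains a CDATA block of its own.
sample_feed = "\n".join([
    "<rss>",
    "<item>",
    "<description><![CDATA[<script><![CDATA[ x",
    "]]>",
    "</script>]]></description>",
    "</item>",
    "</rss>",
])

result = locate_xml_error(sample_feed)
if result is not None:
    message, nearby = result
    print(message)
    print("\n".join(nearby))
```

The message has the same form as the one Zapier reported, and the printed lines show which part of the item the parser tripped over.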

Yes, it is because the post has <![CDATA[ … ]]> in it.
Use CDATA in RSS Feed to add HTML and links | Amit Tech Lab - PHP, AJAX, CakePHP,WordPress, Symfony, Drupal, codeIgnitier

This results in something like

<description><![CDATA[ … <![CDATA[ … ]]> … ]]></description> 

The validator / parser doesn’t do nested CDATA well.
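To make the nesting problem concrete, here is a small sketch of what a parser effectively does with such a description: it ends the CDATA section at the first `]]>` it meets and treats everything after that as ordinary markup. The helper name `split_at_first_cdend` and the sample string are assumptions for illustration only.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"

def split_at_first_cdend(description_xml):
    """Split into (what the parser sees as CDATA text, what it parses as markup)."""
    start = description_xml.index(CDATA_START) + len(CDATA_START)
    end = description_xml.index(CDATA_END, start)  # first CDEnd wins
    return description_xml[start:end], description_xml[end + len(CDATA_END):]

nested = "<description><![CDATA[<script><![CDATA[ x ]]></script>]]></description>"
cdata_text, leftover_markup = split_at_first_cdend(nested)

print(repr(cdata_text))       # the inner "<![CDATA[" is just literal text here
print(repr(leftover_markup))  # "</script>" is now a real, unmatched closing tag
```

The leftover part starts with `</script>` while the open element is `<description>`, which is consistent with the "mismatched tag" error reported at the `</script>` line.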


Alright, but what can I do about this? Is there anything in the admin area that I didn’t see? Is it a bug?

Thanks!

Is nested CDATA valid? Perhaps it’s a validator bug.

I can think of a few things:

  1. Don’t use nested CDATA.

  2. Determine whether nested CDATA is valid, and if it is, report it as a problem to the validator.

I wouldn’t call it a bug, but more an edge case.

No, nested CDATA is not valid:

https://www.w3.org/TR/REC-xml/#sec-cdata-sect

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using “&lt;” and “&amp;”. CDATA sections cannot nest.

The problem is, after the first <![CDATA[ everything up to the ]]> is considered to be “character data” and not XML tags. Once a ]]> is encountered, it goes back to needing to be well-formed valid XML.

My first thought was to change the < and / or ] to an entity. But after seeing the “and cannot” I guess that won’t work.

Maybe “breaking” the inner CDEnd by inserting zwsp characters would work?

EDIT

The RSS for this topic is valid and uses entities (&lt; and &gt;), so it looks like my first thought should work.
https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fmeta.discourse.org%2Ft%2Fpost-with-cdata-block-produces-invalid-rss-feed%2F53362.rss
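Both ideas can be sketched in a few lines. This is an illustration only, not what Discourse actually does: `as_cdata` and `as_escaped` are hypothetical helper names. The first splits every inner `]]>` across two adjacent CDATA sections (a common workaround, used here instead of zero-width spaces so the text round-trips unchanged); the second drops CDATA and escapes the text as entities.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def as_cdata(text):
    """Wrap text in CDATA, splitting any inner ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def as_escaped(text):
    """Escape &, < and > as entities instead of using CDATA."""
    return escape(text)

body = "<script><![CDATA[ x ]]></script>"

# Both encodings parse cleanly and give back the original text.
recovered = [
    ET.fromstring("<description>" + payload + "</description>").text
    for payload in (as_cdata(body), as_escaped(body))
]
print(recovered)
```

Either way the parser never sees a raw `]]>` inside an open CDATA section, so the description stays well-formed.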


Let’s try to embed the faulty url :slight_smile:

PS: not valid anymore :blush:

Well, yes.

It has already been established that it can be broken.

I’m more interested in your ideas about how to fix things so it won’t break.

I can see a few approaches:

  1. Allow only a handful of HTML tags in feeds (i.e. formatting tags, links, images).
  2. Block potentially unsafe tags in feeds (iframe, object).
  3. Show only an excerpt (i.e. only a part of the topic, fully stripped of tags).
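The excerpt approach could look roughly like the sketch below. It is an illustration built on Python's standard `html.parser`, not how Discourse builds its feeds; `plain_excerpt`, the `limit` parameter and the sample HTML are all made-up names and values.

```python
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collects text nodes, skipping the contents of script/style elements."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def plain_excerpt(cooked_html, limit=200):
    """Strip all tags, collapse whitespace, and cut the text to `limit` characters."""
    parser = _TextOnly()
    parser.feed(cooked_html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."

sample = '<p>Hello <b>world</b></p><script>alert(1)</script><iframe src="x"></iframe>'
print(plain_excerpt(sample))
```

Because no markup survives, an embedded script with its own CDATA block can no longer end up inside the feed's description.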

Are we sure this embed is correct @techapj / @zogstrip? It looks like a variable height embed to me which we shouldn’t even allow? The height of it is definitely incorrect onscreen in MS Edge.


Ah, yes, the height is another thing that isn’t as it should be. I once had a link that produced an empty iframe about 1000px tall, but since I wasn’t able to reproduce it, I ignored it.

So probably that’s related somehow.

I remember fixing this issue via:

It was previously reported here:

This is a recent regression.

cc @zogstrip


I’ll fix it. I did try it on a handful of WordPress sites and they worked without it… Guess I was lucky. I’ll add it back :pencil:


Will be fixed soon :strawberry:

https://github.com/discourse/onebox/commit/a4708d78382a206f5ff9d39cf5cb76b812d7fbb0


Yup, this works fine now. Thanks!
