RSS subscription broken by post content

jeff5-really · October 16, 2023, 4:42pm

On the Python Discourse I noticed that my RSS subscription to the users (renamed “Help”) category had stopped working. On trying to re-establish it, the subscription https://discuss.python.org/c/users/7.rss results in invalid content that my reader (Thunderbird) will not load. It fails validation at W3C:

https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fdiscuss.python.org%2Fc%2Fusers%2F7.rss

Since that check fails, I assume I’m not the only one affected.

The problem seems to be an unexpected character in the post https://discuss.python.org/t/beginner-help-with-concatenating-arrays/36226. In the feed, the offending sub-string comes out as b'N \x02x KSQT' (two occurrences).

It’s not that user’s fault, of course, but Discourse’s for letting it through, and the long-term fix lies with you.

An admin there (or at least a CPython core dev) suggested I report it here.

sam · October 17, 2023, 3:50am

This is such an odd one:

PrettyText.format_for_email(p.cooked, p)
=> "<p>Hello, I’m currently trying to follow a machine learning pipeline described by a paper. Essentially, I need to create an input matrix which is shaped N x KSDT sized. The paper describes this as: “Here k, ks, kd, and ksd are labels and not indices, and all terms are understood to be matrices of the same N x KSQT size, so e.g. Xk is not an N x K sized matrix, but the full-size N x KSQT matrix with N x k unique values replicated KSQ times”.</p>\n<p>Right now, I have three following np.arrays:<br>\nbias_block: (348, 2, 151), bias_contrast: (348, 5, 151), and bias_decision: (348, 2, 151).<br>\nMy understanding is that in order to combine these three arrays, I would need a final size of (348, 20, 20, 20, 151). However, I’m really struggling on how to combine these arrays. Could someone please help with this, thanks a lot.</p>"

I am not seeing what is wrong with that string … the N x KSDT does not appear to have anything hiding there.

(Note: the post has now dropped out of latest, so the RSS feed is working again as a side effect, but I would certainly like to fix this.)

I am assuming this is the line where this originates from:

github.com
discourse/discourse/blob/542f77181a47df8aa0f909f69814406491d08c5e/app/views/posts/latest.rss.erb#L14-L14

<description><![CDATA[ <%= PrettyText.format_for_email(post.cooked, post).html_safe %> ]]></description>

simon · October 17, 2023, 4:47am

I looked at the post earlier today. There was a unicode hex code in it that was something like ☐ (&#x2610). That’s not the exact code though. It was showing up in the post’s raw content this morning (https://discuss.python.org/posts/121311.json). Seems to have been edited since then.

Falco · October 17, 2023, 1:04pm

The faulty character is � or

jeff5-really · October 17, 2023, 5:35pm

The first occurrence is ok, but the second and third contain an 0x02 byte (when I save from this URL using Firefox and read the file as bytes using Python), as in my first post. validator.w3.org gave me enough context to locate the first 0x02 in the line.

U+002610 is just the box symbol that something is replacing it with (but not in the RSS).

I asked for the post to be repaired, as I didn’t see my subscription working again without that. I can send you my saved bytes if it would help.
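
For anyone wanting to repeat the check described above, here is a minimal sketch of scanning a saved copy of the feed for stray control bytes. The function name `find_control_bytes` and the `sample` bytes are made up for illustration (the sample only mimics the `b'N \x02x KSQT'` fragment from the first post); in practice the data would be read from the saved file in binary mode.

```python
import re

# C0 control bytes other than tab (0x09), line feed (0x0A) and
# carriage return (0x0D). XML 1.0 does not allow these in a document.
FORBIDDEN_C0 = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_bytes(data: bytes, context: int = 8):
    """Return (offset, byte value, surrounding bytes) for each forbidden byte."""
    hits = []
    for m in FORBIDDEN_C0.finditer(data):
        start = max(0, m.start() - context)
        hits.append((m.start(), m.group()[0], data[start:m.end() + context]))
    return hits


# Hypothetical stand-in for the saved feed. With a real file this would be
# something like: data = open("7.rss", "rb").read()
sample = b"<description>shaped N \x02x KSQT sized</description>"

for offset, value, snippet in find_control_bytes(sample):
    print(f"offset {offset}: 0x{value:02x} in {snippet!r}")
```

Reading the file as bytes rather than text matters here, since a text editor or browser may silently replace the control character with a placeholder glyph.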

Falco · October 19, 2023, 1:28am

Per the RSS 2.0 spec, the feed must be XML 1.0 compliant. And per XML 1.0 spec, there are several control characters that are invalid.
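
To illustrate the point, here is a rough Python sketch (not the Discourse fix itself, which is Ruby and is not reproduced in this topic). It assumes the `Char` production from the XML 1.0 spec as the definition of an allowed character, and uses the standard-library parser to show that wrapping content in CDATA does not make a control character acceptable. The helper names and the `cooked` sample string are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that XML 1.0 does not allow in a document."""
    return INVALID_XML_CHARS.sub("", text)


def is_well_formed(xml_text: str) -> bool:
    """Report whether the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical cooked post containing a U+0002 control character.
cooked = "<p>shaped N \x02x KSQT sized</p>"
item = "<description><![CDATA[ {} ]]></description>"

print(is_well_formed(item.format(cooked)))
print(is_well_formed(item.format(strip_invalid_xml_chars(cooked))))
```

The first check should report the item as not well-formed and the second as well-formed. Stripping is only one option; replacing the character with a visible placeholder would also keep the feed valid.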

The PR below is a first try to address this:

Falco · October 21, 2023, 11:00am

This topic was automatically closed after 39 hours. New replies are no longer allowed.
