RSS subscription broken by post content

On the Python Discourse I noticed that my RSS subscription to the users (renamed “Help”) category had stopped working. When I tried to re-establish it, the feed at https://discuss.python.org/c/users/7.rss returned invalid content that my reader (Thunderbird) will not load. It fails validation at W3C:

https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fdiscuss.python.org%2Fc%2Fusers%2F7.rss

Since that check fails, I assume I’m not the only one affected.

The problem seems to be an unexpected character in the post https://discuss.python.org/t/beginner-help-with-concatenating-arrays/36226. In the feed, the offending sub-string comes out as b'N \x02x KSQT' (two occurrences).
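To illustrate why a single control character is enough to break the whole feed, here is a minimal Python sketch, assuming a strict XML parser such as the one in Python's standard library. The feed fragment and the `is_well_formed` helper are made up for illustration; only the sub-string b'N \x02x KSQT' is taken from the real feed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as XML, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


# Hypothetical, heavily shortened feed; not the real content of 7.rss.
TEMPLATE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<rss version="2.0"><channel><item>'
    b"<description>%s</description>"
    b"</item></channel></rss>"
)

clean_feed = TEMPLATE % b"N x KSQT"
broken_feed = TEMPLATE % b"N \x02x KSQT"  # sub-string as seen in the feed

print(is_well_formed(clean_feed))
print(is_well_formed(broken_feed))
```

The second document differs from the first only by the 0x02 byte, yet the parser rejects it outright, which matches a reader refusing to load the feed at all.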

It’s not that user’s fault, of course, but Discourse’s for letting it through, and the long-term fix lies with you.

An admin there (or at least a CPython core dev) suggested I report it here.

3 Likes

This is such an odd one:

PrettyText.format_for_email(p.cooked, p)
=> "<p>Hello, I’m currently trying to follow a machine learning pipeline described by a paper. Essentially, I need to create an input matrix which is shaped N x KSDT sized. The paper describes this as: “Here k, ks, kd, and ksd are labels and not indices, and all terms are understood to be matrices of the same N x KSQT size, so e.g. Xk is not an N x K sized matrix, but the full-size N x KSQT matrix with N x k unique values replicated KSQ times”.</p>\n<p>Right now, I have three following np.arrays:<br>\nbias_block: (348, 2, 151), bias_contrast: (348, 5, 151), and bias_decision: (348, 2, 151).<br>\nMy understanding is that in order to combine these three arrays, I would need a final size of (348, 20, 20, 20, 151). However, I’m really struggling on how to combine these arrays. Could someone please help with this, thanks a lot.</p>"

I am not seeing what is wrong with that string … the N x KSDT does not appear to have anything hiding there.

(Note: the post has now dropped out of latest, so RSS is back and working as a side effect, but I certainly would like to fix this.)

I am assuming this is the line where this originates from:

1 Like

I looked at the post earlier today. There was a unicode hex code in it that was something like ☐ (&#x2610). That’s not the exact code though. It was showing up in the post’s raw content this morning (https://discuss.python.org/posts/121311.json). Seems to have been edited since then.

4 Likes

The faulty character is U+0002 (the raw 0x02 byte), which does not display here.

3 Likes

The first occurrence is OK, but the second and third contain a 0x02 byte (when I save from this URL using Firefox and read the file as bytes using Python), as in my first post. validator.w3.org gave me enough context to locate the first 0x02 in the line.
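For anyone who wants to repeat that check, a rough sketch of the byte-level search might look like this. The helper name `find_control_bytes`, the sample bytes and the file name in the comment are assumptions for illustration, not the exact commands used.

```python
import re

# C0 control bytes other than tab (0x09), LF (0x0A) and CR (0x0D).
# In UTF-8 these only ever occur as single bytes, so a byte search finds them.
CONTROL_BYTE = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_bytes(data: bytes, context: int = 10):
    """Yield (offset, byte value, surrounding bytes) for each control byte."""
    for match in CONTROL_BYTE.finditer(data):
        start = max(match.start() - context, 0)
        end = match.end() + context
        yield match.start(), data[match.start()], data[start:end]


# Stand-in for the saved feed. With a real file this would be something like
#   data = open("7.rss", "rb").read()   # file name is an assumption
sample = b"shaped N x KSDT ... the same N \x02x KSQT size ... N \x02x KSQT matrix"

for offset, value, snippet in find_control_bytes(sample):
    print(offset, hex(value), snippet)
```

Printing the surrounding bytes gives enough context to match each hit against the text of the post.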

U+2610 is just the box symbol that something substitutes for it on display (but not in the RSS).

I asked for the post to be repaired as I didn’t see myself getting my subscription working without that. I can send you my saved bytes if it would help.

1 Like

Per the RSS 2.0 spec, the feed must be XML 1.0 compliant. And per XML 1.0 spec, there are several control characters that are invalid.
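As a rough illustration of that rule (a Python sketch, not the code in the PR, and not Discourse's own language), the XML 1.0 `Char` production can be turned into a filter that drops anything outside the allowed ranges. The function name `strip_invalid_xml_chars` is hypothetical, and silently dropping the characters rather than replacing them is just one possible choice.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that XML 1.0 does not allow in a document."""
    return INVALID_XML_CHARS.sub("", text)


# The offending sub-string from the feed, decoded to text.
print(repr(strip_invalid_xml_chars("N \x02x KSQT")))
```

Tab, line feed and carriage return pass through untouched, as do ordinary non-ASCII characters such as ☐ (U+2610); only the disallowed code points are removed.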

The PR below is a first try to address this:

3 Likes
