Error in the Schema.org data for DiscussionForumPosting?

I noticed a bug in the Schema.org data for DiscussionForumPosting.

When I run a random Discourse forum topic through the validator, it shows the @id field with URLs that don’t exist.

Here’s an example with a trailing path of /post_2 (it’s a 404 error):

I think that those @id fields are supposed to be working URLs, because W3.org says:

To be able to externally reference nodes in a graph, it is important that nodes have an identifier. IRIs are a fundamental concept of Linked Data, for nodes to be truly linked, dereferencing the identifier should result in a representation of that node. This may allow an application to retrieve further information about a node.

1 Like

I wonder if this is an issue with how the validator is displaying the id. As far as I can tell, the id is pulled from the markup and isn’t something we’re defining ourselves, for example:

<div id='post_1' itemscope itemtype='http://schema.org/DiscussionForumPosting' class='topic-body crawler-post'>

with id='post_1' being used as the @id.

If you click that id section in the validator it correctly highlights the post with the matching id… so it seems the validator can properly identify it.

I notice this behavior on other sites with @id values, for example in the schema data for this stackoverflow.com question:

This has the same issue: https://stackoverflow.com/questions/7227202/answer-38775925 is not actually a valid URL. It suffers from the same error, where the separator should be a # instead of a /, as in https://stackoverflow.com/questions/7227202#answer-38775925.
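To illustrate the # versus / difference, here is a minimal Python sketch. It assumes (this is a guess, not something confirmed in this thread) that the validator resolves the bare HTML id as a relative reference against the page URL. The forum.example.com URL is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical topic URL, used only for illustration.
PAGE_URL = "https://forum.example.com/t/some-topic/123"


def id_as_relative_reference(page_url: str, html_id: str) -> str:
    """Resolve the bare id as a relative reference (assumed validator behavior).

    The last path segment of the page URL is replaced by the id.
    """
    return urljoin(page_url, html_id)


def id_as_fragment(page_url: str, html_id: str) -> str:
    """Resolve the id as a fragment, which points at the element on the page."""
    return urljoin(page_url, "#" + html_id)


print(id_as_relative_reference(PAGE_URL, "post_2"))
print(id_as_fragment(PAGE_URL, "post_2"))
```

The first form produces a path ending in /post_2, which is a different resource from the page. The second form keeps the page URL and only adds a fragment.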

Are there any indications that this is causing a problem with how this data is used in practice anywhere?

1 Like

That’s interesting. I didn’t think to check the HTML source and just assumed that it was JSON-LD.

Google uses schema data, but I’m not sure if they use that specific one. The schema.org docs aren’t written very clearly.

It looks like Discourse is putting multiple DiscussionForumPostings on each topic, but the example in the docs looks like DiscussionForumPosting might be meant to refer to just the main topic and not the comments? The docs list a comment field with a Comment (singular) though the description is worded in plural.

I just looked at how Invision does it: it uses JSON-LD, putting Comment objects in a comment field. It looks like a lot of extra text to send to the browser.
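For reference, a rough Python sketch of what a JSON-LD document with nested Comment objects could look like. This is not Invision's actual output. The function name, the sample posts, and the URL are made up for illustration; only the schema.org type and property names (DiscussionForumPosting, Comment, headline, text, author, comment) are real.

```python
import json


def topic_to_jsonld(url, headline, posts):
    """Build a DiscussionForumPosting with replies nested under "comment".

    `posts` is a list of dicts with "author" and "text" keys; the first
    entry is treated as the topic itself and the rest as replies.
    """
    first, *replies = posts
    return {
        "@context": "https://schema.org",
        "@type": "DiscussionForumPosting",
        "url": url,
        "headline": headline,
        "text": first["text"],
        "author": {"@type": "Person", "name": first["author"]},
        "comment": [
            {
                "@type": "Comment",
                "text": reply["text"],
                "author": {"@type": "Person", "name": reply["author"]},
            }
            for reply in replies
        ],
    }


# Hypothetical sample data.
sample_posts = [
    {"author": "alice", "text": "Body of the first post."},
    {"author": "bob", "text": "Body of a reply."},
]
doc = topic_to_jsonld(
    "https://forum.example.com/t/some-topic/123", "Some topic", sample_posts
)
print(json.dumps(doc, indent=2))
```

Note that every post body appears in the JSON-LD in addition to the visible HTML, which is where the extra text comes from.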

I don’t know what the answer is, but I’ll try to research it more later.

1 Like

Is this relevant?

3 Likes

I happen to be lurking on this forum which is convenient. I own the Google code that parses that.

The linked thread is a good answer to the comment tangent. I’ll address the rest here.

It’s essentially nonstandard to interpret HTML id attributes as node IDs. It was done in the very beginning of Google’s microdata parsing, probably for fuzzy reasons. You are supposed to use itemid if you want to do that explicitly. I hope to remove that hack someday, but it’s hard to pull something like that out without losses.
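To make the id versus itemid distinction concrete, here is a small Python sketch that collects both attributes from elements carrying itemscope. It is only an illustration using the standard library's html.parser, not how Google's parser works. The itemid value in the sample markup is hypothetical.

```python
from html.parser import HTMLParser


class ItemIdCollector(HTMLParser):
    """Record the itemid and the plain HTML id of every itemscope element."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Boolean attributes such as itemscope arrive with a value of None.
        if "itemscope" not in attr:
            return
        self.items.append(
            {
                "type": attr.get("itemtype"),
                "itemid": attr.get("itemid"),  # explicit microdata identifier
                "html_id": attr.get("id"),     # ordinary HTML id, not a node ID
            }
        )


# Sample markup; the itemid URL is made up for illustration.
markup = (
    "<div id='post_1' itemscope "
    "itemtype='http://schema.org/DiscussionForumPosting' "
    "itemid='https://forum.example.com/t/some-topic/123/1' "
    "class='topic-body crawler-post'></div>"
)

collector = ItemIdCollector()
collector.feed(markup)
print(collector.items)
```

With an explicit itemid, the node identifier no longer depends on how a consumer chooses to treat the HTML id.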

Secondly, IRIs do not have to be dereferenceable. That’s a suggestion from W3C, but many IRIs are not, and Google definitely doesn’t require it.

This is only a problem if it causes nodes in the structured data to inadvertently merge, for example if you used an itemid of the same value somewhere else in the HTML. Otherwise it’s just a weirdness that can be ignored.
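A quick way to check for that kind of accidental merge is to look for identifiers that occur more than once on a page. The sketch below assumes you already have the list of resolved node identifiers (from itemid values and from ids a parser treats as node IDs). The function name and sample values are hypothetical.

```python
from collections import Counter


def find_id_collisions(node_ids):
    """Return the identifiers that appear more than once, sorted.

    Entries that are None or empty (nodes without an identifier) are ignored.
    """
    counts = Counter(node_id for node_id in node_ids if node_id)
    return sorted(node_id for node_id, n in counts.items() if n > 1)


# Hypothetical identifiers collected from one page.
page_ids = [
    "https://forum.example.com/t/some-topic/post_1",
    "https://forum.example.com/t/some-topic/post_2",
    "https://forum.example.com/t/some-topic/post_1",  # same value used elsewhere
    None,
]
print(find_id_collisions(page_ids))
```

An empty result means no two nodes share an identifier, so nothing would be merged by accident.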

Oh, and please don’t switch to JSON-LD. Honestly, microdata is preferred for text-heavy markup like forums. Having to duplicate the textual contents is silly. JSON-LD is just easier to author, which is why we’ve been pushing it.

9 Likes

Thanks for lurking @rrlevering! It sounds like it’s safe to close this issue, and we’ll be updating the topic/post schema in “Different schema type for Topics and Posts”.

5 Likes
