I think headers are very useful and that people should use them more. That said, I agree that h1 and h2 headers don’t make sense in posts; whether they confuse spiders is beyond my expertise.
It does make sense to me that the first level of header in a post should be an h3, so something that automatically converted each header to header_level + 2 would be a good idea.
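A minimal sketch of that conversion, assuming it runs on a post's rendered HTML rather than its markdown, so a `# comment` line inside a code block (which renders as plain text, not as a heading tag) is never touched. `demote_heading_tags` and `HEADING_TAG` are hypothetical names, not part of Discourse, and a regex is only a rough stand-in for a real HTML parser.

```python
import re

# Opening or closing h1-h6 tag. Group 1 is "/" for a closing tag (else empty),
# group 2 is the heading level. The lookahead stops "<h1" matching inside
# longer names and lets attributes such as id="..." pass through untouched.
HEADING_TAG = re.compile(r"<(/?)h([1-6])(?=[\s>])", re.IGNORECASE)


def demote_heading_tags(html: str, offset: int = 2) -> str:
    """Rewrite <hN>...</hN> as <h(N+offset)>...</h(N+offset)>, capped at h6."""
    def shift(m: re.Match) -> str:
        level = min(int(m.group(2)) + offset, 6)
        return f"<{m.group(1)}h{level}"

    return HEADING_TAG.sub(shift, html)


print(demote_heading_tags('<h1>Title</h1><h2 id="s">Sub</h2>'))
```

Levels are capped at h6, so with the default offset an h5 and an h6 both end up as h6.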
Hi @pfaffman, the problem is that users can misuse headers, accidentally or otherwise, and this confuses spiders that scan a page's overall content by checking its headers (h1 and h2). This is especially true when you use the WordPress plugin that shows replies below a WordPress post: it can get very confusing for Googlebot when there are multiple headers on a page.
I think this is why other forums like WordPress.org disable headers. I run a technical forum with Discourse where symbols like # and - are used a lot, which results in unwanted header formatting.
QUOTE: “We do use H tags to understand the structure of the text on a page better.” John Mueller, Google
QUOTE: “Google looks at a lot of different things … even things that are like really highlighted like h1 tags and stuff like that.” Matt Cutts, Google
Skimming that article makes me think it’s a great thing to be using H tags and you should do it more.
This sounds like the actual problem here. Maybe a plugin to disable those pieces of Markdown on your site would make more sense? It would be a hard sell to make this a core change.
My understanding has always been that headings are relative to the sectioning content that contains them. For example, a post contained within an article tag should restart heading ranks from H1.
The first element of heading content in an element of sectioning content represents the heading for that section. Subsequent headings of equal or higher rank start new (implied) sections, headings of lower rank start implied subsections that are part of the previous one. In both cases, the element represents the heading of the implied section.
I’d be surprised if Google didn’t account for this. Is there any evidence we can see that SEO has been confused by this?
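The implied-section rule quoted above can be sketched as a small stack-based nesting routine. This is a simplification under stated assumptions: it takes a flat list of (level, title) pairs for a single post, treats h1 as the highest rank, and ignores explicit sectioning elements entirely. `Section` and `build_outline` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    level: int                      # heading rank: 1 for h1 ... 6 for h6
    title: str
    children: list = field(default_factory=list)


def build_outline(headings):
    """Nest (level, title) pairs into implied sections."""
    root = Section(level=0, title="<root>")  # level 0 outranks every heading
    stack = [root]
    for level, title in headings:
        # A heading of equal or higher rank (same or smaller number) closes
        # open sections until it reaches one it can nest under; a heading of
        # lower rank simply becomes a subsection of the current one.
        while stack[-1].level >= level:
            stack.pop()
        section = Section(level, title)
        stack[-1].children.append(section)
        stack.append(section)
    return root.children


outline = build_outline([(1, "A"), (2, "B"), (3, "C"), (2, "D"), (1, "E")])
print([s.title for s in outline])
```

Running the routine once per post is what "restarting from H1" inside each article would amount to in this model.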
Correct. And Discourse users can misuse headers when they (accidentally) use # and - symbols in their replies, which can:
Worsen readability
Confuse spiders like Googlebot with a header like this
In an SEO-optimized environment, forum users should not influence header usage; that is something only a page admin should control. I think this is why forums like WordPress.org do not output headers at all, even when HTML tags like h1 are used.
For readability you could add some custom CSS to disable formatting. You could even disable it for everyone who’s not in a specific group, for example:
I don’t think it’s a bad idea to potentially restrict heading use to specific users, but there’s no feature planned to support that at the moment. We wouldn’t want to remove headings entirely for everyone using Discourse, because I believe removing them might hurt a community’s searchability (for example, we have a lot of tutorials here that use headings appropriately, and many aren’t written by staff/admins).
One thing you could do is add # to the blocked words feature using a regex like #{1,}\s (I believe this would work, but I’m not amazing at regex). That would block any run of # characters followed by whitespace, which is how markdown headings start.
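To check what the suggested pattern would catch, here is a quick comparison using Python's `re` module. The second, line-anchored pattern is a hypothetical alternative; whether the blocked-words feature accepts it, and whether `^` there matches at the start of every line, are assumptions to verify on your own site.

```python
import re

# Pattern suggested in the post: one or more '#' followed by any whitespace,
# matched anywhere in the text.
SUGGESTED = re.compile(r"#{1,}\s")

# Hypothetical stricter variant: 1-6 '#' followed by a space or tab at the
# start of a line, which is where markdown (ATX) headings begin.
LINE_START = re.compile(r"^#{1,6}[ \t]", re.MULTILINE)


def is_blocked(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern would flag this post text."""
    return pattern.search(text) is not None


for text in ["# Big heading", "intro\n## Section", "I write C# code", "see issue #42"]:
    print(f"{text!r}: suggested={is_blocked(SUGGESTED, text)}, "
          f"line_start={is_blocked(LINE_START, text)}")
```

The unanchored pattern also flags ordinary text such as “I write C# code”, which matters on a technical forum; neither pattern flags “#42”, since no whitespace follows the #.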
Of course, that only works on new content. For old content, maybe you can strip out # using a rake task?
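For existing posts, the text transformation such a task might apply could look like this. It is written in Python purely for illustration (a real rake task would be Ruby), assumes each post's raw markdown is available as a string, and `strip_heading_markers` is a hypothetical name. It skips fenced code blocks so `#` comments in code survive, but its fence detection is simplified: any line starting with three or more backticks or tildes toggles it.

```python
import re

# 1-6 '#' plus the following spaces/tabs at the start of a line (ATX heading).
HEADING_MARK = re.compile(r"^#{1,6}[ \t]+")
# Simplified fence detection: a line starting with ``` or ~~~ (or longer).
FENCE = re.compile(r"^(`{3,}|~{3,})")


def strip_heading_markers(markdown: str) -> str:
    """Turn '## Title' lines into plain 'Title' lines, leaving fenced code alone."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        if FENCE.match(line):
            in_fence = not in_fence      # entering or leaving a code block
        elif not in_fence:
            line = HEADING_MARK.sub("", line)
        out.append(line)
    return "\n".join(out)


print(strip_heading_markers("# Title\nSome text\n## End"))
```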
Right. And, besides, h1 and h2 headers just look bad in a post.
I don’t pretend to know anything about SEO (though Redirecting… has crept onto the first page for “discourse install”, but not for “install discourse”). It does make sense that spiders would treat H1 headers as if they should be H1 headers, and that there shouldn’t be more than one of them.
If it’s a technical forum those people should be able to learn markdown.
I never use # or ## in a post because it looks bad. I think it’d make sense to either demote H1 and H2 within a post to H3 or to demote all headers by 1 or 2 levels.
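The first option (demoting only h1 and h2 to h3) could be sketched like this, again assuming it runs on rendered HTML; `clamp_heading_tags` is a hypothetical name, and the regex is a rough stand-in for an HTML parser.

```python
import re

# Opening or closing h1-h6 tag; group 1 is "/" for closing tags, group 2 the level.
HEADING_TAG = re.compile(r"<(/?)h([1-6])(?=[\s>])", re.IGNORECASE)


def clamp_heading_tags(html: str, top_level: int = 3) -> str:
    """Demote h1 and h2 to h3 (by default); leave h3-h6 untouched."""
    return HEADING_TAG.sub(
        lambda m: f"<{m.group(1)}h{max(int(m.group(2)), top_level)}", html
    )


print(clamp_heading_tags("<h1>A</h1><h2>B</h2><h4>C</h4>"))
```

Compared with shifting every level, clamping keeps h3 to h6 as they are but merges h1, h2 and h3 into a single level, so any distinction between those three is lost.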
Thanks! Writing Discourse plugins is not a strength of mine. Could someone help me get started, or write the plugin for me (paid, of course)?
And I would like to share the code afterwards with the Discourse community!
The talented @joebuhlig wrote a great Discourse plugin which disables all HTML headers.
My brother @renem (of NetworkLessons.com) paid for this plugin and wants to share it with the Discourse community. If it can help more people out, please use it. It Rocks!