Disable headers in replies like <h1> and <h2>

Hi,

I like the fact that Discourse allows me to set strong or italic text styles in replies.

What I don’t like is that it allows headers. They are bad for readability and SEO, and they confuse spiders:

H1 header

H2 header

Or check out this real live example: 02.28.2018-14.41.58

So: is it possible to disable headers so that they are ignored, as in the example below from WordPress.org?

2018-02-28_14-39-27

I think that headers are very useful and that people should use them more. That said, I agree that h1 and h2 headers don’t make sense in posts. Whether they confuse spiders is beyond my expertise.

It does make sense to me that the first level of header in a post should be an h3, so something that automatically converted each header to header_level+2 would be a good idea.
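To make the header_level+2 idea concrete, here is a minimal sketch in Python that shifts heading tags in already-rendered post HTML down by two levels, capping at h6. This is not Discourse code: the function name `demote_headings`, the `offset` parameter, and the regex-on-HTML approach are all assumptions made for illustration, and a real implementation would more likely hook into the Markdown pipeline.

```python
import re

# Matches the start of an opening or closing h1..h6 tag, e.g. <h2 id="x"> or </h2>.
# The \b keeps tags such as <hr> or <header> from matching.
HEADING_TAG = re.compile(r"<(/?)h([1-6])\b", re.IGNORECASE)


def demote_headings(html: str, offset: int = 2) -> str:
    """Shift every heading tag down by `offset` levels, capping at h6."""

    def shift(match: re.Match) -> str:
        level = min(int(match.group(2)) + offset, 6)
        return f"<{match.group(1)}h{level}"

    return HEADING_TAG.sub(shift, html)


cooked = '<h1>Install</h1><p>Steps</p><h2 id="cfg">Config</h2>'
print(demote_headings(cooked))
# -> <h3>Install</h3><p>Steps</p><h4 id="cfg">Config</h4>
```

With this approach the structure of a post is preserved (an h1 still outranks an h2), but nothing in a post can compete with the page-level h1.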

Hi @pfaffman, the problem is that users can accidentally use or misuse headers, which confuses spiders that scan the overall content of a page by checking its headers (h1 and h2). This is especially true when you use the WordPress plugin, which shows replies below a WordPress post. Multiple headers on a page can be very confusing for Googlebot.

I found out Discourse also sets headers when:

you use a - symbol below a paragraph like this.

you add a # symbol before a paragraph like this

These headers confuse spiders like Googlebot. I hope this can be disabled soon in Discourse!
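The two cases above correspond to what Markdown calls ATX headings (a line starting with # followed by a space) and setext headings (a paragraph line underlined with - or =). As a rough sketch, assuming CommonMark-style rules, the following Python finds lines in raw post text that are likely to render as headings. The function name `find_headings` is hypothetical, and the check ignores code blocks, lists and other edge cases.

```python
import re

# ATX heading: up to 3 spaces, 1-6 '#', then whitespace + text or end of line.
ATX = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*))?$")
# Setext underline: a line made only of '=' (h1) or '-' (h2).
SETEXT = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def find_headings(raw: str):
    """Return (line_number, level, text) for lines likely to render as headings."""
    found = []
    lines = raw.split("\n")
    for i, line in enumerate(lines):
        atx = ATX.match(line)
        if atx:
            found.append((i + 1, len(atx.group(1)), (atx.group(2) or "").strip()))
            continue
        underline = SETEXT.match(line)
        # An underline only makes a heading if the previous line has text.
        if underline and i > 0 and lines[i - 1].strip():
            level = 1 if underline.group(1)[0] == "=" else 2
            found.append((i, level, lines[i - 1].strip()))
    return found


post = "# ifconfig eth0 up\nrouter config\n---\nplain #hashtag line"
print(find_headings(post))
# -> [(1, 1, 'ifconfig eth0 up'), (2, 2, 'router config')]
```

Note that `#hashtag` without a following space is not treated as a heading, which matches how the # symbol behaves in practice.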

I can’t speak to the Google confusion, though it sounds odd to me that Google would have an issue with headers on a webpage.

Do you have an issue with users abusing headers? I can’t say I’ve seen an issue with this on any of the sites I frequent.

Check out: https://www.hobo-web.co.uk/headers/

I think this is why other forums like WordPress.org disable headers. I run a technical forum on Discourse where symbols like # and - are used a lot, which results in unwanted header formatting.

QUOTE: “We do use H tags to understand the structure of the text on a page better.” John Mueller, Google

QUOTE: “Google looks at a lot of different things … even things that are like really highlighted like h1 tags and stuff like that.” Matt Cutts, Google

Skimming that article makes me think it’s a great thing to be using H tags and you should do it more.

The unwanted headers from # and - on a technical forum sound like the actual problem here. Maybe a plugin to disable those pieces of Markdown on your site would make more sense? It would be a hard sell to make this a core change.

My understanding of it has always been that headings are contextual based on the sectioned content. So for example, a post contained within an article tag should restart heading prioritization from H1.

From the HTML Standard:

The first element of heading content in an element of sectioning content represents the heading for that section. Subsequent headings of equal or higher rank start new (implied) sections, headings of lower rank start implied subsections that are part of the previous one. In both cases, the element represents the heading of the implied section.

I’d be surprised if Google didn’t account for this. Is there any evidence we can see that SEO has been confused by this?

Hi @awesomerobot

Correct. And Discourse users can misuse headers when they (accidentally) use # and - symbols in their replies, and thus:

Worsen readability

Confuse spiders like Googlebot with a header like this

In an SEO-optimized environment, forum users should not influence header usage. It’s something only a page admin should apply. I think this is why forums like WordPress.org do not output headers at all, even when HTML tags like h1 are used.

Take a look at a real-life example from my technical Discourse forum, where these characters are used regularly: https://www.screencast.com/t/BTyGodbB1

For readability you could add some custom CSS to disable formatting. You could even disable it for everyone who’s not in a specific group, for example:

// Visual only: make headings look like body text in posts that lack the group-team class
.topic-post:not(.group-team) .cooked {
    h1, h2, h3, h4, h5, h6 {
        font-size: 1em;
        font-weight: normal;
    }
}

I don’t think it’s a bad idea to potentially restrict heading use to specific users, but there’s no feature planned to support that at the moment. We wouldn’t want to remove headings entirely for everyone using Discourse, because I believe removing Hs might hurt a community’s searchability (for example: we have a lot of tutorials here that use headings appropriately… and many aren’t written by staff/admins…).

One thing you could do is add the # to the blocked words feature using a regex like #{1,}\s (I believe this would work, but I’m not amazing at regex). That would block all # with a space after them (markdown headings).
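A quick way to see what that regex would catch is to try it on a few sample lines. The sketch below uses Python's `re` module as a stand-in, so it only shows how the pattern behaves and says nothing about how Discourse's blocked words feature evaluates regexes. The `ANCHORED` variant is an assumption added for comparison: it only matches # at the start of a line, and whether the blocked words feature accepts anchors has not been checked here.

```python
import re

# The pattern suggested above: one or more '#' followed by whitespace, anywhere.
SUGGESTED = re.compile(r"#{1,}\s")
# Hypothetical stricter variant: only at the start of a line (Markdown ATX heading).
ANCHORED = re.compile(r"^ {0,3}#{1,6}[ \t]", re.MULTILINE)

samples = [
    "# show running-config",        # renders as an h1
    "see ticket # 42 for details",  # '#' mid-sentence, not a heading
    "use the #networking tag",      # no space after '#', not a heading
]

for text in samples:
    hit_suggested = SUGGESTED.search(text) is not None
    hit_anchored = ANCHORED.search(text) is not None
    print(f"{text!r}: suggested={hit_suggested} anchored={hit_anchored}")
```

The second sample shows the trade-off: the unanchored pattern also blocks a # followed by a space in the middle of a sentence, which is not a heading.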

Of course, that only works on new content. For old content, maybe you can strip out # using a rake task?

Right. And, besides, h1 and h2 headers just look bad in a post.

I don’t pretend to know anything about SEO (but https://www.literatecomputing.com/product/discourse-install/ has crept onto the first page for “discourse install”, though not for “install discourse”). It does make sense that spiders would treat H1 headers as if they should be H1 headers and that there shouldn’t be more than one of them.

If it’s a technical forum those people should be able to learn markdown.

I never use # or ## in a post because it looks bad. I think it’d make sense to either demote H1 and H2 within a post to H3 or to demote all headers by 1 or 2 levels.

Keep in mind that there are some cases where H tags within posts make perfect sense.

In your case, @erikmolenaarnl, I would potentially write a plugin that edits the post text before it reaches the preview and gets cooked.
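As an illustration of the kind of text edit such a plugin could make, here is a minimal Python sketch that backslash-escapes heading markers in raw post text, so that # and - underlines show up as literal characters instead of producing headings. It is only a sketch of the transformation: Discourse plugins are written in Ruby and JavaScript, the function name `escape_heading_markers` is hypothetical, and the fence tracking is deliberately simplistic.

```python
import re

ATX = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]|$)")   # "# text" style heading
SETEXT = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")     # "---" or "===" underline
FENCE = re.compile(r"^ {0,3}(```|~~~)")           # start or end of a code fence


def escape_heading_markers(raw: str) -> str:
    """Insert a backslash before heading markers, leaving fenced code alone."""
    out = []
    in_fence = False
    prev_blank = True  # an underline needs a text line directly above it
    for line in raw.split("\n"):
        if FENCE.match(line):
            in_fence = not in_fence  # simplification: any fence line toggles
        elif not in_fence:
            match = ATX.match(line)
            if match is None and not prev_blank:
                match = SETEXT.match(line)
            if match:
                pos = match.start(1)
                line = line[:pos] + "\\" + line[pos:]
        prev_blank = not line.strip()
        out.append(line)
    return "\n".join(out)


raw = "# ip route add default\nsome text\n---\n```\n# comment in code\n```"
print(escape_heading_markers(raw))
```

Escaping rather than deleting keeps what the user typed visible, which matters on a technical forum where # and - are often part of a command or its output.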

Thanks! Writing Discourse plugins is not a strength of mine. Could someone help me get started or write the plugin for me (for payment, of course)?

And I would like to share the code afterwards with the Discourse community!

I do plugins quite a bit. Shoot me a quick email and we can work through the details. joe@joebuhlig.com

Hi @joebuhlig
I’ve sent you an e-mail and DM but received no response yet.
Could you please reply when you have some time so we can get started? Thanks :)

Hello all,

The talented @joebuhlig wrote a great Discourse plugin which disables all HTML headers.

My brother @renem (of NetworkLessons.com) paid for this plugin and wants to share it with the Discourse community. If it can help more people out, please use it. It rocks!

https://github.com/HA-Tech/discourse-sanitize-header-tag

And @joebuhlig thanks again for the great work!

Great! Is this for all posts or is the initial post in a topic excluded?