Applying Schema.org on the user profile page for improved author authority on Google

First off, Schema.org is very well applied throughout the forums!

However, I would like to know whether it would be possible to give an ‘author’ more authority in Google search results by also applying Schema.org markup to the profile page, and by adding the option to link social media accounts to that profile.

There’s a Schema.org ProfilePage type which can be found here: ProfilePage - Schema.org Type

Not all of it will be necessary, but here is some example markup showing how it could be implemented:

<script type="application/ld+json">
{
    "@context" : "http://schema.org",
    "@type" : "ProfilePage",
    "mainEntity" : {
        "@type" : "Person",
        "name" : "Jane Doe",
        "givenName" : "Jane",
        "familyName" : "Doe",
        "email" : "jdoe@examplelaw.com",
        "telephone" : "9195555555",
        "jobTitle" : "Partner",
        "image" : "https://www.examplelaw.com/wp-content/examplelaw/2018/02/jane-doe.jpg",
        "url" : "https://www.examplelaw.com/attorney/jane-doe/",
        "worksFor": {
            "@type": "Organization",
            "name": "Example Law Firm",
            "url": "https://www.examplelaw.com/",
            "address": {
                "@type": "PostalAddress",
                "addressLocality": "Raleigh",
                "addressRegion": "NC",
                "postalCode": "27604",
                "streetAddress": "100 Main Street, Suite 201",
                "addressCountry": "USA"
            }
        },
        "gender": "female",
        "alumniOf": [
            {
                "@type" : "CollegeOrUniversity",
                "name" : "University of North Carolina at Chapel Hill"
            },
            {
                "@type" : "CollegeOrUniversity",
                "name" : "University of North Carolina School of Law"
            }
        ],
        "memberOf": [
            "North Carolina State Bar",
            "Wake County Bar",
            "North Carolina Board Certified Family Law Specialist",
            "Certified Parenting Coordinator",
            "NCDRC Certified Family Financial Mediator"
        ],
        "award": [
            "North Carolina Super Lawyers, Rising Star 2018",
            "Business Leader Magazine, North Carolina Top Family Lawyer"
        ],
        "sameAs": [
            "https://www.facebook.com/JaneDoeAttorney/",
            "https://www.linkedin.com/in/jane-doe-attorney",
            "https://twitter.com/janedoeattorney"
        ]
    }
}
</script>

The ‘sameAs’ property in particular would be a welcome addition.
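A minimal sketch of how a server could emit the core of this markup (ProfilePage → mainEntity → Person, plus optional sameAs links) from profile fields. The helper names are hypothetical, not Discourse APIs, and the property selection is an assumption. Escaping "</" matters here because names and links are user-controlled and could otherwise close the script element early.

```python
import json


def profile_page_jsonld(name: str, profile_url: str, same_as: list[str]) -> dict:
    """Build a minimal ProfilePage -> mainEntity -> Person object (hypothetical helper)."""
    person = {"@type": "Person", "name": name, "url": profile_url}
    if same_as:
        # sameAs lists other profiles of the same person, e.g. linked social accounts.
        person["sameAs"] = same_as
    return {
        "@context": "https://schema.org",
        "@type": "ProfilePage",
        "mainEntity": person,
    }


def as_script_tag(data: dict) -> str:
    """Serialize to a JSON-LD <script> element.

    "</" is rewritten as "<\\/" (a valid JSON escape for "/") so that
    user-supplied text cannot terminate the script element.
    """
    payload = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

For example, as_script_tag(profile_page_jsonld("Jane Doe", "https://www.examplelaw.com/attorney/jane-doe/", ["https://www.linkedin.com/in/jane-doe-attorney"])) yields a script element carrying just the name, URL and sameAs link; the other properties from the example can be added the same way.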

Isn’t the profile page excluded from indexing by default, or am I confusing something?

I think we relaxed that recently; as I recall, profile pages are now indexed.

What specifically do you propose for the profile page? Just the code block you posted already in your first post @JoshuaH?

I’m in favor of this change, BTW. Google is likely to recommend this for forums soon as well, sameAs links and interactionStatistic in particular, but more of the information given there that assists with clustering can be useful too.

And that’s a good general structure recommendation (ProfilePage → mainEntity → Person). I wish there were an account type in schema.org, but that’s been slow to find traction, so we’re working with what we have.
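As a sketch of the structure recommended above, here is a ProfilePage whose Person carries interaction counters. Note that the schema.org property is interactionStatistic (singular), holding InteractionCounter objects. Mapping "likes received" to LikeAction and "posts written" to WriteAction under agentInteractionStatistic is an assumption about how Discourse counters might be expressed, and the helper names are hypothetical.

```python
def interaction_counter(action: str, count: int) -> dict:
    """One schema.org InteractionCounter, e.g. action="LikeAction"."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return {
        "@type": "InteractionCounter",
        "interactionType": f"https://schema.org/{action}",
        "userInteractionCount": count,
    }


def person_with_stats(name: str, url: str, likes_received: int, post_count: int) -> dict:
    """ProfilePage -> mainEntity -> Person, with counters (hypothetical mapping)."""
    return {
        "@context": "https://schema.org",
        "@type": "ProfilePage",
        "mainEntity": {
            "@type": "Person",
            "name": name,
            "url": url,
            # Interactions *with* this person: assumed to be likes on their posts.
            "interactionStatistic": [interaction_counter("LikeAction", likes_received)],
            # Actions performed *by* this person: assumed to be posts written.
            "agentInteractionStatistic": interaction_counter("WriteAction", post_count),
        },
    }
```

Keeping the counters on the Person rather than on each post is the point made earlier: it avoids repeating author data on every post.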

The tricky thing though is that we don’t even allow indexing on user pages.

Generally we see them as low value pages to add to Google.

Stack Overflow and a few other places do allow indexing here.

I guess a lot depends on our strategy around user pages; they are quite possibly a spam and abuse vector, with privacy concerns mixed in.

If we only allowed indexing for particular groups, and had an HTML view that was better than a blank page with a username, then it would make sense to build this as part of that effort.

I guess a big question is: how much value would a forum get from inbound Google links that land on user profiles?

Fair question.

One of our general problems is that we also use the web as our data source for a lot of things (in addition to our URL index), and those pages carry data, useful as freshness or goodness signals, that isn’t available anywhere else. From a forum’s point of view, it’s primarily going to let us better understand the authors of the content in the forum, because it’s costly to put all that information on every post. The hope is that it would help promote the site’s best authors and content more effectively and with more freshness. But you raise a good point that the ROI is less obvious than for discussion pages. I’ll raise that issue with others at Google.

Yes, this is absolutely the fundamental problem. The issue is how much of this data is suspect versus trusted.

Names, bios, and links to my home page are all fundamentally pretty suspect data. They are completely controlled by an end user without any form of validation. For instance, someone could simply use the name “Sundar Pichai” and include a link to Google. The last thing we want is for searches for the CEO of Google to direct users to this fake profile.

Over time, as a user engages with a forum, their data becomes less suspect. For example, if a user has been around for a year and posts frequently, we might trust their profile information is accurate.

Of course, solving the identity issue is somewhat intractable, and many have tried and failed.

Other data we could trust on a per-user basis could include:

  • List of badges a user has
  • Top topics and posts they posted
  • Number of likes and reactions they’ve received
  • Age of the account
  • Public groups they are members of
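To make the list above concrete, here is one way such signals could gate trust. This is purely illustrative: the container, the rule and every threshold (one year, 50 posts, 20 likes) are hypothetical placeholders, loosely following the "around for a year and posts frequently" example, not Discourse defaults.

```python
from dataclasses import dataclass, field


@dataclass
class UserSignals:
    """Per-user signals like those listed above (hypothetical container)."""

    account_age_days: int
    post_count: int
    likes_received: int
    badges: set[str] = field(default_factory=set)
    public_groups: set[str] = field(default_factory=set)


def profile_is_trusted(
    s: UserSignals,
    min_age_days: int = 365,
    min_posts: int = 50,
    min_likes: int = 20,
    trusted_badges: frozenset[str] = frozenset(),
    trusted_groups: frozenset[str] = frozenset(),
) -> bool:
    """Illustrative rule only; every threshold is an arbitrary placeholder.

    Holding a trusted badge or belonging to a trusted public group is enough
    on its own. Otherwise the account must be old, active, and appreciated.
    """
    if s.badges & trusted_badges or s.public_groups & trusted_groups:
        return True
    return (
        s.account_age_days >= min_age_days
        and s.post_count >= min_posts
        and s.likes_received >= min_likes
    )
```

A rule-based gate like this is easy for site operators to reason about; a weighted score would be more flexible but harder to explain when a profile is excluded.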

From an indexing perspective, site operators would probably prefer it if Google prioritized indexing content from more trusted users first. However, drawing a line on where to begin is nuanced.

That said, scanning the sitemap is so efficient that this might be a micro-optimization taken too far.
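If a forum did want to surface only trusted profiles, one low-cost option is to list just those profile URLs in a sitemap. A sketch using the standard sitemaps.org namespace; the (url, lastmod, trusted) input tuples are a hypothetical shape, and how "trusted" is decided is left to the site.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def profile_sitemap(profiles: list[tuple[str, str, bool]]) -> str:
    """Return sitemap XML listing only trusted profiles.

    profiles: (profile_url, lastmod as an ISO 8601 date, trusted) tuples.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod, trusted in profiles:
        if not trusted:
            continue  # untrusted profiles are simply left out
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```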

We handle this problem already for much larger sets of profiles than Discourse forums (like social media profiles). Not that things don’t slip through the cracks, but we’ve been working on the issue of people claiming they are celebrities for years and have a lot of signals for these things. Some of the signals you mentioned are the types of things we look for. And that’s precisely why when they aren’t indexed or we fail to extract them, we don’t know as well which posts and people to prioritize. We can try based on the content authorship, but it’s usually lacking some signals.

None, unless someone searches for that person. And in that situation, I reckon their social media accounts will outrank their Discourse user profile.

As of last month, Google has added support for the DiscussionForumPosting schema (which Discourse already implements well) and the ProfilePage schema.

They currently recommend linking discussion posts to an author.url page which is:

A link to a web page that uniquely identifies the author of the post, most likely a profile page of the forum. We recommend marking up that page using profile page structured data.

As far as I can tell, having indexed ProfilePages linked to discussion posts is the only way for forums to rank in Google’s new Perspectives search. All of this is incompatible with noindex’d profile pages.
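The link Google describes can be sketched as a DiscussionForumPosting whose author.url equals the url of the Person on the matching ProfilePage. The helper names are hypothetical, and the consistency check is just one way a forum might verify its own markup before publishing.

```python
def forum_post_jsonld(
    headline: str,
    text: str,
    post_url: str,
    date_published: str,
    author_name: str,
    author_profile_url: str,
) -> dict:
    """DiscussionForumPosting whose author.url points at the user's profile page."""
    return {
        "@context": "https://schema.org",
        "@type": "DiscussionForumPosting",
        "headline": headline,
        "text": text,
        "url": post_url,
        "datePublished": date_published,  # ISO 8601, e.g. "2023-11-28T09:30:00+00:00"
        "author": {
            "@type": "Person",
            "name": author_name,
            # Should resolve to an indexable page marked up as ProfilePage.
            "url": author_profile_url,
        },
    }


def author_links_to_profile(post: dict, profile_page: dict) -> bool:
    """Check that the post's author.url matches the ProfilePage's Person url."""
    return (
        profile_page.get("@type") == "ProfilePage"
        and post["author"].get("url") == profile_page["mainEntity"].get("url")
    )
```

If the profile page behind author.url is noindexed, this link still exists in the markup but points at a page Google discards, which is the incompatibility described above.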


Given this latest news, would you reconsider indexing profile pages and adding the schema from the OP?

It might be a good idea to add a setting that makes profile pages indexable by search engines, so that each community administrator can decide whether or not to open profiles to search engines.

To prevent spammers from using such profile pages, and the Discourse forum or community itself, as a link-building platform, it might be a good idea to add another option: allow indexing (controllable via the robots meta tag) only for profiles of users who meet certain criteria, such as a minimum trust level (similar to the setting for wiki posts) or membership in a group of verified experts. Some communities have real experts, such as doctors and lawyers, who would be more motivated to participate in conversations if they could have an indexable profile page. They may be experts in their field without knowing anything about SEO or the web, and may not want to run a blog or website. A short link to their profile on a Discourse community could be a good option for them.

Although profile pages are not exactly a ranking factor, they help search engines understand whether a particular article or forum post is trustworthy.
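The proposed setting plus criteria could look roughly like this. The settings class and its field names are hypothetical (not existing Discourse site settings); only the 0 to 4 trust-level range and the X-Robots-Tag header reflect real Discourse and Google behaviour. Returning None means "send no robots header", leaving the page indexable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileIndexingSettings:
    """Hypothetical site settings; these names are not real Discourse settings."""

    allow_indexing_profiles: bool = False  # opt-in, off by default
    min_trust_level: int = 2  # Discourse trust levels run 0-4; 2 is an arbitrary choice
    expert_groups: frozenset[str] = frozenset()


def robots_header_for_profile(
    settings: ProfileIndexingSettings, trust_level: int, groups: set[str]
) -> Optional[str]:
    """Return an X-Robots-Tag value for a profile page, or None to send no header.

    A profile is indexable only when the admin has opted in AND the user
    meets the trust-level bar or belongs to a designated expert group.
    """
    if not settings.allow_indexing_profiles:
        return "noindex"
    qualifies = trust_level >= settings.min_trust_level or bool(groups & settings.expert_groups)
    return None if qualifies else "noindex"
```

Keeping the default at "noindex" means communities that do nothing see no change, which addresses the spam and privacy concerns raised earlier in the thread.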

Should Discourse serve an indexable profile page?

There are different opinions on the pros and cons of indexable profile pages.
I summarize some of them here to make the case for adding a minimalistic indexable profile page.


  1. Google can more easily process the forum content when it references indexable profile pages.
  2. For “disambiguation”, Google really needs at least some kind of reference to the author, even if it is not indexable.
  3. There is a schema draft for a minimal profile page.
  4. Indexable profile pages add no significant load on Discourse.

Google already crawls the profile URLs, receives a response with the HTTP header X-Robots-Tag: noindex, and then throws the result away.

[Screenshot: Google Search Console, "Excluded by 'noindex' tag"]

If Discourse served a minimalistic profile page instead, Google could at least make some use of the result.
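For checking which URLs are affected, here is a small sketch of how a crawler-side check might interpret X-Robots-Tag values. It follows Google's documented syntax as I understand it (comma-separated directives, an optional "user-agent:" prefix, "none" implying noindex) but is a simplification, not Google's parser; the function name is hypothetical.

```python
# Directives whose own syntax contains a colon, so they are not user-agent prefixes.
VALUE_DIRECTIVES = {"max-snippet", "max-image-preview", "max-video-preview", "unavailable_after"}


def blocks_indexing(header_values: list[str], user_agent: str = "googlebot") -> bool:
    """Return True if any X-Robots-Tag header value forbids indexing for user_agent."""
    ua = user_agent.lower()
    for raw in header_values:
        value = raw.strip()
        target = None
        head, sep, rest = value.partition(":")
        # "googlebot: noindex" targets one crawler; "max-snippet: 20" does not.
        if sep and "," not in head and head.strip().lower() not in VALUE_DIRECTIVES:
            target, value = head.strip().lower(), rest
        if target is not None and target != ua:
            continue  # aimed at a different crawler
        directives = {d.strip().lower() for d in value.split(",")}
        if "noindex" in directives or "none" in directives:
            return True
    return False
```

A profile response carrying X-Robots-Tag: noindex makes this return True, which matches the "Excluded by 'noindex' tag" status in Search Console.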


My conclusion

Add a crawler view for profile pages that shows just a minimalistic schema markup, with no additional information needed.
The minimalistic schema markup should mirror exactly the data that is already present in the schema markup of every post as its author attribute:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>{user_name}</title>
  </head>
  <body itemtype="https://schema.org/ProfilePage" itemscope>
    <span itemprop="mainEntity" itemtype="https://schema.org/Person" itemscope>
      <a itemprop="url" href="https://meta.discourse.org/u/{user_name}">
        <span itemprop="name">{user_name}</span>
      </a>
    </span>
  </body>
</html>

This is a valid “Profile page”; see this example in Google’s Rich Results Test at search.google.com/test/rich-results.

Then the profile page URLs can become indexable again.
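A sketch of rendering that minimal markup per user, assuming profiles live at <base_url>/u/<username> as in the example above. The function name is hypothetical, and the doctype and title are added only so the output is a well-formed HTML document. The username is percent-encoded in the URL and HTML-escaped in the text, since it is user-controlled.

```python
from html import escape
from urllib.parse import quote


def minimal_profile_html(base_url: str, username: str) -> str:
    """Render a minimal ProfilePage microdata document for one user."""
    profile_url = f"{base_url.rstrip('/')}/u/{quote(username)}"
    name = escape(username)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'  <head><meta charset="utf-8"><title>{name}</title></head>\n'
        '  <body itemtype="https://schema.org/ProfilePage" itemscope>\n'
        '    <span itemprop="mainEntity" itemtype="https://schema.org/Person" itemscope>\n'
        f'      <a itemprop="url" href="{escape(profile_url)}">\n'
        f'        <span itemprop="name">{name}</span>\n'
        "      </a>\n"
        "    </span>\n"
        "  </body>\n"
        "</html>\n"
    )
```

For example, minimal_profile_html("https://meta.discourse.org", "jane_doe") produces a page whose Person url is https://meta.discourse.org/u/jane_doe, the same value a post's author.url would carry.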
