
Trying to understand Google's guidelines for paywalled content.

My site works like this:

  • Users without a paid subscription get a few free reads per week. Some JS determines whether they get to read the article or whether we should trigger a paywall.
  • The content of an article page lives in a .paid-content element. When the paywall triggers, it removes that element and replaces it with a .paywall element that says "Please buy a subscription to continue reading our site" (roughly as in the sketch below).
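
Simplified, the trigger does something like this (just a rough sketch of the behaviour described above; the real code that counts free reads is omitted):

    function triggerPaywall() {
      var article = document.querySelector('.paid-content');
      if (!article) return;

      // Build the paywall message element
      var paywall = document.createElement('div');
      paywall.className = 'paywall';
      paywall.textContent = 'Please buy a subscription to continue reading our site';

      // Swap it in for the article body, so .paid-content is gone from the DOM entirely
      article.replaceWith(paywall);
    }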

Currently my JSON-LD looks like this

"hasPart":[  
      {  
         "@type":"WebPageElement",
         "isAccessibleForFree":false,
         "cssSelector":".paid-content"
      },
      {  
         "@type":"WebPageElement",
         "isAccessibleForFree":false,
         "cssSelector":".paywall"
      }
   ],
   "isAccessibleForFree":false

Questions:

  1. Should .paywall even be listed in the hasPart array? This element just says "Please buy a subscription"; it doesn't contain any text that is hidden from free users.

  2. In my case, only one of these two elements will exist on the page at any given time. Is that OK? Or will the Google crawler think it's a problem if it's unable to find all of the elements specified in the hasPart array?

Drkawashima

1 Answer


Short Answer:

For Google, hasPart > cssSelector is for indicating visually hidden content behind a paywall. In your example you're either completely removing the content or showing all of it publicly, so the schema is irrelevant and unnecessary in either case.

.paywall won't be necessary because cssSelector should reference the class of an element wrapping paywalled content, not just a paywall message (which is visible to all users).

.paid-content wraps content that is fully visible whenever it is actually on the page, which makes that schema unnecessary as well, since you should only target content that is visually hidden behind a paywall (see below and Google's second example).

I'm not certain how Google would react to this schema markup not matching the DOM, but I think it might be ignored in this case since they're looking for something very specific. The bigger problem here is having a page indexed with no content on it.

Long Answer:

The point of having this paywall schema (from Google's standpoint) comes down to one major reason:

Publishers should enclose paywalled content with structured data to help Google differentiate paywalled content from the practice of cloaking, where the content served to Googlebot is different from the content served to users.

Cloaking (i.e. serving Googlebot different content than users see, for SEO gains) has been a big strategy used by "black hatters" for many years now. Google will penalize the practice where they can (as with BMW back in 2006) and have certainly done plenty of work on their algorithms to catch this stuff automatically. The problem is that now we have paywall sites like yours, which "hide content" for different (and less dubious) reasons.

You are not visually hiding your content, though; instead you are stripping it off the page entirely. The problem with this approach is that you risk Googlebot also hitting the paywall and not indexing the page properly, since the content is simply not there. Even if you are stripping the content with JavaScript, it's a risk.

That's why typical paywall sites will cover or hide content behind a CSS overlay, coupled with overflow:hidden on the body. That approach probably trips Google's red flag for cloaking, which is why they're now asking people to use this markup (that last sentence is just my assumption).
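
For illustration, that typical setup looks roughly like this (just a sketch; .paywall-overlay, body.paywalled and the styling values are made up, while .paid-content matches your markup):

    <!-- The article text stays in the DOM; it is only visually blocked -->
    <div class="paid-content">
      ...full article text, still present in the HTML...
    </div>

    <div class="paywall-overlay">
      Please buy a subscription to continue reading.
    </div>

    <style>
      /* .paywalled is assumed to be added to <body> by JS when the paywall triggers */
      body.paywalled { overflow: hidden; }  /* stop scrolling past the teaser */

      .paywall-overlay {
        position: fixed;
        inset: 0;                           /* cover the whole viewport */
        display: flex;
        align-items: center;
        justify-content: center;
        background: rgba(255, 255, 255, 0.95);
      }
    </style>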

So taking that into consideration and looking at the Google examples from the link you provided, the cssSelector is just to say: "this content isn't some cloaking/blackhat trick, it's just paywalled, so let's still index it."
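
Mapped onto your class names, Google's documented pattern boils down to something like this (the headline is a placeholder; the relevant part is that hasPart/cssSelector points at the wrapper around the hidden text, i.e. .paid-content, and not at the visible .paywall message):

    {
      "@context": "https://schema.org",
      "@type": "NewsArticle",
      "headline": "Example article headline",
      "isAccessibleForFree": false,
      "hasPart": {
        "@type": "WebPageElement",
        "isAccessibleForFree": false,
        "cssSelector": ".paid-content"
      }
    }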

Bottom line for you is that the schema in your example doesn't matter... because either you're showing users all of the content and have nothing to prove to Google, or you're displaying a page with no content and there's no cloaking issue for Google to care about.

So if this is your thing, the rule of thumb is:

  1. Don't remove content from the page (even via JS) if you want it indexed
  2. If you paywall content, hide it and help Google by using their schema instructions

Other loosely related notes:

  1. If you remove cssSelector and run the markup through the structured data testing tool, it still validates, but I don't always trust the tool to be right
  2. Google's hasPart examples and the schema.org hasPart definition don't seem to quite match up
  3. It makes me wonder whether this isn't just opening the door to new blackhat tricks
Stu Furlong
  • Great answer. I have to mention that I am worried about Googlebot not currently indexing the paid content, because it can be removed. So I'm planning to add some backend filters that will treat Googlebot as a paid user, in order to ensure no paid content is removed, which lets Google index the paid content. The question is: could that be considered cloaking? Or is cloaking limited to the concept of visually hidden elements? – Drkawashima Sep 21 '18 at 17:08
  • Tough to say exactly. I would think it is cloaking if you take Google's definition at face value: "where the content served to Googlebot is different from the content served to users." Since you're doing it randomly, this is an odd case though, and unfortunately the workings of the algorithm/Googlebot are not fully known. I am not sure how helpful the JSON-LD is at this point either way, but it might not hurt to have it on your '.paid-content'. – Stu Furlong Sep 22 '18 at 19:28
  • I'd add that it might be helpful to ask the folks at the SE Webmasters: https://webmasters.stackexchange.com/ Also here's a link defining cloaking (for Google news): https://support.google.com/news/publisher-center/answer/40543?hl=en – Stu Furlong Sep 22 '18 at 19:29
  • Yeah, it's surprisingly hard to find guidelines for how to handle this, especially when I'm removing paid contents backend and not just visually hiding it on frontend. I've followed your advice and asked the webmastas over at https://webmasters.stackexchange.com/questions/117936/isaccessibleforfree-and-paywalled-content-delivered-to-googlebots – Drkawashima Sep 24 '18 at 19:33