Question

The "paywall notice" does not seem to be recognized in Google's documentation. I am trying to make it visible to all, yet excluded from the page topic and content, without causing cloaking issues. Can I do this in the DOM (for example with the role attribute), or do I need to do it in the JSON-LD markup?

Background

I am implementing a website paywall using client-side JS, with a combination of schema.org structured data (JSON-LD) and CSS selectors.

The implementation is based on the programming suggestions by Google at https://developers.google.com/search/docs/advanced/structured-data/paywalled-content

There are 3 types of content on this site, and in this implementation all 3 are rendered by the server for every visitor regardless of paywall status:

  1. Free content, visible to all;
  2. Paywall notice, not part of the page content/topic, visible only when not logged in; and
  3. Paywalled content, visible only to logged in users and search crawlers.

Type 2 is what is causing trouble, and this is not documented by Google.

HTML

<html>
  <head>
  </head>
  <body>
    <div id="div-1" class="non-paywall">
      All visitors can see this sentence, whether or not subscribed.
    </div>
    <div id="div-2" class="paywall-notice" role="dialog">
      <!-- This element is the issue in question -->
      If you are seeing this notice, you are logged out or not subscribed. You cannot see the main content of this page. Please subscribe!
    </div>
    <div id="div-3" class="paywall">
      This section is paid content. 
      If you can see it, you are a logged in subscriber or a verified crawler (e.g. googlebot or bingbot).
    </div>
  </body>
</html>

JSON-LD

{
    "@context": "https://schema.org",
    "@type": "WebPage",
    "@id": "https:\/\/foo\/page\/#webpage",
    "mainEntityOfPage": {
        "@type": "Article",
        "mainEntityOfPage": "https:\/\/bar\/article"
    },
    "isAccessibleForFree": "False",
    "hasPart": [
        {
            "@type": "WebPageElement",
            "isAccessibleForFree": "True",
            "cssSelector": ".non-paywall"
        },
        {
            "@type": "WebPageElement",
            "isAccessibleForFree": "True",
            "cssSelector": ".paywall-notice"
        },
        {
            "@type": "WebPageElement",
            "isAccessibleForFree": "False",
            "cssSelector": ".paywall"
        }
    ]
}

If paywall notices (#2) are treated the same as #1, there seems to be a risk that crawlers will assume they are part of the page content and include them when assessing relevance to search intent.

I cannot find any official recognition of the existence of #2 or guidance on how to treat it, whilst honouring the objective of paywall markup and avoiding cloaking issues.

There is a combination of approaches at "Handling isAccessibleForFree for client side paywalls", and a related issue at https://webmasters.stackexchange.com/questions/117936/isaccessibleforfree-and-paywalled-content-delivered-to-googlebots, but neither of these addresses my original question above.

Optimally, I would like to implement this the way Google wants me to... if only I knew what that was!

More background

In order to serve paywalled content to googlebot, the server renders the same HTML for all visitors. After page load, some JS checks whether the visitor is googlebot, and if so:

  1. Remove the .paywall-notice element/s
  2. Show the .paywall element/s

There may also be periodic or interaction-driven checks to remove .paywall element/s for non-googlebot visitors, but that should not affect this question if the markup correctly shows googlebot that those element/s are paywalled.
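
The two googlebot steps listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `looksLikeCrawler` and `applyPaywallState` are hypothetical names, the user-agent test is a crude stand-in for real crawler verification, and the `.paywall` element is assumed to be hidden with the `hidden` attribute or an inline `display` style.

```javascript
// Hypothetical helper: crude user-agent test. Properly verifying a crawler
// (for example by reverse DNS) is not possible in the browser, so treat
// this purely as a sketch.
function looksLikeCrawler(userAgent) {
  return /googlebot|bingbot/i.test(userAgent || "");
}

// Applies the two steps to any object exposing querySelectorAll
// (a browser `document`, or a stub in tests).
function applyPaywallState(doc, isCrawler) {
  if (!isCrawler) {
    return false; // leave the notice in place for ordinary visitors
  }
  // 1. Remove the .paywall-notice element/s
  doc.querySelectorAll(".paywall-notice").forEach((el) => el.remove());
  // 2. Show the .paywall element/s (assumes they were hidden via the
  //    `hidden` attribute or an inline display style)
  doc.querySelectorAll(".paywall").forEach((el) => {
    el.hidden = false;
    el.style.display = "";
  });
  return true;
}

// Browser usage; skipped when there is no DOM (e.g. under Node).
if (typeof document !== "undefined" && typeof navigator !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    applyPaywallState(document, looksLikeCrawler(navigator.userAgent));
  });
}
```

Passing the document in as a parameter keeps the DOM manipulation separate from the crawler check, so either part can be swapped out later.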

ed2

1 Answer

Is it possible for you to detect the crawlers server side and not render the paywall-notice element at all? The point of this markup is so that you don't show different content to Googlebot vs an average anonymous visitor. I think as long as you wrap the "paid" content of the article in the paywall class you don't have to worry about getting penalized for cloaking.

On wsj.com we have a server side paywall so when Googlebot comes to the site we don't even render any of those marketing offers like what you have in your paywall-notice element. We just render the full article and wrap the paid content in the paywall class. So if it's possible for you, send Googlebot the page without that paywall notice element.
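
As a rough sketch of that idea (not wsj.com's actual code), a Node-style server could omit the notice when the request looks like a crawler. The names `isSearchCrawler` and `renderBody` are hypothetical, the check relies on the User-Agent header alone, and the paid content is still rendered for everyone, as in the question's setup.

```javascript
// Hypothetical crawler check based on the User-Agent header only.
// Assumption: a production setup would also verify the crawler's IP
// (for example via reverse DNS), since the header is trivially spoofed.
function isSearchCrawler(userAgent) {
  return /googlebot|bingbot/i.test(userAgent || "");
}

// Builds the <body> markup from the question's three content types.
// The notice is only emitted for non-crawler visitors.
function renderBody({ freeHtml, noticeHtml, paidHtml }, userAgent) {
  const parts = [`<div class="non-paywall">${freeHtml}</div>`];
  if (!isSearchCrawler(userAgent)) {
    parts.push(`<div class="paywall-notice">${noticeHtml}</div>`);
  }
  parts.push(`<div class="paywall">${paidHtml}</div>`);
  return parts.join("\n");
}
```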

By the way, nyt.com has a front end paywall and they aren't doing anything special about marking up the marketing offers. They just mark up the paywalled content same as your example. Just make sure to remove paywall-notice from the hasPart array as it definitely shouldn't be in there.

Marcin
  • Thanks, it's likely to be a frontend solution by manipulating the DOM with JS. Anyways if I serve paywall-notice to regular visitors but not googlebot, wouldn't that be a cloaking penalty risk? That was what led to the OP. – ed2 Sep 24 '21 at 10:12
  • @ed2 Not showing the paywall notice to Googlebot would not be considered cloaking. Cloaking is showing googlebot something that an average visitor to the site would not see. Take a look at the "Cloaking and Google" section here https://www.searchenginejournal.com/google-answers-is-this-cloaking/402823 – Marcin Sep 24 '21 at 16:05
  • Thanks. So only the `.paywall` and `.non-paywall` elements will be represented in the JSON-LD markup, the `.paywall-notice` will be omitted from the markup and not described? Also, with the current CMS and server setup, the only practical way to omit `.paywall-notice` from googlebot, is to render it as normal and then remove it from the DOM with JS *after* the page loads and detects googlebot. Not ideal, but it's where we are at. – ed2 Sep 24 '21 at 23:49
  • Inspecting the nyt example, it looks like there is only one relevant `hasPart` item in the JSON-LD, being the equivalent of `.paywall`, so the equivalents of both `.non-paywall` and `.paywall-notice` are not mentioned in the schema. The equivalent of `.paywall-notice` exists in the DOM as a specific element id, but the equivalent of `.non-paywall` is not even specified in the DOM; it is simply "everything else". Have I interpreted the nyt DOM and JSON-LD correctly? If so, does this mean my JSON-LD should have only `.paywall` and remove *both* the other two items from the array? – ed2 Sep 25 '21 at 00:34
  • Correct, you only need to put the classes for elements that are paywalled to make sure you are not cloaking. We do the same thing on wsj.com that NYT does. We just have that single class in the JSON-LD: `"hasPart":{"@type":"WebPageElement","isAccessibleForFree":false,"cssSelector":".paywall"}` – Marcin Sep 25 '21 at 00:57
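
Putting the answer and comments together, the question's JSON-LD would shrink to a single `hasPart` entry for `.paywall`. A minimal sketch of generating it is below; `buildPaywallJsonLd` and `toJsonLdScript` are hypothetical helper names, the `@id` value is the placeholder from the question, and the boolean `false` follows the snippet quoted in the last comment rather than the string values used in the question.

```javascript
// Builds paywall structured data with a single hasPart entry, following
// the shape quoted in the last comment. `pageId` and `selector` are
// illustrative parameters.
function buildPaywallJsonLd(pageId, selector = ".paywall") {
  return {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "@id": pageId,
    "isAccessibleForFree": false,
    "hasPart": {
      "@type": "WebPageElement",
      "isAccessibleForFree": false,
      "cssSelector": selector,
    },
  };
}

// Serializes the object for a <script type="application/ld+json"> tag.
function toJsonLdScript(data) {
  // Escape "<" so the payload cannot close the script element early.
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

// Example: the page id is the placeholder used in the question.
const markup = toJsonLdScript(buildPaywallJsonLd("https://foo/page/#webpage"));
```

Neither `.non-paywall` nor `.paywall-notice` appears in the output, which matches the approach described for nyt.com and wsj.com in the comments.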