SEO best practice for subscription-based and paywall content

The web community often asks for SEO best practices for their websites. Today, we will share two simple SEO steps that help search engines index subscription-based and paywall content, so that publishers can get more visitors from search engines without compromising their economic model.

Step #1: Enabling crawling of subscription-based or paywall content

The first step is to allow search engines, like Bing, to see the full content that normally resides behind a paywall or a subscription. By accessing the full content, search engines will be able to index more text which will help match more customer queries.

It’s important to understand that search engines offer publishers ways to verify whether a crawler claiming to be the search engine’s crawler is in fact the real one. Bing’s crawler, Bingbot, for example, only operates from within a limited set of IP ranges. As a result, you can identify Bingbot by checking the requesting IP address against the public list of Bingbot IP addresses. We recommend that you review this list daily, as the ranges can change at times.
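As an illustration of the IP check described above, here is a minimal Python sketch that tests whether a requesting address falls inside a list of CIDR ranges and picks the full article or a teaser accordingly. The function names, the `full_text` and `teaser_text` parameters, and the ranges themselves are assumptions made for this example: the ranges are reserved documentation blocks, not real Bingbot addresses, and should be replaced with the entries from the published list.

```python
import ipaddress

# Placeholder ranges taken from reserved documentation blocks.
# They are NOT real Bingbot ranges; load the published list instead.
PLACEHOLDER_BINGBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def parse_ranges(cidrs):
    """Convert CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_listed_crawler_ip(remote_ip, networks):
    """Return True when remote_ip falls inside any of the given networks."""
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: treat as unverified
    return any(address in network for network in networks)


def select_body(remote_ip, networks, full_text, teaser_text):
    """Serve the full article only to addresses on the crawler list."""
    if is_listed_crawler_ip(remote_ip, networks):
        return full_text
    return teaser_text


NETWORKS = parse_ranges(PLACEHOLDER_BINGBOT_RANGES)
```

Because the list can change, a real deployment would probably refresh `NETWORKS` on a schedule rather than hard-coding it.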

Step #2: Avoid leaking subscription-based or paywall content in search results cache pages

Publishers can control whether search engines show a cached copy of a document, which prevents subscription-based or paywall content from being exposed through search engine cache pages. There are two ways to do this: a special robots meta tag in the <head> section of the web page or, alternatively, a custom HTTP response header returned to the search engine crawler for each URL.

Method 1: Using the Robots Meta Tag

Use the following robots meta tag in the <head> section of each page of subscription-based content that should not be cached:

<meta name="robots" content="noarchive">

or the equivalent

<meta name="robots" content="nocache">
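If the pages are produced by a template or a build step, the tag can be added programmatically. The following Python sketch inserts the noarchive tag right after the opening <head> tag and leaves pages that already declare a robots meta tag untouched. The helper name `add_noarchive_meta` is hypothetical, and the regular-expression approach is an assumption that suits simple, well-formed pages; a template-level change or a real HTML parser may be a better fit for complex markup.

```python
import re

NOARCHIVE_META = '<meta name="robots" content="noarchive">'

# Matches an opening <head> tag (with or without attributes), but not <header>.
HEAD_OPEN = re.compile(r"(<head(?:\s[^>]*)?>)", re.IGNORECASE)
# Detects an existing robots meta tag so it is not added twice.
ROBOTS_META = re.compile(r"<meta\s+name=[\"']robots[\"']", re.IGNORECASE)


def add_noarchive_meta(html):
    """Return html with the noarchive robots meta tag placed inside <head>."""
    if ROBOTS_META.search(html):
        return html  # the page already carries a robots meta tag
    # Insert the tag immediately after the first opening <head> tag, if any.
    return HEAD_OPEN.sub(lambda m: m.group(1) + NOARCHIVE_META, html, count=1)
```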

Method 2: Using X-Robots-Tag

The custom HTTP response header that achieves the same result as the robots meta tag looks as follows:

X-Robots-Tag: noarchive

or its equivalent

X-Robots-Tag: nocache

Setting the HTTP response header is the only way to prevent caching of content that is not a standard web page, such as PDF and Microsoft Office documents, because these formats have no <head> section to hold a meta tag.
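How the header is set depends on the web server or framework in use. As one possible approach, the Python sketch below wraps a WSGI application in a small middleware that appends X-Robots-Tag: noarchive to responses for document downloads. The class name `NoArchiveMiddleware` and the suffix list are assumptions chosen for this example, not part of any Bing tooling.

```python
# Hypothetical choice of file types to protect; adjust to the site's content.
PROTECTED_SUFFIXES = (".pdf", ".docx", ".xlsx", ".pptx")


class NoArchiveMiddleware:
    """WSGI middleware that adds 'X-Robots-Tag: noarchive' to matching paths."""

    def __init__(self, app, suffixes=PROTECTED_SUFFIXES):
        self.app = app
        self.suffixes = tuple(suffixes)

    def __call__(self, environ, start_response):
        # Compare case-insensitively so that /report.PDF is covered as well.
        path = environ.get("PATH_INFO", "").lower()

        def patched_start_response(status, headers, exc_info=None):
            if path.endswith(self.suffixes):
                headers = list(headers) + [("X-Robots-Tag", "noarchive")]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping an existing application is then a single line, for example `application = NoArchiveMiddleware(application)`. The same header can usually be configured directly in the web server instead, which avoids touching application code.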

Following these SEO best practices puts publishers in control, enabling them to easily choose when to give search engines access to subscription-based and paywall content.

Fabrice Canel
Principal Program Manager
Microsoft Bing