Partnering to help solve duplicate content issues

One of the most common challenges search engines run into when indexing a website is identifying and consolidating duplicate pages. Duplicates can occur when any given webpage has multiple URLs that point to it. For example:

URL | Description
--- | ---
http://mysite.com | A webmaster may consider this their authoritative, or canonical, URL for their homepage.
http://www.mysite.com | However, you can add ‘www’ to most websites and still get the same homepage.
http://mysite.com/default.aspx | You can also often add the specific filename of the homepage and get the same page.
http://mysite.com/default.aspx?promo=ABC | Websites often use parameters to track things like where customers are coming from (in this case, an offline promotion) or to control how the content on the page is formatted.

These four cases are just a few of the many possibilities. When you consider all the combinations of these, you could have more than 10 clone URLs for every page on your site. That means if there are 1 million pages on your site, we could find 10 million or more cloned URLs pointing to them. Picking out your canonical URL from all this duplicate clutter has been an onerous challenge for search engines as we all work to reduce cost and improve relevance.
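To make the clone problem concrete, here is a minimal Python sketch that collapses the four URLs from the table onto one normalized form. It is not how Live Search works. The rules are hypothetical assumptions made for this example: the ‘www’ and bare hostnames serve the same site, `default.aspx` is the default document, and `promo` is a tracking-only parameter. The function name `normalize_url` is also made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site-specific assumptions for this sketch only:
DEFAULT_DOCUMENTS = {"default.aspx"}  # filenames that serve the directory's index page
TRACKING_PARAMS = {"promo"}           # query parameters that don't change the content


def normalize_url(url: str) -> str:
    """Collapse common clone-URL variants onto one normalized form.

    Ports and credentials are ignored in this simplified sketch.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""          # hostname is already lowercased
    if host.startswith("www."):
        host = host[len("www."):]        # treat 'www' and bare host as the same site
    path = parts.path or "/"
    last = path.rsplit("/", 1)[-1]
    if last.lower() in DEFAULT_DOCUMENTS:
        path = path[: -len(last)]        # drop the explicit default filename
    # Keep only query parameters that are not on the tracking list.
    query = "&".join(
        pair for pair in parts.query.split("&")
        if pair and pair.split("=", 1)[0] not in TRACKING_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, query, ""))


urls = [
    "http://mysite.com",
    "http://www.mysite.com",
    "http://mysite.com/default.aspx",
    "http://mysite.com/default.aspx?promo=ABC",
]
print({normalize_url(u) for u in urls})  # all four collapse to one URL
```

Rules like these only work when you know which parameters and filenames are harmless on a particular site. A crawler can't safely guess that for every site, which is why a webmaster-supplied hint is useful.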

To help solve this issue, Live Search has partnered with Google and Yahoo to support a new value for the link tag’s rel attribute that will help webmasters identify the single authoritative (or canonical) URL for a given page. The link tag defines a relationship between a document and an external resource; in this case, that resource is the canonical URL. The following is an example of the new link tag for canonicalization:

<link rel="canonical" href="http://mysite.com"/>
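As an illustration of how a crawler might pick this tag out of a page, here is a small sketch built on Python’s standard `html.parser` module. It is not Live Search’s parser. The class and function names are hypothetical. Two behaviors are assumptions of this sketch: only the first canonical link is kept, and `rel` is treated as a whitespace-separated, case-insensitive list of tokens.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags also arrive here via handle_startendtag.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical href declared in the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = '<html><head><link rel="canonical" href="http://mysite.com"/></head></html>'
print(find_canonical(page))
```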

A few notes about the implementation of the new attribute:

  • This tag will be interpreted by Live Search as a hint, not as a command. We’ll evaluate it in the context of all the other information we know about the website and try to make the best determination of the canonical URL. This will help us handle any potential implementation errors or abuse of this tag.
  • You can use relative or absolute URLs in the “href” attribute of the link tag.
  • The page and the URL in the “href” attribute must be on the same domain. For example, if the page is found on “http://mysite.com/default.aspx” and the “href” attribute in the link tag points to “http://mysite2.com”, the tag will be treated as invalid and ignored.
    • However, the “href” attribute can point to a different subdomain. For example, if the page is found on “http://mysite.com/default.aspx” and the “href” attribute in the link tag points to “http://www.mysite.com”, the tag will be considered valid.
  • Live Search expects to implement support for this feature sometime in the near future.
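The URL rules in the notes above can be sketched roughly as follows. First, resolve a relative “href” against the page’s own URL. Then reject the hint if it leaves the page’s domain, while still allowing a different subdomain. This is a hypothetical illustration, not the actual Live Search check. In particular, `registrable_domain` simply takes the last two hostname labels. That is a loud simplifying assumption and is wrong for suffixes such as “co.uk”; a real implementation would need something like the Public Suffix List. Even when the function returns a URL, that result is still only a hint, as the first note says.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def registrable_domain(host: str) -> str:
    # Simplifying assumption: the last two labels identify the domain.
    # This is wrong for multi-label suffixes like "co.uk".
    return ".".join(host.lower().split(".")[-2:])


def resolve_canonical(page_url: str, href: str) -> Optional[str]:
    """Return the absolute canonical URL, or None if the hint should be ignored."""
    target = urljoin(page_url, href)  # relative hrefs resolve against the page URL
    page_host = urlsplit(page_url).hostname or ""
    target_host = urlsplit(target).hostname or ""
    if registrable_domain(page_host) != registrable_domain(target_host):
        return None  # cross-domain hint: treated as invalid and ignored
    return target    # same domain (a different subdomain is allowed)


print(resolve_canonical("http://mysite.com/default.aspx", "http://mysite2.com"))
print(resolve_canonical("http://mysite.com/default.aspx", "http://www.mysite.com"))
print(resolve_canonical("http://mysite.com/products/default.aspx?promo=ABC", "/products/"))
```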

While we expect this tag will help us solve many of the more complex duplicate content issues, we still strongly recommend that webmasters follow the existing best practices for normalizing their URLs through domain canonicalization and normalization of URL parameters. We’ll provide more details on the link tag after we’ve implemented full support in one of our upcoming releases. In the meantime, we look forward to hearing your feedback on the new tag.

— Nathan Buggia, Live Search Webmaster Team