Better than canonical: URL Normalization

At Bing, we have to crawl and index The Internet, and The Internet is pretty big. Fabrice Canel, a veteran of search here, commonly asks people to figure out how big the Internet is. How many URLs are out there?

The answers that come back range from “a few thousand” (OK, you clearly don’t use The Internet), to “a few million” (OK, good start, plenty of people on Facebook, but please continue thinking), to “a few billion” (more than 1 billion people use the Internet and have some kind of profile page set up). This is all good thinking, but keep going. A few trillion, maybe? (Well, now the numbers are becoming so huge that they have different meanings per country.) After a while, some talented people get it right: the real number of URLs on the Internet is “infinity”.

The Internet is so huge that we can crawl forever and keep discovering new links. We find not only regular, good content but also plenty of unexpected content that is not necessarily relevant for search engines, such as a “Next Day” link in a calendar hosted on a web page, keyword tag cloud links, etc. The list is almost endless.

Webmasters can help the search engines and themselves by removing useless and duplicate content from the crawler’s path: make sure we don’t see it. Guiding search engines to your most relevant pages and helping them discard less relevant pages is an SEO recipe for success. We have a couple of blog posts related to this idea that “less is more”, such as our post on optimization guidelines for large sites and our post Building Websites Optimized for All Platforms.

Today, we want to remind folks that, among the solutions available for fixing duplicate content on your site, the canonical tag is not necessarily the perfect fix for every problem. Let me share an example of a site outputting hundreds of millions of URLs, including plenty of dupes, and explain what happened when the search engines started implementing canonical tags. Names will not be named, but this is a real example.

Search engines will follow links to canonical destination URLs discovered in canonical source URLs, but they will continue to visit the canonical source URLs, because the canonical destination may change and because they check the similarity between the content of the source and the content of the destination. By visiting canonical source pages, search engines waste the limited crawl bandwidth they generally assign to a website and, worse, may keep discovering plenty of URLs with extra, useless URL parameters inside the canonical source pages. This extra crawling and processing impacts the overall quality of the site in our index and can create unneeded crawl load on websites. Testing for differences between sources and destinations may also affect the transfer of signals from source to destination.
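To make the cost concrete, here is a minimal sketch of how a crawler might read a canonical tag. It is an illustration only, not Bing's crawler code: the class name `CanonicalFinder`, the helper `find_canonical`, and the example.com page are all assumptions made up for this example. The point to notice is that the tag lives inside the canonical source page, so the page has to be fetched and parsed before the canonical destination is even known.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is kept; later ones are ignored.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical destination declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical canonical source page, reached through a URL with extra parameters.
source_html = """
<html><head>
<link rel="canonical" href="http://www.example.com/product?id=42">
</head><body>...</body></html>
"""

print(find_canonical(source_html))
```

Every duplicate URL that carries this tag still costs one fetch and one parse, which is the crawl bandwidth the paragraph above describes as wasted.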

So we want to tell you that at Bing, we have a better solution for your duplicate problems than relying only on the canonical tag.

When you have duplicate problems due to extra URL parameters, using the URL Normalization feature in the Bing Webmaster Tools is the preferred method, as you are telling us which parameters can go away; we call this URL normalization. By normalizing, we mean that our crawler will not visit the URLs with extra parameters except for an occasional test of the quality of the normalization rules. You benefit through less pointless crawling, which reduces resource loads on your server, a fresher copy of the canonical destination URLs in our index, and fewer out-links being discovered with extra parameters. You can still use the canonical tag on these pages as complementary information to the URL Normalization rules provided. Another advantage of using URL Normalization in the Bing Webmaster Tools is that you don’t need to wait for your engineering team to implement the canonical tag to fix your problem. You can implement these normalization rules right away in the tool without any code change. You can also use an HTTP 301 redirect instead of the canonical tag when possible.
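As a rough illustration of what a normalization rule does, here is a minimal sketch that drops ignorable parameters from a URL before it is considered for crawling. This is not how Bing Webmaster Tools is implemented; the function name `normalize_url`, the parameter names in `IGNORED_PARAMS`, and the example.com URLs are assumptions chosen for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change the page content on this example site.
IGNORED_PARAMS = {"sessionid", "ref", "utm_source"}


def normalize_url(url, ignored=IGNORED_PARAMS):
    """Return url with every ignored query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Three hypothetical duplicates of the same product page.
urls = [
    "http://www.example.com/product?id=42&sessionid=abc",
    "http://www.example.com/product?id=42&ref=home",
    "http://www.example.com/product?id=42",
]

# All three collapse to a single URL once the ignored parameters are gone.
print({normalize_url(u) for u in urls})
```

Because the rule is applied to the URL itself, the duplicates can be recognized without fetching them, which is the difference from the canonical tag described earlier.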

The solution to some major duplicate content problems you may have can be as easy as a 5-minute task: connect to Bing Webmaster Tools and suggest URL Normalization rules. To help you, we provide in the tool the most common duplicate URL parameters detected on your site. You can also review what we index via the Index Explorer feature to see any other parameters you’d like us to skip when crawling.

Thanks for your help in removing duplicate content from your sites and in helping to keep a lid on infinity.
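If you want a rough list of candidate parameters from your own data before opening the tool, a sketch like the following can count how often each parameter name appears in a list of URLs, for example URLs taken from your server logs. It is only an assumption about how you might audit your site, not the detection logic used in Bing Webmaster Tools; `count_parameters` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_parameters(urls):
    """Count in how many URLs each query parameter name appears."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # A set, so a name repeated inside one URL is counted once for that URL.
        names = {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical sample of crawled URLs.
crawled = [
    "http://www.example.com/product?id=42&sessionid=abc",
    "http://www.example.com/product?id=43&sessionid=def",
    "http://www.example.com/list?page=2&sessionid=ghi&sort=price",
    "http://www.example.com/about",
]

for name, seen in count_parameters(crawled).most_common():
    print(name, seen)
```

A parameter that shows up on most URLs is only a candidate. Before suggesting it as a normalization rule, check that removing it really leaves the page content unchanged.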

Join the conversation

16 comments
  1. DisgruntledGoat

    I like the URL normalisation feature, but can you *please* make it clearer what Enabled and Disabled mean? Every time I go there I have no idea if I am enabling the parameter to be crawled (i.e. saying that it changes the page's content) or enabling it to be ignored!

  2. jonathoncolman

    Yes! What Scott said above. I've been wanting to make more use of this feature since my site uses a ton of URL parameters. But there are two things standing in my way:

    1. I have no idea what "enabled" and "disabled" mean in this context and the help does not address this.

    2. There is a low limit on the number of parameters allowed. Large/enterprise web sites (particularly e-commerce sites) can have thousands of parameters.

    Many thanks for constantly improving and iterating on the solutions in Bing Webmaster Tools! :)

  3. chat

    Disabled mean? Every time I go there I have no idea if I am enabling the parameter to be crawled (ie saying that it changes the pages content) or enabling it to be ignored

  4. andreapernici

    URL normalization is good for websites that use ugly URLs. What should those who use rewritten URLs do?

    What to normalize?

    I think the canonical is always the better choice… even if the crawler accesses the page.

    Probably search engines must be more restrictive on the definition of similar tags.

  5. pappu

    Why did my site disappear from the Bing and Yahoo search engines?

  6. arac takip

    i like bing search ..

  7. bingping

    @andreapernici

    Quoting Duane Forrester's blog post: [quote]…Another advantage of using URL Normalization in the Bing Webmaster Tools is that you don’t need to wait for your engineering team to implement the canonical tag to fix your problem. You can implement these normalization rules right away in the tool without any code change…[/quote]

    I agree, that "enabling/disabling parameters" thing is a bit confusing for some of us. However, look at the screenshot picture above and you'll find the right answer ;-)

  8. ianmacfarlane

    @Bing – Google Webmaster Tools features much more powerful options for parameters, e.g. specifying that parameters are used for pagination etc.

    Also, it would be helpful if Bing Webmaster Tools made it clearer to users whether these parameters are removed before or after *spidering* (I mean spidering rather than indexing) – I've seen this trip people up before with how Google does it.

  9. Mobila Bucuresti

    I am very confused by the URL Normalization feature. What is its purpose? I have tried many times, but it shows the same error again and again for different parameters. Please explain this to me.

  10. friesecustoms

    I would just like to know what it is that Bing actually looks for. Even with all of the updates I have a good handle on Google, but Bing doesn't crawl and index in any form or fashion that I can comprehend. Some changes I make that gain rank in Google cause a loss in Bing, and some a gain in Bing. It doesn't seem to have any consistency.

    Tom Friese

    CEO Friese Customs

    http://www.friesecustoms.com

  11. wbmaterht

    Yes, we are in a similar situation to the one Tom described above. Any ideas why the number of indexed pages keeps dropping? Our site http://www.nbhuntop.com has dropped 35% in the past month, and we are wondering why…

  12. TonySprings

    I am not sure if this is the right place, but the problem I have is this:

    BING calls my website: mydomain.com in the search results.  I need it to say: http://www.mydomain.com/  

    The reason:

    When people visit my site directly from the search page, the cookies don't work, because the cookies are designed to work with the www in front of the domain name.

    How do I get the search engine to list my URL with the www?

    Thank you so much for helping me with this problem.

    Tony

  13. samsung2015

    I think this is totally different now that Bing has updated webmaster tools.  I can't find the link to do this anymore.

    Thanks

    Todd

  14. expertwebworld

    I have not seen the "URL normalization" section in my Bing Webmaster account. I checked under Configure My Site, Reports & Data, Diagnostics & Tools and Messages but did not find this option.

  15. Belsh

    Same here: I can't find it anywhere since it changed, and even the pic above doesn't look like my Bing Webmaster Tools section.

  16. AbhijeetQ

    I hope you can help me.

    I am trying to remove blogspot redirects from Bing results and cache. The blog has been deleted, but blogspot country redirects like myblog.blogspot.hk and myblog.blogspot.co.uk keep popping up in Bing results and the Bing cache. Site:myblog.blogspot.com shows no result, but site:myblogspot.co.uk, site:myblog.blogspot.fr and other country redirects show links and cache on the Bing results page. I would really appreciate it if Bing removed these links from Bing results and the Bing cache. Bing was emailed 3 days ago regarding this issue. There has been no feedback, even though they promised that one of their tech people would reply within 24 hours. The Support Ticket Number is: 1183302220.

    Thank you.
