Indexing issues? Let’s examine the most common problems.

Wondering why your site isn’t as “indexed” as you’d like?  Check your crawler control signals before getting too upset.  We see a lot of websites making mistakes that can block indexing.  While we’d like to ignore those mistakes and simply ingest the content anyway, that’s not always in your best interest, so it’s far safer for us to follow your instructions.

Be sure to read How To Verify Bingbot is Bingbot and To Crawl or Not to Crawl, That is Bingbot’s Question when you’re done with this article, as they fill in some of the details as well.

On a recent trip, I met with 22 startups and half of them complained of indexing issues.  That same half had a real problem in their robots.txt files.  The rest were mostly safe, not having the file in the first place.

The problem the first half had?  Well, I’ll let you see if you can spot the problem.  This was their robots.txt file:

User-agent: *
Disallow: /

(The “User-agent: *” line means the directives apply to all robots, and “Disallow: /” tells those robots not to visit any page on the site.)
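If you want to test a robots.txt before (or after) deploying it, Python’s standard-library robots.txt parser can simulate what a compliant crawler sees. This is a generic sketch of the protocol, not Bingbot’s actual parser:

```python
from urllib.robotparser import RobotFileParser

# Parse the problematic robots.txt above and ask whether a crawler
# identifying as "bingbot" may fetch a page.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /",
])

# The wildcard rule matches every user agent, so every path is blocked.
print(parser.can_fetch("bingbot", "/any-page.html"))  # prints False
```

Swap in your own rules (or point `set_url` at your live robots.txt and call `read()`) to confirm the pages you care about are actually allowed.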

Obviously, this poses a problem: their pages and content simply won’t be indexed by Bing, because Bingbot honors robots.txt directives.  Other search engines may not honor the directive (as was evident from the content indexed from those sites).  While not getting indexed is one form of problem, having content indexed that you didn’t want indexed can be even thornier.

While a robots.txt block is the obvious case, other configurations can be equally damaging to our ability to index your content.

Within Bing Webmaster Tools is the Crawl Control feature, which enables you to control the pace of crawling and what time of day Bingbot crawls your site.  It’s a handy tool, but you have to understand how to use it.  Bingbot checks this feature for instructions on how to interact with your website, so setting the crawl rate to its lowest setting tells Bingbot to crawl very lightly and slowly.  For large websites, that makes it hard for us to crawl all your content.

Changing the settings to “full volume” will increase the crawl rate, but be ready for the load this places on your servers, or indexing might not be the only issue you face.  For the vast majority of sites online today, this won’t be a problem: maxing out the crawl rate won’t shut you down, but be sure you know what the extra server load is doing to the visitors trying to access your site at the same time.  The controls let you set the crawl rate at various times across the day, so you can manage this easily: maximize the crawl rate when your visitors are not on the site, and slow crawling down when they are.
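The idea behind that schedule is simple enough to sketch in code. Assuming a hypothetical peak-traffic window of 9am to 6pm site time (your analytics will tell you the real one), a crawl-delay schedule might look like:

```python
# Illustrative sketch of the Crawl Control idea: choose a per-request delay
# based on the hour of day, crawling hardest when visitors are away.
# The hours and delays here are invented for the example.
PEAK_HOURS = range(9, 18)  # assumed busy window, 9am-6pm site time

def crawl_delay_seconds(hour: int) -> float:
    """Return seconds to wait between requests at the given hour (0-23)."""
    return 10.0 if hour in PEAK_HOURS else 1.0

print(crawl_delay_seconds(12))  # prints 10.0 (slow during peak)
print(crawl_delay_seconds(3))   # prints 1.0 (fast overnight)
```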

You also need to ensure you’re not selectively blocking Bingbot.  If you are, keep your indexing expectations low: no crawling means no indexing.
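A related trap is blocking traffic that merely claims to be Bingbot. The verification article linked above describes a double DNS lookup: reverse-resolve the IP, confirm the hostname belongs to Bing’s crawler domain (`search.msn.com`), then forward-resolve the name back to the same IP. A rough sketch of that check (the helper names are my own, and the network calls will obviously depend on your DNS environment):

```python
import socket

def is_bingbot_host(hostname: str) -> bool:
    """True if a reverse-DNS hostname belongs to Bing's crawler domain."""
    return hostname.rstrip(".").endswith(".search.msn.com")

def verify_bingbot(ip: str) -> bool:
    """Double DNS lookup: reverse-resolve the claimed Bingbot IP, check the
    domain, then confirm the name forward-resolves back to the same IP."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
        if not is_bingbot_host(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
```

This lets you safely allow genuine Bingbot traffic through firewall or rate-limit rules while still rejecting impostors that spoof the user-agent string.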

Queries-per-second limits

Another common issue we encounter when trying to crawl sites is a limit imposed on the number of queries per second (QPS) we can make of the server.  These limits are often put in place to protect the systems, which is completely reasonable.  Unfortunately, many system administrators set very low limits, unaware of the problems this can cause for your SEO work.  This is especially problematic for large websites and for sites that produce fresh content in large volumes.  As with the point above, if we can’t fetch the content from the server, we can’t index it.

Hints and help

Inside Bing Webmaster Tools, you’ll find the Fetch as Bingbot tool.  Drop a URL in there and see what comes back.  Because the feature adheres to the robots.txt protocol, if you’re blocking us in your robots.txt file, the tool will show a message flagging the crawl block.

For that matter, dig deeper and make sure there is nothing else at the server level that’s blocking crawlers.  Talk to your IT folks.  If you’re on a hosted solution, check with your host to make sure they aren’t blocking crawlers.  We see this happen frequently as hosting companies seek to protect their servers and limit bandwidth consumption.  In most cases, the website owners are completely unaware this blocking is even happening.

Check your crawl error reports.  Many times we’ll ping websites and get 500 errors returned.  When this happens, we back off and try again later; there’s no sense continually hounding a server that’s obviously having problems.

Watch your 404 pages as well.  A 404 is a valid response, but a large number of them can indicate a bigger issue.  If someone moved content to a new folder and didn’t implement 301 redirects, it will take us time to recrawl the site and index the new pages, and you’ve lost the value those 301s could have transferred to the new pages.  Possibly more problematic, 404 responses start the “release from index” process for pages that no longer return the expected content.  If you intend to bring the content back to its original location, you could be setting yourself up to lose traffic while the URLs that returned 404s are removed, then must be recrawled and indexed again.

So be sure to investigate all the options if you think you’re suffering indexing issues.  And watch those robots.txt files.  They’re powerful and we listen to them.

Join the conversation

13 comments
  1. Valbou

    My website is already indexed by Google at 90%, but Bing is not indexing my photography pages, which have some text (the subject is photography, not the text)… I don't understand how I can get my photographs to show in Bing…


  2. Pennimus

    Re: QPS limits.

    Your comments here imply that either there is a finite amount of time you will spend crawling a given site (thus QPS limits would hinder full crawling of the site due to running out of time) or that you have a QPS limit threshold beyond which you won't bother crawling anything.  Thoughts?  

    And what would you suggest webmasters set their QPS limits to?

  3. maria_a_i

    My website is indexed by google.com but bing has been pending for 3 months. No robots blocking indexing…

    Thank you

  4. nepyl

    my website has a large number of pages but neither bing nor google is indexing all of them. I just learned that there might be a query per second limit imposed on my blog. Does google has such thing? because it is hosted in google servers..

  5. Affiliate-Resources.INFO

    I too have this same problem. It seems like bing is expecting to get paid if you want your site indexed by bing. Funny how Google seems to have indexed all my pages, but bing? none!

    Something to think about?


  6. classicart

    Google has indexed 1,016 pages. That's ALL of them. Bing has indexed 579. I have a site map. Bing has had this problem for a long time. They don't seem to be interested in fixing it. I've complained about it before on here. I gave up. About a year and a half later and still not fixed.

  7. rlk

    Tell us something that isn't obvious.

  8. shayspain

    22 startups complaining about indexing, 11 with no robots.txt and 11 with a disallow /

    jeez… spend less time "inventing" stories and more time doing what MS has always done best, COPY (Google) and then  EXTEND it (ie: improve it) !!!!

  9. bonplanvoyage

    When I do url:bonplanvoyage.com or  url:www.bonplanvoyage.com, answer is "No results found for url:bonplanvoyage.com" but when I check my webmaster tools, I have 50 Pages Indexed. How it is possible?

    Thanks

  10. alex2013

    my site car-and-safety.com has no pages indexed :(  had no such problem with other sites…noticed that we didn't have www to non-www redirect – fixed now, hope this will get the indexing started. is there any other way to check for potential problems (via bing webmaster account)?

  11. alphaservices0

    I have just noticed that all my pages are now no longer indexed in bing? Seriously? I've worked hard to get index and to have it all just thrown away? I know you guys need to make money and stuff but seriously? Feels like search engine hijacking to me.

  12. HaveaNiceDave

    Does anyone know if Bing will not index correctly if they are using Cloudflare? Our site has quite a few sitemaps in the Bing webmaster tools, however it seems that they are all showing as 403s?

    Can someone please tell me the correct bingbot ip addresses? Since we want to make sure they weren't blocked.

  13. iztok_mra

    In 6 months Bing has indexed 2.500 pages of my 300.000+ site. Google managed to index it all in the same amount of time. There's obviously a problem, but I have no idea what to do about it without knowing more?
