It’s easy to get caught up in the latest and greatest, the forward thinking and the new and shiny. But it always pays to keep an eye on the basics. The stuff you feel you already know. The stuff that often comes back to haunt us when we least expect it.
Everyone wants to be indexed. “Let the crawlers come!” But there are times when you need to control access. Areas you don’t want crawlers to enter and index. If you’re doing server maintenance, you may want to calm things down a bit to ensure the remaining servers aren’t overloaded. There are directions you want to give a crawler – sitemap is located here, don’t enter this area, but do go everywhere else.
That’s why robots.txt was created – to put control into the hands of the website owner. It’s a powerful tool, though, as a single mistyped line can tell us to stop crawling the site entirely. And the robotstxt.org site is maintained to explain the details of how to implement this powerful file.
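As a sketch of what that control looks like, here is a minimal robots.txt covering the scenarios above – the paths and domain are hypothetical examples, not recommendations:

```
# Applies to all crawlers
User-agent: *

# Keep crawlers out of an area under maintenance (hypothetical path)
Disallow: /maintenance/

# Everything else is crawlable by default - no "Allow" lines needed

# Tell crawlers where the sitemap lives
Sitemap: https://www.example.com/sitemap.xml
```

Note that a stray character in the Disallow line – say, `Disallow: /` – would block the entire site, which is exactly the kind of small mistake worth checking for regularly.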
It pays to review both that site and your actual robots.txt file frequently. The file can go missing, or get edited without your knowledge by others with access to your servers. This isn’t malicious stuff, but simple things like an IT person not knowing what the file is and removing it. Or someone pushing updates out to a live-site server and overwriting what was there with fresh files…lacking your robots.txt file.
And unless you are absolutely, 100% sure how a given directive behaves, reference the robotstxt.org site.
A confusion point for many remains this: the robots.txt file contains directives about what NOT to do. Adding lines about what you want us to do simply takes up space. No harm, no foul, but we assume it’s “crawl everything” unless you say otherwise.
Now, another tool in your box on this one is found inside the Webmaster Tools themselves. The Crawl Control feature lets you fine-tune how Bingbot visits your site. The interface is simple to use and once you click SAVE, your new default setting is in place. Changes can be as frequent as you desire.
Sitemaps: a webmaster’s best/biggest frenemy. They list all of your site’s URLs. Though in many cases, they do not. And in some cases, they hold many more URLs than even you thought your site contained. Sitemaps are referenced in your robots.txt file so the crawlers know where to find them. But while they can be very helpful in getting your content found, crawled and indexed, there can be a downside here, too. Keep a dirty sitemap and we’ll begin to mistrust it, and eventually stop visiting it altogether. If you 404 a page, make certain to clean it from the sitemap as well. We’ve got better places to allocate resources than discovering dead URLs.
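To make the cleanup concrete, here is a minimal sketch of pruning dead URLs from a sitemap. It assumes you maintain your own list of removed (404’d) pages – the URLs and the `clean_sitemap` helper are hypothetical examples, not part of any tool:

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from sitemaps.org
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def clean_sitemap(sitemap_xml: str, dead_urls: set) -> str:
    """Return the sitemap XML with any known-dead URLs removed."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(sitemap_xml)
    for url_el in list(root):  # copy the list so we can remove while iterating
        loc = url_el.find(f"{{{NS}}}loc")
        if loc is not None and loc.text.strip() in dead_urls:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sitemap with one live page and one page that now 404s
sitemap = f"""<urlset xmlns="{NS}">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old-page</loc></url>
</urlset>"""

cleaned = clean_sitemap(sitemap, {"https://www.example.com/old-page"})
```

In practice the list of dead URLs would come from your CMS or server logs; the point is simply that removing a page and removing its sitemap entry should happen together.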
Ideally, your CMS will create sitemaps as you publish new content, updating the sitemap file in real time. Add new stuff, clean out the old and expired URLs. But this is tough to accomplish. Most off-the-shelf CMSs don’t do this.
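Conceptually, real-time sitemap generation is straightforward even if most CMSs skip it. A minimal sketch, assuming your publishing code can hand over the set of currently-live URLs (the URLs and the `build_sitemap` helper are hypothetical):

```python
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(urls) -> str:
    """Build a minimal sitemap.xml string from the currently-live URLs."""
    today = date.today().isoformat()
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc><lastmod>{today}</lastmod></url>"
        for u in sorted(urls)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n</urlset>"
    )

# On publish: add the new URL, drop expired ones, rewrite the file
live_urls = {"https://www.example.com/", "https://www.example.com/news/launch"}
xml_out = build_sitemap(live_urls)
```

Hooking a function like this into your publish step means the sitemap is rebuilt from what is actually live, so expired URLs fall out automatically.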
So in addition to a regular, frequent review of the sitemap files themselves, consider submitting an RSS feed as a sitemap through your webmaster tools account. At the very least it means we’ll see everything you publish when you publish it.
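An RSS feed submitted this way can be very small. A hypothetical example, with placeholder titles, URLs and dates:

```
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>https://www.example.com/</link>
    <description>Latest published pages</description>
    <item>
      <title>Newest article</title>
      <link>https://www.example.com/posts/newest-article</link>
      <pubDate>Mon, 06 Jan 2014 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```

Since most CMSs already produce a feed like this for subscribers, submitting it costs nothing extra and gives crawlers a fresh signal on every publish.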
And if you want the detailed word on sitemaps, click through to the sitemaps.org site built just to offer such guidance.
Content – Video, Images, Text
It’s no secret that searchers seek content. But what pleases them most? Video? Images? Text? Well, yes, in fact. Each has a role to play, and each is suited to different tasks. Now you might be tempted to say “Video rules!” and it’s tempting to agree. But consider a person shopping for a new car. At some point, they’ll want to review the standard features and options available on the model they seek. Scrolling through what could be hundreds of items in a video isn’t the best choice. A list in text is a better option here. Save the video for the walk-around of the car, which people will enjoy and engage with.
With the explosion of image-rich sites today, it’s easy to think that the big glossy image on the page is all you need. But think again. What does a search engine index at that point? How does your page with a big image of a watch compare with the page on another site with the same image, the technical specifications, a product description, user testimonials and so on? Yes, your image is bigger. Yes, it’s beautiful. But is that page more useful than one which can answer more questions with less work for the searcher? Ultimately that’s up to the individual to decide, but it never hurts to have actual content on a page to help put the image in context.
Time tends to fade our memories, so frequent checks of important tools like your robots.txt file and your sitemap.xml files can streamline your work and help stop problems before they occur.
Sr. Product Manager