Get detailed site analysis to solve problems (SEM 101)

Last week I discussed common site errors that can inhibit the search engine crawler (aka bot). I pointed out common areas of concern and recommended a few tools to help diagnose such issues on your site’s pages. This week, I want to introduce a new tool for your collection, one that will serve you well in checking and maintaining your site.

The tool, the Search Engine Optimization (SEO) Toolkit from the Microsoft Internet Information Services (IIS) team, was released as beta 1 back in June 2009. I haven’t had a chance to talk about it much in this blog until now. Let’s fix that.

Requirements

The IIS SEO Toolkit runs as an extension of IIS 7. This means you have to have IIS 7 on your local machine (sorry Windows XP users!). Users of Windows 7 are good to go now, and users of Windows Vista can easily install IIS 7 by way of the Web Platform Installer tool, which I’d recommend doing if only to try out this very cool new tool.

Feature set

Once installed, the IIS SEO Toolkit works within your local installation of IIS 7. However, the tool is not limited to analyzing IIS-based websites. No way. It can scan sites running PHP, ASP.NET, and all manner of HTML/XHTML pages.

The IIS SEO Toolkit consists of three separate add-on modules to IIS 7:

Site Analysis. This tool begins with a local, full-featured website crawler engine (aka bot) named IISBot that offers the following features:

  • Configurable number of concurrent requests (from 1 to 16), so you can crawl your website without placing excessive processing load on it.
  • Support for standard Robots.txt commands, allowing you to specifically dictate which web files and directories IISBot should not crawl.
  • Support for Sitemap files, enabling you to identify important areas of your website to be analyzed.
  • Support for overriding “noindex” and “nofollow” <meta> tag attributes to allow you to analyze pages that are otherwise off-limits to typical search engine bots.
  • Configurable limits for analysis, such as maximum number of URLs to download and maximum number of kilobytes to download per URL.
  • Configurable options for including content from only your directories or the entire site and subdomains.
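To make the robots.txt support concrete, here is a minimal sketch of how a crawler can filter a list of URLs against robots.txt rules. It uses Python's standard urllib.robotparser module and is not IISBot's actual implementation; the rules, the URLs, and the "ExampleBot" user agent name are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download this file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that the robots.txt rules permit for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical URLs on a local test site.
urls = [
    "http://localhost/index.html",
    "http://localhost/private/report.html",
    "http://localhost/tmp/cache.html",
    "http://localhost/products/list.html",
]

crawlable = allowed_urls(ROBOTS_TXT, "ExampleBot", urls)
for url in crawlable:
    print(url)
```

Only the index and product pages survive the filter, because the other two fall under the Disallow rules.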

IISBot examines your website contents, discovering links, downloading the content, and applying a set of validation rules designed to help you easily identify and troubleshoot common problems. These pre-configured rules cover a variety of site issues, such as broken links, duplicate content, keyword analysis, route analysis, and many more. Other features of the Site Analysis tool include:

  • View detailed summary of Web site analysis results through a rich dashboard.
  • Feature-rich Query Builder interface exposes large amounts of data.
  • Quick access to common tasks.
  • Display detailed information for each URL.
  • View detailed route analysis showing unique routes to better understand the way search engines reach your content.
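As a rough illustration of what route analysis involves, the sketch below finds the shortest click path from the home page to every other page with a breadth-first search. It is a simplified stand-in, not the toolkit's own algorithm, and the link graph is a hypothetical five-page site.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget", "/about"],
    "/about": ["/contact"],
    "/products/widget": ["/contact"],
    "/contact": [],
}


def shortest_routes(links, start="/"):
    """Return the shortest click path from start to every reachable page."""
    routes = {start: [start]}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in routes:  # first visit is the shortest route
                routes[target] = routes[page] + [target]
                queue.append(target)
    return routes


routes = shortest_routes(LINKS)
for page, route in routes.items():
    print(page, "via", " > ".join(route))
```

Pages that sit many clicks away from the home page are good candidates for better internal linking.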

Using the Site Analysis tool to identify and correct violations will improve the overall quality of your website and, in turn, your SEO performance. It also provides a great deal of information about your site’s structure and architecture, which helps you maintain the site and eliminate redundancies.

Robots Exclusion Editor. This tool includes a powerful editor for creating a new Robots Exclusion Protocol (robots.txt) file or editing an existing one. It can use the output of a Site Analysis crawl report to let you add Allow and Disallow entries for files and folders without hand-editing a plain text file, which makes the process less error-prone and more reliable. Furthermore, you can run the Site Analysis feature again and immediately see the results of applying your robots.txt file.

Sitemap and Sitemap Index Editor. Similar to the Robots Exclusion Editor, this tool lets you author Sitemap and Sitemap Index files, and it can draw on both the physical view of your site and the logical view from a Site Analysis crawl report.
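If you have never looked inside a Sitemap file, this short sketch shows the kind of XML such an editor produces. It builds the file with Python's standard xml.etree.ElementTree module and the sitemaps.org namespace; the page URLs and dates are hypothetical, and the toolkit's own output may include more fields.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol at sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build Sitemap XML from (location, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages and dates, for illustration only.
pages = [
    ("http://localhost/", "2009-10-01"),
    ("http://localhost/products/", "2009-09-15"),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```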

Using the tool

Once you have finished the tool installation process, start IIS 7. You’ll find the icons for the IIS SEO Toolkit in their own group called Search Engine Optimization.

We’re going to focus on the Site Analysis tool for this article. When you open that tool, you’ll land on the Site Analysis page, where you can open an existing report or click New Analysis to create a new site report (including updates to existing reports).

When creating a new report, you’ll provide IIS SEO Toolkit with a report name and the site’s root URL to scan (I used a sample site saved on my localhost for these images).

IIS SEO Toolkit crawls the site’s home page and follows the links on the page to all of the other pages on the site and examines the content they contain. Once the site is fully crawled, the tool analyzes the data brought down and creates a summary report on the state of the site.
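The crawl described above can be sketched in a few lines. This is a simplified model, not the toolkit's code: the site is a hypothetical in-memory dictionary standing in for real HTTP responses, links are extracted with Python's standard html.parser module, and the max_urls parameter mimics a crawl limit.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory site standing in for real HTTP responses.
SITE = {
    "/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "/about": '<a href="/">Home</a> <a href="/team">Team</a>',
    "/products": '<a href="/">Home</a>',
}


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start="/", max_urls=100):
    """Follow links breadth-first; return (pages found, broken links)."""
    pages, broken = [], []
    queue = deque([start])
    queued = {start}
    while queue and len(pages) < max_urls:
        url = queue.popleft()
        html = site.get(url)
        if html is None:  # no such page: record it as a broken link
            broken.append(url)
            continue
        pages.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            if link not in queued:
                queued.add(link)
                queue.append(link)
    return pages, broken


pages, broken_links = crawl(SITE)
print("Pages:", pages)
print("Broken links:", broken_links)
```

Here the link to /team points at a page that does not exist, so it ends up in the broken links list rather than the page list.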

The site information is categorized in the following groups: Violations, Content, Performance, and Links.

The first report is a list of detected site violations. Each violation is assigned a severity level (Error, Warning, or Information) and is also grouped by the category it falls under: SEO, Standards, Content, or Performance.

Double-clicking on a listing in the Violations Summary tab opens up a new tab containing the Group Details of that violation, where each page containing that violation is identified.
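To picture how the summary and group details relate, here is a small sketch that tallies violation records by level and category, then lists the affected pages for each rule. The records, rule names, and field names are invented for illustration and do not reflect the toolkit's internal data model.

```python
from collections import Counter, defaultdict

# Hypothetical violation records; rule names and URLs are made up.
VIOLATIONS = [
    {"url": "/", "rule": "Description is missing",
     "level": "Warning", "category": "SEO"},
    {"url": "/about", "rule": "Description is missing",
     "level": "Warning", "category": "SEO"},
    {"url": "/about", "rule": "Broken hyperlink",
     "level": "Error", "category": "Content"},
    {"url": "/products", "rule": "Page is too large",
     "level": "Information", "category": "Performance"},
]


def summarize(violations):
    """Count violations by level and category, and list pages per rule."""
    by_level = Counter(v["level"] for v in violations)
    by_category = Counter(v["category"] for v in violations)
    pages_by_rule = defaultdict(list)
    for v in violations:
        pages_by_rule[v["rule"]].append(v["url"])
    return by_level, by_category, dict(pages_by_rule)


by_level, by_category, pages_by_rule = summarize(VIOLATIONS)
print(by_level)
print(by_category)
print(pages_by_rule["Description is missing"])
```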

Once you get into the Group Details of a specific violation, you’ll see how the pre-configured rule queries are built. You can then use the query tools to begin your own, detailed exploration of your site’s issues by building and saving your own custom queries.

Double-clicking on a listing there calls up information on that particular violation, including details of the violation and recommended actions to take.

You can change tabs in that dialog box to see the affected page’s header, code content, a listing of inbound and outbound links associated with that page, and analysis of the word usage on that page (which is helpful for correcting missing keyword text, such as image alt attributes, meta descriptions, and page titles).
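Word usage analysis boils down to counting the terms on a page. The sketch below is one simple way to do that, assuming a tiny hypothetical stop-word list and sample text; the toolkit's own analysis is likely more thorough.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real analyzers use much longer ones.
STOP_WORDS = {"a", "an", "and", "is", "our", "the", "to"}


def word_usage(text, top=2):
    """Return the most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


# Hypothetical page text.
page_text = (
    "Blue widgets and red widgets. "
    "Our widgets ship fast, and blue is popular."
)

top_words = word_usage(page_text)
print(top_words)
```

If the words you want to rank for do not show up near the top of a list like this, the page copy, title, or alt text may need attention.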

The Site Analysis Report also provides a list of the pages containing the most violations. Clicking through those listings gives you the same deep-dive data about each affected page as described earlier.

The IIS SEO Toolkit does more than merely offer violation reports. It takes the data it gathers about your site’s pages and enables you to get aggregated reports on other issues and metrics. For example, the Content tab offers a plethora of site analysis data. You can get a list of all external links, duplicated content, and other very useful information.

The reports also offer information on performance issues on a per page or per content-type basis.

Lastly, you get detailed information about the links on your site.

As you can see, the Site Analysis tool is feature-packed and will be very useful to webmasters and SEOs everywhere. There is not enough time or room in this blog to document everything this tool can do, but I can link to pages that will be of use to you in learning more about it.

Be sure to check out a nice, 14-minute video tutorial introducing the IIS SEO Toolkit, as well as a detailed blog post from IIS team lead Scott Guthrie as he uses it to examine his own personal website. IIS team developer Carlos Aguilar Mares also has a blog in which he regularly does deep-dive explorations into the tool’s extensive feature set. It’s worth following Carlos’ blog for his useful insights, tips, and tricks on getting the most from this great tool.

Keep an eye on the IIS SEO Toolkit’s home page for updates. Remember that it is still in beta, so there’s more to come as the development process continues.

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. See you again soon…

— Rick DeJarnette, Bing Webmaster Center

***********************************************

UPDATE

We’ve gotten some feedback from webmasters asking how to set up IIS 7 on their computers so they can use the IIS SEO Toolkit. To help those folks out, we’ve published a new blog post to cover that question in detail. Take a look at Getting the IIS SEO Toolkit up and running to learn how to enable IIS 7 on your computer. Thank you all for the helpful feedback!

Rick DeJarnette