Is your site ranking rank? Do a site review – Part 4 (SEM 101)

This is the fourth of five posts on the topic of conducting your own site reviews. In the previous posts, we discussed why you'd want to perform a site review (Part 1), then took an initial look at page-level issues (Part 2), followed by a discussion of site-wide issues (Part 3) that can affect site performance for users and search engine ranking. In this post, we continue our look at site-wide issues that should also be examined in a site review.

Using HTTP redirects

When you remove, rename, or relocate a previously published webpage on your web server, do you implement an HTTP redirect on your server to help users find the content they are looking for? If not, you should. Otherwise, any inbound links from external sites directed toward those old pages will no longer work. Users navigating to your site through those links will likely move on, and any search engine ranking based on that old page will be lost as well.

Many webmasters mistakenly use 302 redirects by default, which are strictly intended for temporary moves (such as pointing to an interim page for a sold-out inventory item), when they should be using permanent, 301 redirects. The distinction between 302s and 301s matters to search engines because that information determines how the pages are treated in the index. When a 302 is used, the ranking status of the original page is retained by that URL in the index, whereas a 301 transfers the ranking status of the old page to the new URL. As a result, when a page is permanently removed, renamed, or relocated but a 302 is used, the new version fails to inherit the rank status earned by the previous page. To optimize your redirects, always use 301s for permanent changes to your site.
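
Not sure which status code a given old URL actually returns? A quick check script can tell you. The following is a minimal sketch using only Python 3's standard library; the URLs in it are hypothetical placeholders for pages you've moved, renamed, or removed on your own site.

from http.client import HTTPConnection
from urllib.parse import urlsplit

# Hypothetical placeholders -- substitute old paths from your own site.
OLD_URLS = [
    "http://www.mysite.com/old-page.htm",
    "http://www.mysite.com/discontinued-item.htm",
]

for url in OLD_URLS:
    parts = urlsplit(url)
    conn = HTTPConnection(parts.netloc, timeout=10)
    # Request the old path directly; we want the raw status code,
    # not the page the redirect eventually lands on.
    conn.request("GET", parts.path or "/")
    resp = conn.getresponse()
    location = resp.getheader("Location", "(none)")
    # 301 = permanent (rank transfers to the new URL);
    # 302 = temporary (rank stays with the old URL).
    print(url, "->", resp.status, resp.reason, "Location:", location)
    conn.close()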

Site review task: Review the way HTTP redirects are used on your site. If any permanent redirects are being used, be sure they are configured as 301s.

Empowering canonicalization

While we briefly touched on this topic in Part 3 in our discussion on linking, let's finally dig into a review of how you are linking to your internal pages and whether you're being consistent in your methodology. The issue is URL canonicalization, aka determining the standard URL for a given page (especially the site's home page). This is important because each variation of a URL used to refer to your site is individually tracked and ranked by search engines. When you allow multiple URLs to be used as valid inbound links from external sites to the same page (and/or are inconsistent in how you form URLs in your intra-site linking), the URL variations become duplicate content that the search engine has to repeatedly crawl and then potentially index. To avoid diluting the ranking value for a page among many URL forms, consolidate all of the alternative forms into the primary URL for the page. This process is called canonicalization.

For example, many sites allow you to reach their home page by either including or omitting the subdomain prefix "www.", including or omitting the home page's filename, and more. The following URLs, all of which are considered separate locations to search engines, usually lead to the same page:

  • mysite.com
  • www.mysite.com
  • www.mysite.com/
  • www.mysite.com:8080
  • www.mysite.com/default.htm
  • www.mysite.com/en/us/
  • www.<ExternalHostProvider>.com/~mysite

The possible permutations of these variations can be quite numerous, with each one earning its own rank value in the search engine index. But you can resolve this rank redistribution riddle. While you can't control what URL form other webmasters use in their outbound links to your site, you can use 301 (permanent) redirects to aggregate all possible alternative URLs to reach (and thus funnel that inbound "credit" to) the one primary URL for your site's home page.
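
How you implement those 301s depends on your web server (IIS, Apache, and most frameworks each have their own redirect or rewrite facilities), but the underlying logic is the same everywhere. As an illustration only, here is that logic sketched as a Python WSGI middleware, assuming "www.mysite.com" is the hypothetical canonical host you've chosen:

# Illustrative sketch only: canonicalization logic as WSGI middleware.
# Most sites would do this with server-level rewrite/redirect rules;
# "www.mysite.com" is a hypothetical canonical host.
CANONICAL_HOST = "www.mysite.com"

class CanonicalHostMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        host = environ.get("HTTP_HOST", "").lower()
        if host and host != CANONICAL_HOST:
            # Funnel every alternative host form to the one primary URL
            # with a permanent (301) redirect so rank consolidates there.
            location = "http://" + CANONICAL_HOST + environ.get("PATH_INFO", "/")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return self.app(environ, start_response)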

Run this test: Start your browser and open up three session tabs for each of your favorite search engines (don't forget Bing!). In the first tab's search text box, type the following:

SITE:<YourDomain>.com

specifying just your root domain name and its associated top-level domain (TLD), but omitting the "www." subdomain prefix or any other subdomain name. This query searches for all pages in the index from the entire domain (including all subdomains). Next, in the second tab for the same search engine, type:

SITE:www.<YourDomain>.com

which includes the "www." subdomain. This query specifically looks for indexed pages associated with the "www." subdomain. Then in the third tab for the same search engine, type this:

SITE:<YourDomain>.com -SITE:www.<YourDomain>.com

The last search reruns the first query while using the second query as an exclusion filter. This filter removes all indexed results that include the "www." subdomain in the URL. If you do get results in the last test (which can include URLs in subdomains other than "www."), compare the search results in detail to see if both the "www." and non-"www." variations of your site's URLs are in the index (as well as other URL variations as listed above). If so, you're allowing the hard-earned rank for your pages to be diluted between multiple URLs. You need to canonicalize your site to consolidate the URL variations in the search engine so your pages can earn the highest possible rank for the canonical URL form.

While you're at it, take a look at the URLs you are using in the cross-links between the pages of your own site. Do they consistently use the same, single URL form for each page? I recommend using absolute rather than relative links, which are optimal for continuing to build up that aggregated canonical credit for your site's pages. For more information on canonicalization, see the blog posts Making links work for you (SEM 101) and Optimizing your very large site for search - Part 1.
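
If you want to spot-check your internal link forms programmatically, a simple classifier like the rough Python sketch below can flag relative links and non-canonical host forms. The canonical host and sample hrefs are hypothetical placeholders; feed the function href values pulled from your own templates or a crawl of your pages.

from urllib.parse import urlsplit

CANONICAL_HOST = "www.mysite.com"  # hypothetical canonical host

def classify_link(href):
    host = urlsplit(href).netloc.lower()
    if not host:
        return "relative link -- consider an absolute, canonical URL"
    if host == CANONICAL_HOST:
        return "OK: canonical form"
    if host == "mysite.com" or host.endswith(".mysite.com"):
        return "absolute link, but not the canonical host form"
    return "external link"

for href in ("/about.htm",
             "http://mysite.com/about.htm",
             "http://www.mysite.com/about.htm"):
    print(href, "->", classify_link(href))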

Site review task: Check to see if your site allows multiple URL forms to access the same page. If so, determine which URL form you want to be primary and then implement 301 redirects for all possible variations to that primary URL. Also, check your intra-site links to be sure the URLs to other pages on your site are consistently formatted the same way.

Don't we all want validation (of our coding)?

Is your HTML source code valid? It may display more or less correctly, but are you sure it's solid? (Some browsers are much more tolerant of HTML coding errors than others, so you may not actually see the problems. However, search bots are typically not as forgiving as those tolerant browsers, which is why this issue is important.) Errors in your page source code can have a detrimental effect on your page rank if the search engine doesn't understand and thus can't effectively crawl your code. For example, if you didn't properly format the <head> tag code discussed in Part 2, all of the work you put into enhancing its content for keyword usage could be for naught. Test your HTML source code with the validation tool in your webpage development environment or at a public resource such as the W3C Markup Validation Service page. You might be surprised at the number of problems found, but the details provided usually make them easy to resolve.
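
If you'd rather script the check than paste pages into a web form, the W3C also offers the Nu HTML checker with a web service interface. The sketch below assumes that service's documented JSON output mode (out=json) and uses a hypothetical local file name; adapt it to whichever validation tool you actually use.

import json
from urllib.request import Request, urlopen

# "page.htm" is a hypothetical local file to validate.
with open("page.htm", "rb") as f:
    html_bytes = f.read()

req = Request(
    "https://validator.w3.org/nu/?out=json",
    data=html_bytes,
    headers={
        "Content-Type": "text/html; charset=utf-8",
        "User-Agent": "site-review-script",  # the service expects a UA string
    },
)

report = json.load(urlopen(req))
for msg in report.get("messages", []):
    # Each message includes a type (e.g., "error") plus a location and text.
    print(msg.get("type"), "line", msg.get("lastLine"), ":", msg.get("message"))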

While you're at it, does your site comply with the Section 508 federal accessibility regulations for users with disabilities? While non-compliance may not directly affect your page ranking, improved accessibility may increase the number of visitors to, and participants in the activities on, your site. Hey, it can't hurt!

Site review task: Run a source code validator on all of your web page files and make corrections. Check your site for Section 508 compliance.

How about content validation?

Lastly, have you ever run a spell check on your content? Misspelling a keyword will never help you improve your ranking for that term. Is your content logically organized and split into logical segments? If not, maybe you need to reorganize the content of your pages. Pages that started out small but have grown over time into long pages are ripe candidates for this consideration in your site review. For more information on content architecture, check out our blog post Architecting content for SEO (SEM 101).

And horrors that be: do you have broken links? While it can be a drag to have to constantly recheck the validity of your site's outbound links when you have a ton of links, it does matter. And there's no excuse for broken links between pages of your site! Submit your pages to the W3C Link Checker for validation. If you've already registered to use the Bing Webmaster Center tools, log on and use the Crawl Issues tool (with the File Not Found (404) issue type) to get a report on all of the broken links on your site (up to 1,000 line items in a downloadable report).
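
For a quick spot check of a single page, a rough script like the following can report the HTTP status of every link it finds. It's a minimal Python sketch with a hypothetical placeholder URL, not a replacement for the W3C Link Checker or the Webmaster Center report, which scale far better across a whole site.

from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin, urlsplit
from urllib.request import Request, urlopen

PAGE_URL = "http://www.mysite.com/resources.htm"  # hypothetical page to check

class HrefCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs
                              if name == "href" and value)

collector = HrefCollector()
collector.feed(urlopen(PAGE_URL).read().decode("utf-8", errors="replace"))

for href in collector.hrefs:
    url = urljoin(PAGE_URL, href)
    if urlsplit(url).scheme not in ("http", "https"):
        continue  # skip mailto:, javascript:, and similar links
    try:
        # HEAD keeps the check lightweight; a few servers reject HEAD,
        # so treat odd results with a grain of salt.
        status = urlopen(Request(url, method="HEAD"), timeout=10).getcode()
    except HTTPError as err:
        status = err.code          # e.g., 404 for a broken link
    except URLError as err:
        status = f"unreachable ({err.reason})"
    print(status, url)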

Site review task: Check the content on each page for spelling errors, overly long pages that should be split up, and the validity of the links.

Custom 404 error pages

What do visitors see when they type in an incorrect URL or follow a broken, inbound link to your site? If the URL points to a non-existent page on your site (when no redirect is in place), your web server will return an HTTP error code 404, which presents a generic File Not Found screen in your browser. It's almost guaranteed that a user will abandon any further attempt to use your site, which is a lost opportunity for you.

Instead of losing that potential conversion, implement a custom 404 error page for your site. As long as visitors reach your domain's web server, you can create a better user experience by serving a custom "File not found" error page that offers basic information about your site's content, is wrapped in your site's page design theme template, and includes your primary site navigation so visitors can continue their search. For more information on developing a custom 404 error page, see the blog article Fixing 404 File Not Found frustrations (SEM 101). Note that you can implement both HTTP redirects for known page moves and a custom 404 page for all other broken links when no redirect is in place.
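
One detail worth verifying once your custom page is in place: a request for a nonexistent URL should still return an HTTP 404 status code, not a 200 (an error page served with a 200 could be crawled and treated like a regular page). Here's a quick Python sketch of that check, using a deliberately bogus, hypothetical path on a placeholder domain.

from urllib.error import HTTPError
from urllib.request import urlopen

test_url = "http://www.mysite.com/this-page-should-not-exist-check"

try:
    response = urlopen(test_url, timeout=10)
    print("Got", response.getcode(), "- expected 404; review your error page setup.")
except HTTPError as err:
    if err.code == 404:
        print("Good: the custom error page is served with a proper 404 status.")
    else:
        print("Got", err.code, "instead of 404.")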

Site review task: If you have not set up a custom 404 error page for your site, do so.

Page-level web spam techniques

Are you using page-level web spam techniques in an effort to deceive the bot into giving you a higher-than-deserved ranking? The search bots, as they crawl the Web on a daily basis, see every conceivable web spam technique. The indexing algorithms are constantly updated to reflect this exposure to attempted deception, and when web spam is encountered, penalties usually ensue for the offending site. SEO is hard work. Providing legitimate ranking is important to search engines because it's important to our customers. Any detected effort to cheat the system is definitely frowned upon, and for webmasters who use these techniques, it ultimately results in the exact opposite of their original goal - poor ranking (if not outright expulsion from the index). For more information on page-level web spam, see the blog post The pernicious perfidy of page-level web spam (SEM 101).

Site review task: Review your site for any use of web spam techniques and, if found, remove them from your site. If your site has already been penalized for such usage, after you clean up the offending issues and republish your site, you can then request reconsideration of the penalty by following the instructions listed at the end of the blog article The liability of loathsome, link-level web spam (SEM 101).

Got malware?

One of the best ways to sink your site's value to users is to allow it to become infected with malware or to link out to malware. Now, most webmasters are certain that their sites are clean (and likely, most of them are). But malware infections aren't always intentional on the part of the webmaster. Web servers can be hacked and malware stealthily deployed (typically the "drive-by" form of malware), despite the best intentions of a site's webmaster. And malware infections not only affect the host site, but also the sites that have outbound links to it. Of course, it's the visitors to the infected site who pay the price - although the infected site pays as well, for when visitors realize they picked up malware there, they will likely never return to do business and will cast their warnings about the site far and wide.

When a site is infected with malware, search engines that scan for malware will detect that and warn their users about the malware threat in the search engine results page (SERP) listing. This warning also applies to pages that have outbound links to infected sites - so while your site may technically be clean, an infected link on your page may still generate a malware warning in the SERP. Given that webmasters typically have no control over the security of the sites they link to, this accessory-to-the-crime affiliation with malware-infected sites is unfortunate, but it's appropriately done to protect search engine users from infection.

To see if your site has been identified as infected with malware by the Bing crawler, log on to your account in the Webmaster Center tools, click the Crawl Issues tool, and then select the Malware Infected issue type. If your site does contain malware, you'll see a list of which pages are infected. To see if you are linking out to other sites that are identified as malware infected, in Webmaster Center, click Outbound Links, and then Show only outbound links to malware. From either tool, if you get positive results, you can download a CSV file with the names of the affected pages and other information to help you clean up the mess.

We published a detailed series of blog posts on malware and website security issues. For more information on identifying a malware infection and its implications, see The merciless malignancy of malware Part 1 (SEM 101). For likely website malware attack vectors and information on how to clean up infections, see The merciless malignancy of malware Part 2 (SEM 101). For my Top 10 list of recommended security strategies for avoiding malware infections (split into two parts!), see The merciless malignancy of malware Part 3 (SEM 101) and The merciless malignancy of malware Part 4 (SEM 101).

Site review task: Check your site for possible malware infections and clean up any detected problems. Implement recommended security measures to reduce the likelihood of future malware attacks.

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Coming up next: In the last post of this series, we'll look at some architectural issues that can help your site work better with the search bot. Until then...

-- Rick DeJarnette, Bing Webmaster Center