Common errors that can tank a site (SEM 101)

Imagine being a content developer for a website. You write a bunch of clever and informative articles, which should deliver a good dose of new visitors and ranking potential to the site. You submit them to the IT department for publishing online, and wait for good things to happen. But instead, it all falls flat. A look at your web analytics tools reveals that the number of site visitors has not increased in the time since your new material was published. Further research reveals that your new content is not even in the search engine indexes! To quote the mighty Fred Willard, “Wha’ happened?” Perhaps some commonly seen site errors prevented your new content from being added to the index.

Just like a house, good web content needs a sturdy, reliable platform on which to reside. What good is a gorgeous, million-dollar home if it’s sitting on a foundation of rickety 2x4s? No housing inspector would ever climb into such an unstable house to review it. And when a search engine crawler (aka bot) comes across a website littered with coding errors and serious problems with structure and design, it, too, may abandon its effort to crawl it. If that happens, no matter how good and compelling the content might be, it will never make it into the index.

So how do you know if your site has a rock-solid foundation or is just barely standing up? You need to get into your code. You can use a few good tools to help detect problems, but ultimately you’ll need to understand what the tools are saying when they indicate things are broken so you can fix them. Let’s get into a few of the site errors that are either pretty common or pretty important, and cover what you need to know to avoid their deleterious effects.

Invalid mark-up code

If your page mark-up code is bad, you’re bound to have crawling problems. But you might not know that the problems exist if your testing merely consists of, “How does it look in my PC’s browser?” Modern desktop browsers are pretty adept at munging through what you probably meant to do into a workable, on-screen presentation. They can often deal with code that is footloose and fancy-free when it comes to standards compliance. But the search engine bots are not as flexible as desktop browsers, and code problems can often trip them up and bring the crawling of your site to a halt. In addition to that, mobile device browsers are not likely to be as accommodating with poorly written code as desktop browsers, either. Anything you can do to make your code solid and standards compliant is good, for both your users and the bots.

To see where your code stands, use a good mark-up code validator. Most good development environments will offer either a built-in validator or references to such tools online. A particularly detailed validator is the W3C Markup Validation Service, a free, online HTML validator from the folks who bring you the coding language standards. It doesn’t validate entire websites recursively, just one page at a time, but it is still a very good source for detecting and identifying the issues behind coding errors.

Examine the results of the validator scan. What did it find? Check to see if you have some of these more common coding problems in your pages (a consolidated markup example follows the list):

  • Does your file contain a document type declaration (the <!DOCTYPE> statement, which identifies the DTD your code follows)? It’s not absolutely required for early versions of HTML, but you’ll need to have it for XHTML documents. And remember to use the correct coding practices for whichever type of document you specify; the requirements for XHTML versus HTML are similar, but not identical.
  • Are all of your tags closed properly? All of your paired tags must have corresponding openers and closers. The paragraph tag, <p>, for example, is one whose closing tag is often omitted. And if you are using XHTML, are you closing single (aka empty) tags correctly? Empty tags, like the line break tag, need to include a forward slash before the closing greater-than sign, as in <br />.
  • Are your tags written in lower case letters? It’s not required for HTML, but it is for XHTML, so it’s now considered a best practice.
  • Are all of the tag attribute values, even numerals, as in <table border="1">, enclosed in quotes? While this is not required for earlier versions of HTML, it’s certainly a best practice and is a requirement in XHTML for creating well-formed code.
  • Are the tag attributes used in your code valid? HTML has changed over the years, and some attributes have been deprecated with the introduction of the latest specifications, HTML 4.01 and XHTML 1.1. An example of this is <table align="left">, where the standards dictate that the newer style attribute should be used now instead of align. If you are unsure, check with a reliable mark-up tag reference, such as the Tag Reference for HTML/XHTML.
  • Are you using deprecated tags? Again, changes to the standards have seen some tags become obsolete. For example, the <u></u> tags for underlining text were deprecated in HTML 4.01 and are not supported at all in XHTML using the Strict DTD.
  • Are your tags positioned in the right place in your code? For example, <meta> tags can only be used within the <head> tag. Make sure you are placing your tags correctly.
  • Are your tags nested correctly? If <tag1> is opened before <tag2>, then </tag2> must appear before </tag1>. Remember this: first opened, last closed.
  • Are you using the escape code &amp; for the ampersand character in your href attribute URL values? Some long, dynamic URLs include ampersand characters. To keep your code compliant, replace each raw ampersand in those URLs with its associated escape code, &amp;.
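
To tie those points together, here is a minimal sketch of a page that would pass all of the checks above. It’s a hypothetical example: the page content and URL are placeholders, and the XHTML 1.0 Strict document type is just one reasonable choice.

    <!-- Minimal, well-formed XHTML sketch: DOCTYPE present, tags in lower case and
         properly nested, attribute values quoted, empty tags self-closed, the
         deprecated align attribute replaced by style, and the ampersand in the
         query-string URL escaped as &amp;. All names and URLs are placeholders. -->
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Widget catalog</title>
      </head>
      <body>
        <h1>Widget catalog</h1>
        <table style="float: left;" border="1">
          <tr><td>Every cell, row, and table tag is explicitly closed.</td></tr>
        </table>
        <p>
          This line break is a self-closed empty tag.<br />
          <a href="http://www.example.com/catalog?category=widgets&amp;page=2">More widgets</a>
        </p>
      </body>
    </html>

Running a page like this through the W3C validator should come back clean; if yours doesn’t, the error messages will usually point back to one of the items in the list above.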

Tip: Test your pages in multiple browsers. One may be far more tolerant than another, and you really need to accommodate the least tolerant browser to allow the highest portion of your site’s visitors to have a good experience.

Bad links

A bad link encompasses more than mere mistyped or expired URLs. It also includes structural and site design problems that may not break a site, but can prevent it from reaching its full potential. Bad links are a very common problem, if only because the target of the link is typically outside of the linking webmaster’s control (at least for external links!). The webmasters of the sites you link to usually fail to courteously let the rest of the known universe know when they’ve changed the URLs of their pages! OK, so that was a bit facetious (as if that would even be feasible to do!), but the problem remains. Pages you link to are regularly moved, deleted, or renamed without your knowledge, and those who linked to them end up looking like fools. And that’s not the only repercussion. Broken links are bad form for search engine optimization (SEO), which means this error can eventually affect your page’s rank.

You need to regularly run a link checker as part of your site maintenance work. There are a lot of tools out there on the Web to do this work, but some only check one link at a time. That’s fine if you have a three-page site and only six outbound links, but at that point you could just as easily check them by clicking through your site in a browser! You need a tool that scans all the pages on your site in one fell swoop and gives a consolidated report on the findings. Many mark-up language development environments offer tools for this, so check there first. I also recommend that you look at the webmaster tools offered by the search engines. Bing’s Webmaster Center offers a Crawl Issues tool that provides feedback on detected broken links. Use the File Not Found (404) filter to get a report on broken links. Note that Google and Yahoo! offer their own versions of webmaster tools with similar functionality.

The following is a list of link issues to look for on your site (a short markup sketch follows the list):

  • File Not Found 404 errors. Error 404 is an HTTP error that appears when you click a link or enter a URL in your browser that has no corresponding page. It could be due to a typo in the URL address (either keyed into the browser’s address bar or in the anchor tag’s href attribute value), or the page itself could be missing. Find all of the outbound 404s in your site and figure out what is wrong with each. You might have to go to the linked website’s home page and work through the site’s navigation to find the correct page, if it still exists. Of course, you can also get 404 errors in your own site’s internal navigation (let’s hope that didn’t happen!).
  • Missing page elements. Your internal navigation may go to the correct page, but some of the linked page’s elements may not appear, such as images, animations, scripts, or other externally-referenced files. Check the links for those elements, and if possible, use absolute links (which include full URLs) rather than the shorter, more convenient, but less reliable relative references (links addressing the file relative to the calling file’s location in the site’s directory structure rather than the full path of an absolute URL).
  • Mixed canonicalized links. We’ve discussed the value of canonicalization in this blog before. And while this is not an error that will stop the bot in its tracks, it can negatively affect the ranking of your own pages if you regularly use multiple URL variations for the same page. Choose the canonical URL for a page, use it consistently in your internal linking, and consolidate your flow of link juice to that single URL. Now that’s what I’m talking about.
  • Navigation blockages. Using script for your internal site navigation may work just fine in your browser, but it’ll stop a bot dead in its tracks. Just don’t do it! Use anchor tags with keywords in the anchor text and the bot will be happy.
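
Here’s a small, hypothetical markup sketch that pulls a few of these points together (the domain and paths are made up): absolute URLs for externally referenced files, one consistent canonical URL form in internal links, and a plain anchor tag with keyword-rich anchor text instead of script-driven navigation.

    <!-- Fragile pattern: a relative reference that breaks if the page moves, and
         script-only navigation that a bot cannot follow. -->
    <img src="../images/logo.gif" alt="Example.com logo" />
    <span onclick="window.location='products.html';">Products</span>

    <!-- Sturdier pattern: an absolute URL for the referenced file, the single
         canonical URL form used consistently for internal links, and a crawlable
         anchor with descriptive, keyword-rich anchor text. -->
    <img src="http://www.example.com/images/logo.gif" alt="Example.com logo" />
    <a href="http://www.example.com/products/blue-widgets/">Blue widget products</a>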

Tip: By the way, 404 errors are not limited to your outbound links. Other webmasters might incorrectly code or not keep up with changes to your site, resulting in 404s for users who want to see your content. Help keep those folks on your site by creating informative and useful custom 404 pages. For more on this, see our previous discussion on this issue in Site Architecture and SEO – file/page issues.
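
To sketch what such a page might look like (all URLs and wording here are placeholders), the idea is simply to acknowledge the miss and offer crawlable paths back into your content:

    <!-- A sketch of a friendly custom 404 page; all URLs and text are placeholders.
         The goal is to keep the visitor on the site instead of losing them. -->
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Page not found</title>
      </head>
      <body>
        <h1>Sorry, we couldn't find that page</h1>
        <p>The page may have been moved or renamed. Try one of these instead:</p>
        <ul>
          <li><a href="http://www.example.com/">Home page</a></li>
          <li><a href="http://www.example.com/sitemap.html">Site map</a></li>
          <li><a href="http://www.example.com/search.html">Search the site</a></li>
        </ul>
      </body>
    </html>

How you wire it up depends on your server (on Apache, for example, the ErrorDocument directive is one common way), but whatever you use, make sure the custom page is still returned with a 404 status code so the bots don’t treat it as a real, indexable page.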

Other coding errors

There are other coding errors or omissions that can adversely affect the way the bot collects information about a site. Let’s cover these as well.

  • Missing, empty, or duplicate <title> tag
  • Missing, empty, or duplicate <meta> description tag
  • Missing, empty, or duplicate <h1> tag

These tags are key locations in the page for using keywords and key phrases to associate for relevance with your content. The <title> tag is required in HTML and XHTML documents, and the other two might as well be (all of them are very strategic for SEO). Each one should be used only once per page, and each must contain actual text (no images or just blank spaces!): between the tags for <title> and <h1>, and in the content attribute for the <meta> description tag. The text in those locations is considered important keyword text by the bot (for it defines the content of the page), so make the most of it. Don’t duplicate text strings between these tags, either. That’s a wasted opportunity for defining more keywords.
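
As a quick, hypothetical sketch (the page subject and wording are invented), the head and heading of a page tuned this way might look something like this: one unique <title>, one meta description, and a single <h1>, each carrying its own descriptive, keyword-rich text.

    <!-- Fragment of a hypothetical page head and heading. Note that the <title> and
         <h1> text goes between the tags, while the meta description text lives in
         the content attribute, and the three strings are related but not identical. -->
    <head>
      <title>Blue Widgets: Prices, Specifications, and Reviews - Example.com</title>
      <meta name="description"
        content="Compare prices, specifications, and customer reviews for the full Example.com line of blue widgets." />
    </head>
    <body>
      <h1>Blue Widget Prices, Specifications, and Reviews</h1>
      ...
    </body>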

The last issue I want to mention is the use of 302 redirects. We talked about 301 redirects before, and how to strategically use them. 302s are only temporary redirects, and unlike with 301s, no link juice credit is passed to the redirected page. Using a 302 redirect is not a coding error per se, but much of the time it is a strategic error from the perspective of SEO. Unless you have a genuinely temporary need to redirect a page, stick with 301s as an SEO best practice.

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Until next time…

— Rick DeJarnette, Bing Webmaster Center