Site architecture and SEO – file/page issues (SEM 101)

Search engine optimization (SEO) has three fundamental pillars upon which successful optimization campaigns are run. Like a three-legged stool, take one away, and the whole thing fails to work. The SEO pillars include: content (which we initially discussed in Are you content with your content?), links (which we covered in Links: the good, the bad, and the ugly, Part 1 and Part 2), and last but not least, site architecture. You can have great content and a plethora of high quality inbound links from authority sites, but if your site’s structure is flawed or broken, then it will still not achieve the optimal page rank you desire from search engines.

The search engine web crawler (also known as a robot or, more simply, a bot) is the key to understanding website architecture issues (Bing’s crawler is MSNBot). Think of the bot as a headless web browser: one that does not display what it sees, but instead interprets the HTML code it finds on a webpage and sends the content it discovers back to the search engine database to be analyzed and indexed. You can even equate the bot to a very simple user. If you target your site’s content to be readable by that simple user (the lowest common denominator), then more sophisticated users (running browsers like Internet Explorer 8 or Firefox 3) will most certainly keep up. By that analogy, doing SEO for the bot is very much a usability effort.

If you care about your website being found in search (and I presume you do if you’re reading this column!), you’ll want to help the crawler do its job. Or, at the very minimum, you should remove any obstacles under your control that can get in its way. The more efficiently the search engine bot crawls your site, the higher the likelihood that more of its content will end up in the index. And that, my friend, is how you show up in the search engine results pages (SERPs).

With site architecture issues for SEO, there’s a ton of material to cover. So much so, in fact, that I need to break up this subject into a multi-part series of blog posts. I’ve broken them down into subsets of issues that pertain to: HTML files (pages), URLs and links, and on-page content. I even plan a special post devoted solely to <head> tag optimizations for SEO.

So let’s kick off this multi-part series of posts with a look at SEO site architecture issues and solutions related to files and pages.

Use descriptive file and directory names

Always keep the user experience in mind when developing your website and doing SEO reviews. The more you can use descriptive text to represent your content, the better off your site will be for the user experience. This even goes for file and directory names. Besides being far easier for end users to remember, the strategic use of keywords in file and directory names further reinforces the relevance of those pages, both to the user and to the bot.

And while you’re examining the names of files and directories, avoid using underscores as word separators. Use hyphens instead. This syntax helps the bot properly parse a long name into its individual words instead of treating it as the equivalent of one meaningless superlongkeyword.
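
For illustration (these paths are hypothetical, not taken from a real site), compare a vague, underscore-separated name with a descriptive, hyphenated one:

    Harder for users and bots:  http://www.example.com/prod/page_1234_v2.htm
    Better:                     http://www.example.com/outdoor-grills/stainless-steel-gas-grill.htm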

Limit directory depth

Bots don’t crawl endlessly, searching every possible nook and cranny of every website (unless yours is an important authority site, in which case the bot may probe deeper than usual). For the rest of us, though, creating a deep directory structure will likely mean the bot never gets to your deepest content. To alleviate this possibility, make your site’s directory structure shallow, no deeper than four child directories from the root.
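
As a hypothetical illustration, the first URL below keeps its content within four child directories of the root, while the second buries the same content too deep for the bot to reliably reach:

    Shallow enough: http://www.example.com/products/grills/gas/stainless-steel-grill.htm
    Too deep:       http://www.example.com/products/outdoor/cooking/grills/gas/premium/stainless-steel-grill.htm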

Externalize on-page JavaScript and CSS code

If your pages use JavaScript and/or Cascading Style Sheets (CSS), it is a good idea to store that content in external files rather than in your individual HTML pages. In the past, there was a concern that bots wanted to see the <body> tag content as quickly as possible due to file size crawling limitations. The Bing bot is much smarter than that today, so this is not a factor in deciding to externalize such content.

However, if each of your pages is filled with redundant and sizable script and CSS code, you can unnecessarily drag down the performance of your website with page load latency. This can be an issue for SEO domestically, and can be of particular concern for transcontinental web connections. Perhaps more importantly, you also pay more for bandwidth usage from your host provider with every unnecessary code-laden webpage fed upstream.

Moving JavaScript and CSS code out of your pages into external files offers additional advantages beyond just shortening your webpage files. Because the external files are separate from the content they modify, they can be referenced by multiple pages simultaneously. Externalizing this content also simplifies code maintenance.

The following examples show how to reference external JavaScript and CSS code in your HTML pages.
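
Here is a minimal sketch of those references placed in a page’s <head> (the file names /CSS/styles.css and /Scripts/site.js are placeholders; use whatever names fit your site):

    <head>
      <!-- External style sheet: one cached file shared by every page -->
      <link rel="stylesheet" type="text/css" href="/CSS/styles.css" />
      <!-- External JavaScript: keeps script code out of the page itself -->
      <script type="text/javascript" src="/Scripts/site.js"></script>
    </head>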

A few notes to consider. External file references are not supported in really old browser versions, such as Netscape Navigator 2.x and Microsoft Internet Explorer 3.x. But if the users of such old browsers are not your target audience, the benefits of externalizing this code will far outweigh that potential audience loss. I also recommend storing your external code files separately from your HTML code, such as in /Scripts and /CSS directories. This helps keep website elements organized, and you can then easily use your robots.txt file to block bot access to all of your code files (after all, scripts sometimes handle business-confidential data, so preventing the indexing of those files might be a wise idea!).
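
For example, assuming you store your code in /Scripts and /CSS directories as suggested above, a robots.txt entry like this blocks all bots from crawling those files:

    User-agent: *
    Disallow: /Scripts/
    Disallow: /CSS/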

Use 301 redirects for moved pages

When you move your site to a new domain or change folder and/or file names within your site, don’t lose all of your previously earned site ranking “link juice.” Search engines are quite literal: the same pages on different domains, or the same content under different file names, are regarded as duplicates. Search engines also attribute rank to individual pages, and they have no way of knowing when you intend new page URLs to be considered updates of your old page URLs. So what do you do? Use an automatic redirect to manage this for you.

Automatic redirects are set up on your web server. If you don’t have direct access to your web server, ask your administrators to set this up for you. Otherwise, you’ll need to do a bit of research. First you need to know which type of HTTP redirect code you need. Unless your move is very temporary (in which case you’ll want to use a 302 redirect), use a 301 redirect for permanently moved pages. A 301 tells the search engine that the page has moved to a new location and that the new page is not a duplicate of the old page, but instead IS the old page at a new location. Thus when the bot attempts to crawl your old page location, it’ll be redirected to the new location, gather the new page’s content, and carry the rank standing earned by the old page over to the new one.

To learn how to set this up, you’ll first need to know which web server software is running your site. Once you know that, consult the documentation for either Windows Server Internet Information Services (IIS) or Apache HTTP Server to learn how you can set up 301 redirects on your website.
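
As a minimal sketch for Apache HTTP Server (the page names and domain below are hypothetical; IIS users would configure the equivalent through the IIS administration tools), a permanent redirect can be declared in the server configuration or an .htaccess file:

    # Permanently (301) redirect a renamed page to its new location
    Redirect 301 /old-page.htm http://www.example.com/new-page.htm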

Avoid JavaScript or meta refresh redirects

Technically, you can also do page redirects with JavaScript or <meta> “refresh” tags. However, these methods are not recommended if you want to achieve optimal SEO results. They were highly abused in the past to hijack users away from the content they wanted and toward web spam they didn’t want. As a result, search engines take a dim view of these redirect techniques. To do the job right, to preserve your link juice, and to continue your good standing with search engines, use 301 redirects instead.
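
For reference, this is the kind of <meta> refresh tag to avoid (the URL shown is hypothetical); a server-side 301 redirect, as described above, preserves your link juice, while this does not:

    <!-- Not recommended: client-side meta refresh redirect -->
    <meta http-equiv="refresh" content="0; url=http://www.example.com/new-page.htm" />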

Implement custom 404 pages

When a user makes a mistake while typing your URL into the address bar of their browser, or an inbound link contains a typo, the typical website pops up a generic 404 File Not Found error page. The most common end user response to that error message is to abandon the site. If that user had come to your website and, despite the error, you actually had the information they were seeking, that’s a lost business opportunity.

Instead of letting users go away thinking your site is broken, make an attempt to help them find what they want by showing a custom 404 page. Your page should follow the same design as the rest of your site, include an acknowledgment that the page the user was looking for doesn’t exist, and offer a link to your site’s home page and, more importantly, access to either a site-wide search or an HTML-based sitemap page. At a minimum, make sure your site’s navigation tools are present, enabling the user to search for the content they’re interested in before they leave.

Implementing a custom 404 page depends on which web server you are using: for users of Windows Server IIS, check out the new Bing Web Page Error Toolkit; otherwise, browse the 404 information for Apache HTTP Server.
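
On Apache HTTP Server, for instance, a custom 404 page can be declared with a single directive (the /errors/404.htm path is a placeholder for your own custom error page):

    # Serve a custom, site-branded page for File Not Found errors
    ErrorDocument 404 /errors/404.htm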

Other crawler traps

The search engine bot doesn’t see the Web as you and I do. As such, there are several other page-related issues that can “trap” the bot, preventing it from seeing all of the content you intend to have indexed. For example, there are many page types that the bot doesn’t handle very well. If you use frames on your website (does anyone still use frames?), the bot will only see the frame page elements as individual pages. Thus, when it wants to see how each page interrelates with other pages on your site, frame element pages are usually poor performers. This is because frame pages usually separate content from navigation, so content pages often become islands of isolated text that nothing links to directly. And with no links to them, they might never get found. But even if the bot finds the frame’s navigation pane page, there’s no context to the links. This is pretty bad in terms of search engine relevance ranking.

Other types of pages that can trip up search engine bots include forms (there’s typically no useful content on a form page) and authentication pages (bots can’t execute authentication schemes, so they are blocked from seeing all of the pages behind the authentication gateway). Pages that require session IDs or cookies are similar to authentication pages: the bot’s inability to generate session IDs or accept cookies blocks it from accessing content that requires such tracking measures.

To keep the search engine bot from going places that might trip it up, use the “noindex” directive in the robots <meta> tag to prevent the indexing of whole pages.
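
A minimal example of that robots <meta> tag, placed in the <head> of any page you want kept out of the index:

    <!-- Tell search engine bots not to index this page -->
    <meta name="robots" content="noindex" />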

We’re only getting started here on site architecture issues. There’s plenty more to come. If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. See you soon…

– Rick DeJarnette, Bing Webmaster Center

Join the conversation

69 comments
  1. Anonymous

    Nice

  2. Anonymous

    How deep is too deep for "directory depth" for Bing?

  3. bagtrend

    how to use this?

  4. jackin

    What a great article fo seo, it contains the lot of tips and if follow all than that will increase your site looks, increase SERP's ,traffic etc. Awesome1

  5. Quality Directory

    To me SEO starts with search engine optimized web design, followed by quality and unique content.

  6. Anonymous

    Great works about SEO & Site Architecture for all class of Webmasters & SEOs. Bing are an example of a new class of Search Engine for the upcoming new times.

  7. Anonymous

    Great info. Thanks for this! I'm going to share this.

  8. Anonymous

    Nice collection of common things the SEO should take care of.

  9. Anonymous

    @Rafael

    "To alleviate this possibility, make your site’s directory structure shallow, no deeper than four child directories from the root."

  10. Anonymous

    very nice useful easly understandable article

  11. Anonymous

    thank you somuch verygood

  12. Anonymous

    I am creating a page now and this information is fantastic. I will try to incorporate everything you stated into my site's architecture to see if I can get it to page one.

  13. nexcheck

    I am  using 301 redirect for my non www url to www url in the site http://www.pcsecurityworld.com , still bing lists it without www unfortunately.

  14. mauco

    Very interesting and educative. I love the illustrations you used to drive home the points. Thanks.

  15. Anonymous

    Hi. Very often question is meta description and keywords. Does Bing accept this meta headers? Is better to use this headers? IMHO meta keywords is not useful, but meta description is speculative for most SEO consultants. Somebody says it is very useful, somebody it does not matter. Work with filling these headers isn't easy.

  16. navneet kumar

    Nice tips for search engine optimization specially for bing

  17. Anonymous

    Bing seems to hold onto the best of both worlds within searches. I Like it so far, but know doubt more improvements are on the way. Are we seeing the start of a serious challenge to googles empire?

    Good tips btw

  18. Anonymous

    Thanks for the seo tips

  19. b1mou100

    What are the main changes brought by Bing ?

    All I've read was the rules for Livesearch.

  20. Anonymous

    I think using metatag is necessary compare to keyword because,some time bots only search on the based of keywords so that time keyword metatag would be helpful.

  21. Anonymous

    Thanks for helping and educating

  22. muffieshannen

    i read your article and i found such great tips. eventhough im a neophyte in blog and seo thing such as bot etc.. i think your advice is one of that i read regarding search engine optimization..

  23. Anonymous

    How about the page loading time? Does it affect the SERP?

  24. Anonymous

    Bing is easy to read and write.

    Threat to google as a compititor.

    Bing has to write more whitepaper about its SEO strategies.

    Thank you.

    http://www.dmaxonline.com

  25. Anonymous

    Finally… Some excellent seo material from an authoritative source.

    Good work and I really like the matter of fact points you have been so kind enough to share with us.

  26. tobicomvn

    Thanks for this! Good info

  27. Anonymous

    The information on cutting File Size really valuable for me.

  28. Anonymous

    A lot of information I already implement into my sites. Good thing about the 4 deep structure is that WordPress default permalinks usually go domain.com/year/month/postname/ which falls into that criteria. I do find bing sometimes doesn't go so far as to the postname and just to the year/month section (2 deep).

  29. luke.sarker

    Nice article but need to be parted, otherwise it is boreing to read such a large one.

  30. muffieshannen

    Limit physical page file size

    Keep your individual webpage files down under 150 KB each. Anything bigger than that and the bot may abandon the page after a partial crawl or skip crawling the page entirely

    What if I had posted a large quantity of pictures but really essential on my article. Will the bot abandon  or skip crawling?

    Thanking you in advance..

  31. Anonymous

    Very usefull

  32. daniel.bomgardner

    Great blog.  a lot to digest though!

  33. Anonymous

    ….and authentication pages (bots can’t execute authentication schemes, so they are blocked from seeing all of the pages behind the authentication gateway)….

    Google allows you to register a special login account so that it can see and index protected content. This then forces a searcher to signup at your site to see the information.

  34. Anonymous

    Very nice and informative.Really provided the very great information.

  35. Anonymous

    Thanks that helped a lot. I'm someone that still uses frames so I guess I'll have to find an alternative.

  36. Anonymous

    Wow great idea!

  37. rickdej

    Hey folks, I got some late feedback from some internal folks on this topic and I made some minor modifications to this article. You might want to go back and reread it just to be sure you have the latest and greatest info! Thanks!

    Rick DeJarnette

    Bing Webmaster Center Team

  38. Lucca Sofiato

    Very useful!

    Thanks

  39. panther45

    Quite helpful. Two things I didn't know where the hyphen and the deep directory depth. Although they aren't a problem on my site.

  40. Anonymous

    Properly optimizing your pages to make them “search engine friendly” can greatly increase your search engine rankings, traffic levels, and potential earnings from your website.

  41. yuanxiangjin

    nice ~~thank you so much !

  42. Anonymous

    nice ~~thank you so much !

  43. pdfee

    cool illustrations i love it

  44. Answer Blip Trivia

    The one thing is that Meta Refresh tags can be used in some cases, just make sure you keep the time under 1 as anything over that the engines see as a 302 redirect and could cause some issues.

  45. Anonymous

    I thought this article was very good, informative and easy to understand. Hopefully this works for my home business site I am working on.

  46. s2_krish

    Hi this is nice post. I like idea about 301 moved pages. But I have different pages. I have moved my website, now couple pages has been removed. They do not exist in my current website. I see lot of 404 error in analytic report. Getting lot of 404 pages to bot may not sound good. I can't use 301 redirect method also because those pages are removed, not moved. Now, how can solve this issue.

  47. Anonymous

    Bing is a professional and seo friendly search engine..

  48. Blackpool UK

    i use 301 on my non www to my www domain for website.

  49. hotels

    great seo tips and advice from bing

  50. Anonymous

    Nice collection ….

  51. Anonymous

    I am  using 301 redirect for my non www url to www url in the site http://www.sohbetw.com , still bing lists it without www unfortunately.

  52. Anonymous

    Excellent post! Very informative. You're right. There is a lot to cover to make the on page elements correct. In fact I do that on a daily basis for my seo clients. It's not just something you can have some one do in 5 minutes. It takes time and testing and there are many different elements that go into the on page optimization. But you have to do this before you drive any traffic to your site using off page link building and promotional activates. I appreciate you great post. Thanks

  53. corbax

    This site is very interesting and brings new ideas to professionals in search engine positioning and web designers.

    Thank you

  54. Elton182

    Hmm..

  55. Anonymous

    nice optimization tips

  56. veratse

    I use 301 redirects my site. but bing search lost my wev site main page.

  57. redtubeeu

    very useful info. It's useful for all search engines

  58. Anonymous

    Very interesting and educative

  59. nighttigers1

    Hi nice and essay written article

  60. Altamiraweb

    Live is getting much better!

    Great post!

    I´ve already subscribed the feeds!

  61. chazdhigros

    good

  62. pri2sh008

    Does bing search engine follow the nofollow links ?

  63. Amcy5.com

    Please check the architecture of my site and suggest some better one at http://www.amcy5.com/

  64. tipograf ieromonah

    interesting but show us an example on how to use.

Comments are closed.