Optimizing your very large site for Search — Part 3

Working with large sites often means being part of a large organization, which brings its own set of challenges. Many stakeholders with different agendas or needs influence how sites are structured. Larger organizations also tend to have long to-do lists and little awareness of the impact certain design or architecture choices can have on a search engine's ability to index the site.

In our past two articles on large site optimization, we talked about how to help us crawl your site more efficiently and about the need to reduce the number of URLs you expose. For some organizations, however, the issue isn't sharing too many URLs; instead, valuable content is hidden from the search crawlers. To make sure your site gets full coverage, watch for the following patterns and be sure to avoid them.

Just a little too flashy

Using Silverlight or Flash on pages within your site can help to create rich and interactive user experiences that can complement or build on the content in the rest of your site. On the other hand, Rich Internet Applications (RIAs) can also hinder search engine discoverability of your content. Such content is typically not fully exposed to a search engine crawler or a browser that doesn’t execute JavaScript.

While there have been some advances in crawling RIAs in recent months, those advances are limited to text extraction and apply only in certain contexts. You should still consider content built with these tools inaccessible to crawlers.

There are techniques you can use to make sure that content built in Silverlight or Flash isn't lost to crawlers or to visitors to your site.

Fundamentals, fundamentals, fundamentals

As with sports, no matter how good you get, you need to keep your fundamentals sharp. The world of RIAs is no different. Your site needs to provide:

  • Unique page titles
  • Detailed meta page descriptions
  • Quality body copy (e.g. timely, relevant, well written)
  • Descriptive H1 tags (one per page)
  • Discoverable navigation
  • Informative ALT tags
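
As a rough illustration, the checklist above can be spot-checked automatically. The sketch below audits a page's HTML with naive regular expressions; a real audit should use a proper HTML parser, and the function name and sample page are hypothetical.

```javascript
// Minimal sketch of a fundamentals audit using naive regular expressions.
// A production audit should use a real HTML parser; names here are illustrative.
function auditFundamentals(html) {
  var h1Count = (html.match(/<h1[\s>]/gi) || []).length;
  return {
    hasTitle: /<title>[^<]+<\/title>/i.test(html),
    hasMetaDescription: /<meta[^>]*name=["']description["'][^>]*content=["'][^"']+["']/i.test(html),
    oneH1: h1Count === 1,                                    // exactly one H1 per page
    imagesHaveAlt: !/<img(?![^>]*\balt=)[^>]*>/i.test(html)  // every <img> carries ALT text
  };
}

var page =
  '<html><head><title>Widgets | Contoso</title>' +
  '<meta name="description" content="Browse our full widget catalog."></head>' +
  '<body><h1>Widgets</h1><img src="w.png" alt="Blue widget"></body></html>';

console.log(auditFundamentals(page));
// all four checks should come back true for this sample page
```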

Practice progressive enhancement

Progressive enhancement is a strategy for ensuring that your content is accessible to all viewers: design for the least capable devices first, and then enhance those documents with separate logic for presentation. The basic concept of progressive enhancement can be summed up in the following principles:

  • Basic content should be accessible to all users
  • Basic functionality should be accessible to all browsers
  • All content is in hierarchical semantic markup (<title>, <h1>, <h2>, <h3>, etc.)
  • Enhanced layout (Silverlight or Flash files) is provided separately
  • Enhanced behavior is provided by externally-linked JavaScript
  • End user browser preferences are respected
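
Put together, a progressively enhanced page skeleton might look roughly like the fragment below; the ids, file names, and copy are illustrative, not a prescribed structure.

```html
<!-- Sketch of a progressively enhanced page; names and files are illustrative -->
<html>
<head>
    <title>Widget catalog | Contoso</title>
    <meta name="description" content="Browse the full Contoso widget catalog." />
    <!-- Enhanced behavior lives in externally linked JavaScript -->
    <script type="text/javascript" src="enhance.js"></script>
</head>
<body>
    <h1>Widget catalog</h1>
    <h2>Blue widgets</h2>
    <p>Basic, crawlable copy describing the blue widgets.</p>
    <!-- enhance.js swaps this container for the Silverlight/Flash version
         only when the browser supports it -->
    <div id="catalog-basic">
        <a href="/widgets/blue.html">Blue widget details</a>
    </div>
</body>
</html>
```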

One way to implement progressive enhancement is to embed your rich content in <div> or <span> tags. These tags should also contain alternate content, including the links, text, and images found in your RIA.

The alternate content may contain links, headings, styled text, and images: anything you can add to an ordinary HTML page. You can use JavaScript to detect whether the browser supports Flash or Silverlight. If so, the JavaScript manipulates the page's document object model (DOM) to replace the alternate content with the Flash or Silverlight application. The key is to ensure that the alternate content accurately reflects the contents of the RIA file, or you may be penalized.
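
The detect-and-swap step can be sketched as follows. The decision is factored into a plain function so the idea is easy to see; hasPlugin, riaEmbedHtml, and the wiring in the comments are illustrative names, not a real plugin-detection API.

```javascript
// Sketch of the progressive-enhancement swap. The alternate content ships in
// the HTML; if the plugin is available, script replaces it with the rich embed.
function chooseContent(hasPlugin, riaEmbedHtml, alternateHtml) {
  // Crawlers and plugin-less browsers keep the alternate HTML;
  // capable browsers get the rich embed in its place.
  return hasPlugin ? riaEmbedHtml : alternateHtml;
}

// In a browser you would wire it up roughly like this (not run here):
// var container = document.getElementById('ria-container');
// container.innerHTML = chooseContent(
//     detectSilverlight(), embedMarkup, container.innerHTML);
```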

For a complete picture of progressive enhancement, you can read A List Apart's article Understanding Progressive Enhancement.

Writing your AJAX application for search engines

In the past few years, AJAX (shorthand for Asynchronous JavaScript and XML) has become popular with web developers looking to create more dynamic and responsive experiences by exchanging small amounts of data with the server behind the scenes. AJAX is most often used to build interactive web applications, but increasingly people are also using it simply to spice up their pages.

These exchanges of data require the execution of scripts when particular page events occur. Search engines don't interpret or execute this kind of code; therefore, any content contained within an AJAX-enabled page or control isn't accessible and won't be indexed.

If some pages or your entire site will leverage AJAX, you should consider the following strategies for generating as much static content as possible:

  • Use a descriptive title tag, meta keywords, meta description, and H1 tag on all pages to identify their intended content, even if the content on the page itself isn’t immediately accessible.
  • Don’t put your site navigation within an AJAX/JavaScript container.
  • If your RIA requires extensive AJAX use, plan for development of automated, static page builds that are accessible via static links from dynamic pages. Some people refer to these as “lo-fi” content links.
  • Where possible, use server-side technologies instead of client-side browser requests for populating individual, dynamic content containers.
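
The "lo-fi" links strategy can be sketched like this: for each item a page would normally fetch via AJAX, emit an ordinary crawlable anchor to a static page carrying the same content. The /products/<id>.html URL pattern and the item shape are assumptions for illustration.

```javascript
// Sketch of "lo-fi" content links: every item normally loaded via AJAX also
// gets a plain crawlable anchor pointing at a static page with that content.
function loFiNav(items) {
  return items
    .map(function (it) {
      return '<a href="/products/' + it.id + '.html">' + it.name + '</a>';
    })
    .join('\n');
}

console.log(loFiNav([
  { id: 'w1', name: 'Blue widget' },
  { id: 'w2', name: 'Red widget' }
]));
// → <a href="/products/w1.html">Blue widget</a>
//   <a href="/products/w2.html">Red widget</a>
```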

Possible pattern

If your site or a section of your site requires Silverlight, Flash, or AJAX, you may want to review Nikhil Kothari’s pattern for making RIAs more indexable in his post SEO for Ajax and Silverlight Applications. No matter the technology, you can build a simple container that keeps the page search engine friendly without trying to detect a search engine crawler on the server. The container Nikhil proposes is a simple bit of markup and JavaScript that loads the enhanced content but keeps a separate container with content to display if the enhanced content doesn’t load. The markup will look something like:

<div>
   <!-- Silverlight/Flash object, or JavaScript-generated content -->
   <object>
      ...
   </object>
</div>
<script type="text/javascript">
   document.write('<div style="display:none">');
</script>
<div>
   Text for browsers with JavaScript turned off and for search engines.
</div>
<script type="text/javascript">
   document.write('</div>');
</script>

Playing hide and seek

For large sites, there are often other reasons why valuable content isn’t accessible to crawlers. As the owner of a large site, audit your robots.txt file from time to time to ensure that you aren’t blocking important sections of your site, and make sure your sitemaps are up to date and expose everything that may be of value to searchers.
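
Part of that robots.txt audit can be automated. The sketch below only handles simple prefix Disallow rules and ignores per-bot sections, Allow directives, and wildcards, so treat it as a starting point rather than a full robots.txt parser.

```javascript
// Minimal sketch of a robots.txt audit: collect Disallow rules and flag any
// important URLs they would block. Only simple path-prefix matching is done.
function blockedUrls(robotsTxt, importantPaths) {
  var disallows = robotsTxt.split('\n')
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return /^disallow:/i.test(line); })
    .map(function (line) { return line.slice('disallow:'.length).trim(); })
    .filter(function (path) { return path.length > 0; });
  return importantPaths.filter(function (path) {
    return disallows.some(function (rule) { return path.indexOf(rule) === 0; });
  });
}

var robots = 'User-agent: *\nDisallow: /tmp/\nDisallow: /products/';
console.log(blockedUrls(robots, ['/products/widgets.html', '/about/']));
// → [ '/products/widgets.html' ]
```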

Coming up

In our final installment in this series, we will discuss some of the finer points of content for big sites. If you have additional questions, feel free to ask in our forums.

Jeremiah Andrick – Program Manager, Live Search Webmaster

14 comments
  1. Anonymous

    easy to say, hard to do…but true!

  2. Anonymous

    Great tips! I’m not sure why I’d never thought of the javascript document.write trick myself, but I’ll definitely be using that in my next design. These three posts are full of excellent information, and I strongly urge anyone new to SEO to heed their recommendations.

    Thanks for taking the time to write these!

  3. Anonymous

    Thanks, hope can get to the bottom of my site not doing well one live.

  4. rickdej

    Garrett,

    Thanks for the feedback we are trying to ensure that we giving you tips that are actionable or at least give some insight into what matters.

    Thanks

    Jeremiah

  5. Anonymous

    or you could possibly set the onclick event to call js and put href="/seopage" for the purpose of search engine crawler?

  6. rickdej

    @guest I am not sure that would do the trick and may not be viewed the same way as Nikhil’s pattern.

    Jeremiah Andrick

  7. Quality Directory

    I optimize my site the same way for all search engines and I don't use Rich Internet Applications.

  8. Anonymous

    Hope to be better. Better means more features.

  9. Anonymous

    This is great news. Best of luck for the future and keep up the good work.

  10. darrentrav

    Great article. Does anyone have any tips about making flash content accessible?

  11. miles2go

    The follower of this blog getting the great lesson. Thanks

  12. Anonymous

    This is great news. Best of luck for the future and keep up the good work.

  13. novintabligh

    thanks for this great post. it is very good, though it is hard to implement all these on a large site.

  14. Lemon Slice Travels

    nice post.. thank you
