Working with large sites often means being part of a large organization, which brings its own set of challenges. Many stakeholders with different agendas or needs influence how sites are structured. Large organizations also tend to have long to-do lists and a lack of understanding of how certain design or architecture choices can affect a search engine's ability to index the site.
In our past two articles on large site optimization, we talked about how to help us crawl your site more efficiently and why you should reduce the number of URLs you expose. Because of the diverse needs within some organizations, however, the problem isn’t that you share too many URLs; it’s that valuable content is hidden from search crawlers. To ensure that your site gets plenty of coverage, watch for the following patterns and be sure to avoid them.
Just a little too flashy
Crawlers have made some advances in recent months in handling rich Internet applications (RIAs) built with tools such as Flash and Silverlight, but these advances are limited to extracting text, and they apply only in certain contexts. You should still consider content built in these tools inaccessible to crawlers.
There are some techniques that help make sure content in Silverlight or Flash isn’t lost to crawlers or to visitors to your site.
Fundamentals, fundamentals, fundamentals
As with sports, no matter how good you get, you need to have your fundamentals down. The world of RIAs is no different. Your site needs to provide:
- Unique page titles
- Detailed meta descriptions
- Quality body copy (e.g. timely, relevant, well written)
- Descriptive H1 tags (one per page)
- Discoverable navigation
- Informative ALT attributes on images
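If you want to spot-check these fundamentals across many pages, a small script can help. The sketch below uses Python's standard html.parser module; the audit_pages function and its input shape (a mapping from URL to HTML) are hypothetical. It checks only the mechanical items on the list (titles, meta descriptions, H1 count, ALT text), not the quality of your copy or whether your navigation is discoverable.

```python
from collections import Counter
from html.parser import HTMLParser


class FundamentalsParser(HTMLParser):
    """Collects the on-page elements from the fundamentals checklist."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.in_title = False
        self.meta_description = None
        self.h1_count = 0
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = (attrs.get("content") or "").strip()
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            # Treats an empty alt as missing, so decorative images may be flagged too.
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit_pages(pages):
    """Hypothetical helper: pages maps URL -> HTML; returns URL -> list of problems."""
    parsed = {}
    for url, html in pages.items():
        parser = FundamentalsParser()
        parser.feed(html)
        parser.close()
        parsed[url] = parser

    # Count titles across the whole set so duplicates can be reported.
    title_counts = Counter(p.title.strip() for p in parsed.values())

    report = {}
    for url, p in parsed.items():
        problems = []
        title = p.title.strip()
        if not title:
            problems.append("missing title")
        elif title_counts[title] > 1:
            problems.append("duplicate title")
        if not p.meta_description:
            problems.append("missing meta description")
        if p.h1_count != 1:
            problems.append(f"expected one h1, found {p.h1_count}")
        if p.images_missing_alt:
            problems.append(f"{p.images_missing_alt} image(s) without alt text")
        report[url] = problems
    return report
```

For example, feeding it two pages that share the title "Widgets", where the second has no meta description, no H1, and an image without ALT text, reports a duplicate title for both pages and the three additional problems for the second.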
Practice progressive enhancement
Progressive enhancement is a strategy for making your content accessible to all viewers: you design for the least capable devices first, and then enhance those documents with separate logic for presentation. Its basic concept can be summed up in the following principles:
- Basic content should be accessible to all users
- Basic functionality should be accessible to all browsers
- All content is in hierarchical semantic markup (<title>, <h1>, <h2>, <h3>, etc.)
- Enhanced layout (Silverlight or Flash files) is provided separately
- End user browser preferences are respected
One way to implement progressive enhancement is to embed your rich content in <div> or <span> tags that also contain alternate content, including the links, text, and images found in your RIA.
For a complete picture of progressive enhancement, read A List Apart’s article “Understanding Progressive Enhancement.”
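As an illustration, here is a minimal Python sketch of a template helper that renders such a container. The RiaFallback class and render_ria_container function are hypothetical names. The sketch assumes that a separate enhancement script (not shown) replaces the container's contents with the Silverlight or Flash player when the plugin is available, so crawlers and visitors without the plugin still receive plain HTML.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class RiaFallback:
    """Hypothetical model of the alternate content that mirrors an RIA."""
    heading: str
    paragraphs: list[str]
    links: list[tuple[str, str]] = field(default_factory=list)   # (href, text)
    images: list[tuple[str, str]] = field(default_factory=list)  # (src, alt)


def render_ria_container(container_id: str, fallback: RiaFallback) -> str:
    """Render a <div> holding crawlable HTML equivalents of the RIA's content.

    Assumption: an enhancement script swaps this div's contents for the
    Flash or Silverlight player only when the plugin is present.
    """
    parts = [f'<div id="{escape(container_id)}">']
    parts.append(f"  <h2>{escape(fallback.heading)}</h2>")
    for text in fallback.paragraphs:
        parts.append(f"  <p>{escape(text)}</p>")
    for href, text in fallback.links:
        parts.append(f'  <a href="{escape(href)}">{escape(text)}</a>')
    for src, alt in fallback.images:
        # Every image keeps informative ALT text, per the fundamentals list.
        parts.append(f'  <img src="{escape(src)}" alt="{escape(alt)}">')
    parts.append("</div>")
    return "\n".join(parts)
```

Keeping the alternate content in a single data structure makes it easier to update alongside the RIA itself, so the two don't drift apart.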
Writing your AJAX application for search engines
AJAX applications exchange data with the server in the background, and these exchanges require the execution of scripts when particular page events occur. Search engines don’t interpret or execute this kind of code. Therefore, any content that an AJAX-enabled page or control loads this way isn’t accessible to crawlers and won’t be indexed.
If some pages or your entire site will leverage AJAX, you should consider the following strategies for generating as much static content as possible:
- Use a descriptive title tag, meta keywords, meta description, and H1 tag on every page to identify its intended content, even if the content on the page itself isn’t immediately accessible.
- If your RIA requires extensive AJAX use, plan for development of automated, static page builds that are accessible via static links from dynamic pages. Some people refer to these as “lo-fi” content links.
- Where possible, use server-side technologies rather than client-side browser requests to populate individual dynamic content containers.
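A static “lo-fi” build can be as simple as rendering each record your AJAX application would otherwise fetch into its own HTML page, plus an index page of plain links. This Python sketch assumes hypothetical record fields (slug, title, description, body) and a hypothetical /lofi/ path; it returns file contents in a dictionary rather than writing them to disk. Each generated page carries its own title, meta description, and H1, as recommended above.

```python
from html import escape


def build_page(item):
    """Render one record as a standalone, crawlable HTML page."""
    return (
        "<!DOCTYPE html>\n<html><head>"
        f"<title>{escape(item['title'])}</title>"
        f'<meta name="description" content="{escape(item["description"])}">'
        "</head><body>"
        f"<h1>{escape(item['title'])}</h1>"
        f"<p>{escape(item['body'])}</p>"
        "</body></html>\n"
    )


def build_lofi_site(items, base_path="/lofi/"):
    """Hypothetical build step: returns {filename: html} for every record plus an index."""
    files = {}
    links = []
    for item in items:
        filename = f"{item['slug']}.html"
        files[filename] = build_page(item)
        # Plain <a href> links are what crawlers can follow.
        links.append(
            f'<li><a href="{escape(base_path + filename)}">{escape(item["title"])}</a></li>'
        )
    files["index.html"] = (
        "<!DOCTYPE html>\n<html><head><title>All items</title></head><body><ul>"
        + "".join(links)
        + "</ul></body></html>\n"
    )
    return files
```

Your dynamic pages would then link to the generated index page with an ordinary link, so crawlers can reach every static page from there. Running the build on a schedule keeps the lo-fi pages in step with the data behind the AJAX interface.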
Playing hide and seek
For large sites, there are probably other reasons why valuable content may not be accessible to crawlers. As the owner of a large site, you should periodically audit your robots.txt file to make sure you aren’t blocking important sections of your site, and check that your sitemaps are up-to-date and include anything that may be of value to searchers.
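Part of this audit can be automated. The sketch below uses Python's urllib.robotparser and xml.etree.ElementTree to report which of a list of important URLs are blocked by robots.txt or missing from a sitemap, and which sitemap URLs robots.txt blocks. The audit_coverage function and the example URLs are hypothetical. The standard-library parser implements the basic robots exclusion rules, so it may not match every crawler's handling of extensions exactly.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by sitemaps.org-format sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_coverage(robots_txt, sitemap_xml, important_urls, user_agent="*"):
    """Hypothetical helper: cross-check robots.txt rules against a sitemap.

    robots_txt and sitemap_xml are the file contents as strings;
    important_urls are absolute URLs you expect to be crawlable.
    """
    robots = RobotFileParser()
    # parse() takes the robots.txt lines; no network access is needed.
    robots.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml)
    sitemap_urls = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }

    return {
        # Important pages that robots.txt keeps crawlers away from.
        "blocked_by_robots": [
            u for u in important_urls if not robots.can_fetch(user_agent, u)
        ],
        # Important pages the sitemap doesn't mention.
        "missing_from_sitemap": [u for u in important_urls if u not in sitemap_urls],
        # Contradictions: listed in the sitemap but disallowed for crawling.
        "listed_but_blocked": sorted(
            u for u in sitemap_urls if not robots.can_fetch(user_agent, u)
        ),
    }
```

Anything in listed_but_blocked deserves a closer look, since the sitemap and robots.txt are sending crawlers opposite signals about the same URL.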
In our final installment in this series, we will discuss some of the finer points of content for big sites. If you have additional questions, feel free to ask in our forums.
Jeremiah Andrick – Program Manager, Live Search Webmaster