Office.com SEO: search engine-friendly URLs

Editor’s note: In our continuous effort to make this blog as compelling as possible to our large and diverse audience, we are expanding the scope of the Bing Webmaster Center blog. Starting with this post, we will host occasional posts from “guest bloggers” from within Microsoft who work on search or use search-related technologies in their daily jobs. They will offer the perspective of a user of search engine optimization (SEO) services (just like you!) rather than that of a search engine offering prescriptive SEO advice. Let us know what you think and what topics you’d like to see covered in future posts with a comment here. Thanks for being a member of the Bing Webmaster Center community!

Today’s guest blogger is Vincent Wehren, who led the SEO effort for the new Office.com. Office.com has grown to become the 30th largest site in the world, and has tens of millions of pages of indexed content. He’s been an SEO for three years and leads the International team responsible for improving content optimization, reducing content duplication in the index, and optimizing site search performance, among many other duties. I am very pleased to introduce him to the SEO community.

— Rick DeJarnette, Bing Webmaster Center team

***************************************************

Office.com is the companion website to Microsoft Office. With over 200 million unique visitors per month, 6 million content pages in 38 languages, and roughly 500 community contributors, the site offers product and support info as well as productivity content such as templates, images, clip art, and add-ons.

As part of building Office 2010, Office.com went through a complete redesign, and the content management system and server infrastructure behind it were rebuilt to run on top of SharePoint 2010. As a side benefit, this major revamp gave us an opportunity to improve our SEO capabilities.

Over a couple of posts, I would like to share some of the SEO challenges we faced at Office.com and the decisions we made, in the hope that you will find them useful for your site, big or small.

A new URL structure for our pages

Backed by the recommendations from a site review we did in collaboration with an external SEO vendor, we defined a global SEO strategy and a list of priority items to go after. During the development phase, our core focus was on site architecture and other items that required code work from our team of developers.

One of our top priorities was making our URL structure more search engine-friendly. We already had a relatively flat folder structure (never more than one or two folder levels deep), but the URLs of our pages contained only a cryptic document ID, which made sense to our internal content management system (CMS) but not to search engines (or users, for that matter). So, as part of the redesign, we wanted to support keyword-based, search engine-friendly URLs.

The motivation behind this is fairly straightforward:

  • People often copy and paste URLs verbatim into their blogs, forum comments, or web pages instead of writing descriptive link text.
  • At that point, the URL text becomes the anchor text for the link.
  • Search engines evaluate anchor text to learn more about the page the link points to, including which keywords it is focused on.
  • Anchor text of inbound links is generally regarded as a top SEO ranking factor.
  • So, if all you have is a cryptic URL, this isn’t going to add any “keyword power” to your page, but using a keyword-based URL will give you the additional “keyword power” you need to help your page rank for the included terms.

But that’s not really all:

  • The URL itself is also very likely a part of the search engine’s ranking equation, so having meaningful keyword-focused text helps with this too.
  • Finally, if the URL matches the search terms for a given query, the matching part of the URL is bolded on the results page, which can help increase click-through and traffic to your page.

The solution

Because we needed to scale to hundreds of thousands of articles in a large number of languages, we decided to re-use the existing page title and build the display URL algorithmically. What we built works roughly as follows, and it doesn’t differ much from what some other content management systems and blogging platforms do:

  1. Start with the document title.
  2. Replace spaces and other non-alphanumeric characters (such as apostrophes, underscores, etc.) with hyphens.
  3. Normalize any accented/extended characters to plain ASCII letters.
  4. Make everything lowercase.
  5. Append the internal document ID to always ensure a unique URL regardless of title.
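The five steps above can be sketched in Python as follows. This is a minimal illustration, not the Office.com implementation: the function names are hypothetical, and the Unicode NFKD decomposition used for step 3 is just one common way to fold accented letters to plain ASCII (characters with no ASCII equivalent are simply dropped). Steps 2 through 4 are applied in a slightly different order so that the hyphenation pattern only has to handle lowercase ASCII.

```python
import re
import unicodedata


def make_slug(title: str) -> str:
    """Turn a document title into a lowercase, hyphenated ASCII slug."""
    # Step 3: split accented letters into base letter + combining mark,
    # then drop everything that is not ASCII (the combining marks go away).
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Step 4: lowercase.
    lowered = ascii_title.lower()
    # Step 2: replace each run of non-alphanumeric characters
    # (spaces, apostrophes, underscores, ...) with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", lowered)
    return slug.strip("-")


def make_url_name(title: str, doc_id: str) -> str:
    """Step 5: append the document ID so the name is unique even if titles collide."""
    slug = make_slug(title)
    # If nothing survives normalization, fall back to the ID alone.
    return f"{slug}-{doc_id}.aspx" if slug else f"{doc_id}.aspx"
```

With this sketch, make_url_name("Overview of XML in Excel", "HA010206396") returns overview-of-xml-in-excel-HA010206396.aspx, and a Spanish title such as “Información general sobre XML en Excel” (inferred here from the Mexican URL) produces the slug informacion-general-sobre-xml-en-excel. The ID is appended after lowercasing, so it keeps its original case, as in the example URLs.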

For example, following the above rules, the English article “Overview of XML in Excel” can now be found at http://office.microsoft.com/en-us/excel-help/overview-of-xml-in-excel-HA010206396.aspx.

On the other hand, our users in Mexico will find the article here: http://office.microsoft.com/es-mx/excel-help/informacion-general-sobre-xml-en-excel-HA010206396.aspx.
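Because the document ID is appended to every URL, both example URLs above end in the same ID, HA010206396. That means a server could resolve a page from the ID alone and treat the keyword part as descriptive text. The post doesn’t say whether Office.com does this, so the following is a hypothetical sketch; the ID pattern (two uppercase letters followed by digits) is an assumption inferred from the example.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed ID format, inferred from "HA010206396": two uppercase letters and digits,
# either preceded by a hyphen (keyword URL) or standing alone, then ".aspx".
DOC_ID_PATTERN = re.compile(r"(?:^|-)([A-Z]{2}\d+)\.aspx$")


def doc_id_from_url(url: str) -> Optional[str]:
    """Return the trailing document ID of a page URL, or None if there is none."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    match = DOC_ID_PATTERN.search(last_segment)
    return match.group(1) if match else None
```

Both the en-us and es-mx URLs yield HA010206396, while a folder URL such as http://office.microsoft.com/en-us/access-help/ yields None.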

URL length and stop words

What we did not implement for Office.com but you may want to consider for your situation is to limit the number of keywords in the URL or remove stop words from it.

The argument is that too many keywords dilute the value of each individual keyword and that long URLs receive fewer click-throughs.

We explicitly did not remove stop words because doing so gets much more involved across the large number of languages we support. Also, many of our pages are about key terms that in other contexts would qualify as stop words. Good examples are titles such as “What-if scenario” or “If function” in Excel, where the stop words “what” and “if” are actually the most significant terms, so stripping them out simply didn’t make sense for us.
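If you do want to experiment with stop-word removal and a keyword limit, a sketch like the following shows both the idea and the pitfall described above. The stop-word list, the five-word limit, and the protected parameter are illustrative assumptions; a multilingual site would need a separate list for each language.

```python
# Hypothetical English-only stop-word list.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to", "what", "if"})


def trim_slug(slug: str, max_words: int = 5, protected: frozenset = frozenset()) -> str:
    """Drop stop words (unless protected) from a hyphenated slug, then cap its length."""
    words = slug.split("-")
    kept = [w for w in words if w not in STOP_WORDS or w in protected]
    return "-".join(kept[:max_words])
```

Applied naively, trim_slug("what-if-scenario") reduces the slug to just scenario, and "if-function" becomes function: exactly the loss of meaning that made this approach a poor fit for Office.com. Passing protected=frozenset({"what", "if"}) keeps the full slug, but maintaining such lists per page and per language is the extra effort mentioned above.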

Also, search engines have started to improve the way they surface the page URL in the search results, making the click-through argument somewhat less of a concern.

Exceptions to our keyword-based URL strategy

There are cases where we wanted the folder name (or what we call a “sub web”) to be the permanent display URL for the page. In those cases we do not expand the page title; instead, we promote the folder as the canonical URL. An example is the default page of a product subfolder such as http://office.microsoft.com/en-us/access-help/. This also has the advantage that if the document ID of the folder’s index page changes, we no longer have to redirect from the old page to the new one, as we had to in the past.

We also didn’t take the keyword-based approach for non-Latin scripts, such as those used on our Japanese, Russian, Arabic, or Hindi sites. This wasn’t because it was technically infeasible, but mostly because there was still considerable ambiguity about how best to handle URLs in these languages for users, browsers, and search engines alike. However, this is definitely something we would like to explore further in the future.
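The two exceptions above can be expressed as a small decision function. This is a hypothetical sketch: the locale codes, the is_folder_default flag, and the ID-only fallback format are assumptions for illustration, not Office.com’s actual rules.

```python
# Hypothetical locales whose titles use non-Latin scripts and keep ID-only URLs.
NON_LATIN_LOCALES = {"ja-jp", "ru-ru", "ar-sa", "hi-in"}


def display_path(locale: str, folder: str, slug: str, doc_id: str,
                 is_folder_default: bool = False) -> str:
    """Choose the display path for a page under the exceptions described above."""
    if is_folder_default:
        # Promote the folder itself; the URL then survives document-ID changes.
        return f"/{locale}/{folder}/"
    if locale in NON_LATIN_LOCALES or not slug:
        # No keyword slug: fall back to an ID-only name (assumed format).
        return f"/{locale}/{folder}/{doc_id}.aspx"
    return f"/{locale}/{folder}/{slug}-{doc_id}.aspx"
```

Checking the folder-default case first means index pages never pick up a document ID in their URL, which is what removes the need for redirects when that ID changes.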

Fewer URL parameters

In addition to the keyword-based URLs, there was also a push to reduce the use of query parameters and make our URLs more static overall. Although we didn’t manage to remove all dynamic parameters (some of them are still meaningful, as in some click-tracking scenarios), we made huge strides in that direction. Not only does that make it easier for search engines to determine the “primary” URL for a resource (ideally there should only ever be one), but it also reduces the URL surface that search engines have to spend time crawling, processing, and de-duplicating, allowing them to spend more time on other pages.
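One way to collapse many URL variants into a single primary URL is to keep only an allow-list of query parameters that actually change the page content and drop everything else, including fragments. The sketch below uses only Python’s standard urllib.parse; the MEANINGFUL_PARAMS set and the parameter names in the examples are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allow-list: parameters that change what the page shows.
MEANINGFUL_PARAMS = {"page", "sort"}


def canonical_url(url: str) -> str:
    """Drop non-meaningful query parameters and the fragment; sort what remains."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in MEANINGFUL_PARAMS
    ]
    kept.sort()  # stable order, so ?sort=x&page=2 and ?page=2&sort=x match
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

A tracking parameter such as ?trackingid=123 disappears entirely, so every tracked variant of an article maps to the same string, which is the “one primary URL” goal described above.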

Redirection of the old-style URLs to new URLs and the canonical tag

When making large-scale URL changes on a site that has earned numerous inbound links in the wild, you should redirect the old URLs to the new ones using a 301 redirect.

The 301 redirect ensures that the ranking power of the old link is consolidated on the new URL. It also helps avoid content duplication problems when both the old and new URLs still “work,” which is the case for Office.com.
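As a rough illustration of the redirect logic, the function below maps an old-style, ID-only path to its new keyword-based path and answers with a 301 (Moved Permanently) status. The old URL format, the lookup table, and the function name are hypothetical; in practice the table would come from the CMS, and your web server or framework would send the status and Location header.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical old-style path: locale, one folder, then a bare document ID.
OLD_STYLE = re.compile(r"^/(?P<locale>[a-z]{2}-[a-z]{2})/[^/]+/(?P<doc_id>[A-Z]{2}\d+)\.aspx$")


def redirect_for(path: str,
                 new_paths: Dict[Tuple[str, str], str]) -> Optional[Tuple[int, str]]:
    """Return (status, Location) for a known old-style path, or None to serve normally."""
    match = OLD_STYLE.match(path)
    if match is None:
        return None  # not an old-style URL (new keyword URLs never match)
    new_path = new_paths.get((match["locale"], match["doc_id"]))
    if new_path is None:
        return None
    # 301 (permanent) rather than 302 (temporary), so links are credited to the new URL.
    return 301, new_path
```

Because a keyword URL always has text before the ID, it never matches the old-style pattern, so the new URLs cannot be redirected back into a loop.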

In addition, you could consider backing up this redirect strategy with the rel="canonical" tag, which is gaining more and more support from the search engines. The canonical tag tells search engines the preferred URL of a page when there are multiple URLs for it.
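The canonical tag is a link element placed in the page’s head section. A minimal helper that emits it, escaping the URL for use inside an HTML attribute, might look like this (the function name is illustrative):

```python
from html import escape


def canonical_link_tag(url: str) -> str:
    """Build a <link rel="canonical"> element pointing at the preferred URL."""
    # quote=True also escapes double and single quotes, which is needed inside an attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'
```

Every variant of a page (old-style URL, tracked URL, and so on) would emit the same tag, naming the single keyword-based URL you want indexed.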

For Office.com, we plan to use both 301 redirects and the canonical tag, although full redirection will start only in a few weeks. We are also advertising only the new URLs in our XML Sitemaps, but more about our Sitemap strategy in a later post!
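For reference, a minimal XML Sitemap that lists only the new URLs can be generated with Python’s built-in ElementTree. This sketch covers only the required loc element of the Sitemaps protocol and leaves out optional fields such as lastmod; it is an illustration, not Office.com’s Sitemap setup.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> str:
    """Return a Sitemaps-protocol document listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in element text.
        ET.SubElement(url_element, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Feeding it only keyword-based URLs (never the old ID-only ones) keeps the Sitemap consistent with the redirects and canonical tags.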

What are you planning? Are you thinking about keyword-based, search engine-friendly URLs for your site?

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Up next: Office.com Sitemaps strategy.

— Vincent Wehren, Lead Engineer, Office.com International Site & Services