Office.com SEO: search engine-friendly URLs

Editor’s note: In our continuous effort to make this blog as compelling as possible to our large and diverse audience, we are expanding the scope of the Bing Webmaster Center blog. Starting with this post, we will host occasional posts from “guest bloggers” from within Microsoft who work on search or use search-related technologies in their daily jobs. They will offer the perspective of a user of search engine optimization (SEO) services (just like you!) rather than that of a search engine offering prescriptive SEO advice. Let us know what you think and what topics you’d like to see covered in future posts with a comment here. Thanks for being a member of the Bing Webmaster Center community!

Today’s guest blogger is Vincent Wehren, who led the SEO effort for the new Office.com. Office.com has grown to become the 30th largest site in the world, and has tens of millions of pages of indexed content. He’s been an SEO for three years and leads the International team responsible for improving content optimization, reducing content duplication in the index, and optimizing site search performance, among many other duties. I am very pleased to introduce him to the SEO community.

– Rick DeJarnette, Bing Webmaster Center team

***************************************************

Office.com is the companion website to Microsoft Office. With over 200 million unique visitors per month, 6 million content pages in 38 languages, and roughly 500 community contributors, the site offers product and support info as well as productivity content such as templates, images, clip art, and add-ons.

As part of building Office 2010, Office.com went through a complete redesign. In addition, the content management system and the server infrastructure behind it were rebuilt to run on top of SharePoint 2010. As a side benefit, this major revamp gave us some opportunities to improve our SEO capabilities.

Over a couple of posts, I would like to share some of the SEO challenges we faced at Office.com and some of the decisions we made, in the hope that you will find them useful for your site, big or small.

A new URL structure for our pages

Backed by the recommendations that came out of a site review that we did in collaboration with an external SEO vendor, we defined a global SEO strategy and list of priority items to go after. Our core focus during the development phase was on site architecture and other items that required code work to be done by our team of developers.

One of our top priorities was making our URL structure more search engine-friendly. We already had a relatively flat folder structure (never more than one or two folder levels deep), but the URLs of our pages only contained a cryptic document ID, which made sense to our internal content management system (CMS) but not to search engines (or users, for that matter). So, as part of the redesign, we wanted to support keyword-based, search engine-friendly URLs.

The motivation behind this is fairly straightforward:

  • People often copy and paste URLs verbatim into their blogs, forum comments, or web pages instead of using text.
  • At that point, the URL text becomes the anchor text for the link.
  • Anchor text is evaluated by the search engines to tell them more about the page the link is pointing to, so keyword-focused anchor text is more useful to them than a cryptic ID.
  • Anchor text of inbound links is generally regarded as a top SEO ranking factor.
  • So, if all you have is a cryptic URL, this isn’t going to add any “keyword power” to your page, but using a keyword-based URL will give you the additional “keyword power” you need to help your page rank for the included terms.

But that’s not really all:

  • The URL itself is also very likely a part of the search engine’s ranking equation, so having meaningful keyword-focused text helps with this too.
  • Finally, if the URL matches the search terms for a given query, that part of the URL will be bolded in the results page, which can help increase click-through and traffic to your page.

The solution

With a need to scale to hundreds of thousands of articles and a large number of languages, we decided to simply reuse the existing page title and build the display URL algorithmically. The approach loosely works as follows, and doesn’t differ much from what some other content management systems or blogging platforms do:

  1. Start with the document title.
  2. Replace spaces and other word-separating characters (such as apostrophes, underscores, etc.) with hyphens.
  3. Normalize any accented/extended characters to plain ASCII letters.
  4. Make everything lowercase.
  5. Append the internal document ID to always ensure a unique URL regardless of title.

For example, following the above rules, the article “Overview of XML in Excel” in English now can be found at http://office.microsoft.com/en-us/excel-help/overview-of-xml-in-excel-HA010206396.aspx.

On the other hand, our users in Mexico will find the article here: http://office.microsoft.com/es-mx/excel-help/informacion-general-sobre-xml-en-excel-HA010206396.aspx.
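The five steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the Office.com implementation: the function name `build_display_slug` is hypothetical, accent normalization is done with Unicode decomposition before the hyphen replacement (the post does not say how or in which order this happens), and runs of several separators are collapsed into a single hyphen.

```python
import re
import unicodedata


def build_display_slug(title: str, doc_id: str) -> str:
    """Build a keyword-based URL segment from a page title and document ID."""
    # Step 3: decompose accented characters and drop the combining marks,
    # so that "ó" becomes "o". Characters with no ASCII form are dropped.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")

    # Step 2: replace spaces, apostrophes, underscores, and so on with hyphens.
    # Assumption: consecutive separators collapse into one hyphen.
    hyphenated = re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-")

    # Step 4: lowercase the title part only; the ID keeps its original case.
    lowered = hyphenated.lower()

    # Step 5: append the document ID so the result is unique per document.
    return f"{lowered}-{doc_id}"
```

Called with the title “Overview of XML in Excel” and the ID HA010206396, this yields the same final segment as the English URL above (minus the .aspx extension); the Spanish title “Información general sobre XML en Excel” yields the segment of the es-mx URL.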

URL length and stop words

One thing we did not implement for Office.com, but which you may want to consider for your own site, is limiting the number of keywords in the URL or removing stop words from it.

The argument is that too many keywords dilute the value of each individual keyword and that long URLs receive fewer click-throughs.

We explicitly did not remove stop words because this gets a lot more involved for the large number of languages we support. Also, a lot of our pages are about key terms that in other contexts would qualify as stop words. A good example would be a title such as “What-if scenario” or “If function” in Excel, where the stop words “what” and “if” are actually the most significant, so stripping them out simply didn’t make sense for us.
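To make the problem concrete, here is a minimal sketch of naive stop-word removal. The word list and the function name `strip_stop_words` are hypothetical, and a real filter would need a separate list for every supported language.

```python
# Hypothetical English-only stop-word list, for illustration.
STOP_WORDS = {"a", "an", "the", "of", "in", "if", "what"}


def strip_stop_words(slug: str) -> str:
    """Drop hyphen-separated words that appear in the stop-word list."""
    kept = [word for word in slug.split("-") if word not in STOP_WORDS]
    return "-".join(kept)
```

For a typical title this shortens the URL as intended: "overview-of-xml-in-excel" becomes "overview-xml-excel". For the Excel titles mentioned above it removes exactly the words that matter: "what-if-scenario" is reduced to "scenario" and "if-function" to "function".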

Also, search engines have started to improve the way they surface the page URL in the search results, making the click-through argument somewhat less of a concern.

Exceptions to our keyword-based URL strategy

There are cases where we wanted to cement the folder (or what we call a “sub web”) name as the ultimate display URL for the page. In those cases we do not expand the page title but just promote the folder as the canonical URL. An example would be the default page of a specific product subfolder such as http://office.microsoft.com/en-us/access-help/. This also has the advantage that if the document ID changes for the index page of this folder, we do not have to redirect from the old page to the new page, which we had to do in the past.

We also didn’t end up taking the keyword-based approach for non-Latin character sets, such as for our Japanese, Russian, Arabic, or Hindi sites. This was not because it wasn’t technically feasible, but mostly because there was still sufficient ambiguity around how best to handle URLs in these languages for users, browsers, and search engines alike. However, this is definitely something we would like to explore further in the future.

Fewer URL parameters

In addition to the keyword-based URLs, there was also a push to reduce the use of query parameters and make our URLs more static overall. Although we didn’t manage to remove all dynamic parameters (some of them are still meaningful, as with some click tracking scenarios), we made huge strides in that direction. Not only does that make it easier for search engines to determine the “primary” URL for a resource (there should preferably only ever be one), but it also reduces the URL surface that search engines have to spend time crawling, processing, and de-duping, allowing them to spend more time on other pages.
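As a rough illustration of the “one primary URL per resource” idea, the sketch below derives a primary URL by dropping parameters that do not change the content. The parameter names in `TRACKING_PARAMS` and the function name `primary_url` are made up for this example; they are not the parameters Office.com uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of parameters that only carry tracking information.
TRACKING_PARAMS = {"clickid", "src"}


def primary_url(url: str) -> str:
    """Return the URL without tracking parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Rebuild the URL; an empty query string leaves no trailing "?".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this sketch, every tracked variant of a page collapses to the same string, which is the value you would want a crawler to treat as the primary URL. Parameters that do change the content, such as a page number, are left in place.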

Redirection of the old-style URLs to new URLs and the canonical tag

When making large-scale URL changes on a site that has earned numerous inbound links in the wild, you should redirect the old URLs to the new ones using a 301 redirect.

The 301 redirect makes sure that all ranking power of the old link is concentrated in the new URL. It also helps avoid content duplication problems if both the old and new URL still “work”—which is the case for Office.com.

In addition, you could consider backing up this redirect strategy with the rel="canonical" tag, which is starting to enjoy more and more support from the search engines. The canonical tag tells search engines the preferred URL of the page if there are multiple URLs for the page.
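A minimal sketch of how the two mechanisms fit together is shown below. It is not the Office.com redirect code. The lookup table `NEW_PATHS`, the legacy path used in the usage note, and the assumption that the document ID is always the last token before ".aspx" are all hypothetical; a real site would look the ID up in its CMS.

```python
import re
from html import escape

SITE = "http://office.microsoft.com"

# Hypothetical lookup table: document ID -> current keyword-based path.
NEW_PATHS = {
    "HA010206396": "/en-us/excel-help/overview-of-xml-in-excel-HA010206396.aspx",
}

# Assumption: the document ID is the last token before ".aspx" in both URL styles.
DOC_ID_PATTERN = re.compile(r"([A-Z]{2}\d+)\.aspx$")


def resolve(path: str):
    """Return (status, headers) for a requested path."""
    match = DOC_ID_PATTERN.search(path)
    new_path = NEW_PATHS.get(match.group(1)) if match else None
    if new_path is None:
        return 404, {}
    if path != new_path:
        # Legacy or otherwise non-preferred URL: send a permanent redirect.
        return 301, {"Location": SITE + new_path}
    return 200, {}


def canonical_tag(path: str) -> str:
    """Build the <link> element to place in the page <head>."""
    return f'<link rel="canonical" href="{escape(SITE + path, quote=True)}" />'
```

Here a request for a made-up legacy path such as /en-us/excel/HA010206396.aspx gets a 301 to the keyword-based URL, while the keyword-based URL itself is served with a 200. The page served at the new URL would carry the output of `canonical_tag` in its head, so that any remaining variants of the URL still point search engines at the preferred one.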

For Office.com, we planned to use both 301 redirects and the canonical tag, although we will start doing the full redirection only in a few weeks. Also, we are exclusively advertising the new URLs in our XML Sitemaps—but more about our Sitemap strategy in a later post!

What have you planned for? Are you thinking about search engine-friendly, keyword-based URLs for your site?

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Up next: Office.com Sitemaps strategy.

– Vincent Wehren, Lead Engineer, Office.com International Site & Services

Join the conversation

35 comments
  1. towandagiannellcourd

    I believe its a nice way to create search engine friendly urls

  2. Quality Directory

    It's wise to allow professional guest bloggers to breeze in sometimes and make unbiased posts, as long as you don't open up the gate for "outsiders" to post entries.

  3. stephan.walcher

    What do you mean with "we planned to use both 301 redirects and the canonical tag, although we will start doing the full redirection only in a few weeks."?

    Do all the pages with a new url already have the new canonical tag and the you are planning the 301-redirect only in a few weeks or are you going to do both together?

  4. Uwe Keim

    I love it to read such background information, both from Microsoft/Bing and Google.

  5. Vincent Wehren

    @stephan.walcher

    We already are using the canonical tag on our pages but are redirecting only a subset of our legacy URLs to the new URLs at the moment. Later this month we will redirect all legacy URLs to the new URLs.

    We didn’t want the additional overhead of this massive amount of redirects while we switched over our traffic to the brand new server farms. Now that we’ve established we are meeting and exceeding our end-user performance goals, the team will deploy the additional redirect code.

    Although this was a temporary tradeoff from an SEO perspective, this has the benefit that it allows us to monitor the effects of this particular change to indexation, rankings, and traffic in a more isolated fashion. As a side-effect, it also provided us with some interesting data on the canonical tag: for which engines it seemed to be working well even without a full 301-redirect in place.

    Does that answer your question?

    - Vincent Wehren

  6. Allan Duncan

    how long do you think it will take to do a 301 redirect on a website like this big?

  7. bouka55

    Its a brilliant idea; having tips from someone who actually does SEO on a day to day basis. Kudos to you!

  8. simplycast

    It is great to read this post. It reassures me that we went the correct way with URLS when we had a new site update.

    SO many other tidbits in this post that will be tried out.

  9. sachinwd

    knowing about this I booked many domain.

  10. ianmacfarlane

    @Vincent – I'd be interested to see a report on how the different search engines handled the canonical tag for Office.com.

  11. umeshsmiles

    Definitely a useful guide for SEO for any site.

  12. old.cars

    Love your work Rick, I have to say that one of the best things to happen as far as I am concerned is the new developments around the search engines and the canonical tag. Switching a site to CMS would be deadly if not for the ability to use the canonical tag. I am a lone wolf and do all my own development and optimization, it took untold hours to change all my CMS URLs to SEF URLs.

    Some of my sites have thousands of pages but you will no longer find any URL that contains ID=b44eut87zipitgo, ?67, every single category, sub etc. now contain SEF URLs. This change was well worth the work involved, some of the sites moved up several pages in the search engines, and many of them that would shift from page one position to page two now hold their spots.

    Bob  

  13. caspava

    thank you nice

  14. agptek15

    It's goog information.

  15. deepakjaat

    yas this is the best information for me and this is very her-full me

    but thanks for it

  16. naitatum

    definitely a useful guide for seo for any site.thank you nice

  17. simplify3

    I think this is great advice for new webmasters.

    But converting old websites — that's tough.  There are so many places where the old urls reside — and I really would LOVE an update of how the 301 redirects work for you when you convert the old pages over.

    That's been my biggest concern.  For now, I've been really freaky about keeping my urls the same.  It's: free.naplesplus.us article view.php 12345 SEO-friendly-name

    I'd like to make it: http://www.naplesplus.com seo-friendly-name-12345 — and its really easy to set up – but I would so hate for so much work to suddenly go down the tubes due to a renaming.  If I'm going to rename, I'd want to do a complete overhaul of the site to make it 2010-2011 looking rather than 2007-looking.  I've heard so many horror stories about people following the guidelines:  websitename.com/people/firstname-lastname and things like that just to suddenly watch everything tank and not recover.

    So please – keep us informed as to how it works when you transfer over to the new naming convention.  If you get a dip, how much of a dip?  If it recovers – how long? A week? a month?  Thanks!  -Ken, Naples, Florida

  18. jollytrade01

    thanks for share!

  19. toncmi

    great serch engine in the future

  20. web12895

    Thanks for SEO guide It very useful fo any site I really very impress Thanks again for postings, from last seven days i search info related to SEO for my site http://www.usauk-classifieds.com Thanks

  21. mobiledealsstore

    Such a useful information u shared. thanks for this wonderful post.

  22. hotels

    This is a very useful seo guide, thank you.

  23. Assignment-Help

    So, at last even MS is facing this problem. What would that mean to a smaller company having big site ??? And how long you can keep 301 redirection? If forever then what was the point of url rewriting ?? thanks I learn a lot from your mistkaes.

    Elvis

    http://www.assignmenthelp.net

  24. Ghost Riders Leather

    Enjoyed reading your post.  Thanks very much for your time and effort.

  25. anoopseo

    Url wiil always be optimization …thanks for sharing it..

  26. sohel

    Thanks for sharing this with all of us. Of course, what a great site and informative posts, I will bookmark this site. keep doing your great job and always gain my support. Thank you for sharing this beautiful article

    http://bobevanscoupon.net

  27. thuandh

    i'll appy it for my web
