Going international: considerations for your global website

One of the best parts of publishing online is that anyone can reach a worldwide audience. But while the Web makes going global easy, ensuring that the content you produce is found by the right audience can be a real challenge. Search engines can have trouble determining geotargeting because of a few technical limitations, including:

  • Search engines may not be crawling your site from the location of your customers.
  • Search engines may not execute JavaScript, which can break some targeting methods that rely on self-selection or JavaScript-based redirection.
  • There’s no standardized way to tell a search engine which region or language your content’s targeted for.
  • Content language can be misleading. For example, a page’s code comments and URLs could be written in English and the page hosted in a data center in Washington State, USA, while the main content is written in Spanish and the user-generated content (UGC) comments at the bottom of the page are in French, German, Portuguese, and Pig Latin.
  • Top-level domains may not indicate the intended audience. For example, http://ma.tt/ is an English-language personal site, while Orange.com is the site of a French telecom company hosted in France.
  • Some sites use redirection techniques that are unfriendly to search.

At Live Search, we attempt to overcome these and other challenges by examining the contents of a site for indications of its intended audience. Sometimes it’s easy, for instance when a site has a country code top-level domain that matches the language of its body text. Other times, it’s more difficult. The following are a few of the indicators we look at:

  1. Country code top-level domain (ccTLD). For example, aw.ca specifically targets users in Canada.
  2. Host server location, particularly for .com, .net, and .org.
  3. Language of the body text on the page.
  4. Locale of pages that link to your site.

While we do our best to read the indicators you give us, none of them alone is a crystal-clear gauge of a site’s intended geographic audience, so we take them all into consideration. So, how do you ensure that we’re able to determine the intended audience for your content?

Some best practices

There is no single, fully effective approach to architecting and localizing websites. Books could be, and have been, written on this subject, but there are ways to do it that are friendlier both to your customers and to us. Consider the following recommendations:

Target the location

When writing the content for your page, are you using keywords that tell the end user which location your content is relevant to? If you’re a local business, be sure to include your telephone number with area and country codes, your physical address (if applicable), and the city, state, and country where you’re located. This helps both search engines and customers find you.
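
As a rough illustration, the contact details might be rendered as plain, crawlable text in the page. The business details and helper function below are hypothetical, invented for this sketch:

    // Hypothetical example: render a plain-text contact block so that
    // location keywords (city, state, country, phone with country code)
    // appear in crawlable page text rather than in an image or script.
    interface BusinessLocation {
      name: string;
      street: string;
      city: string;
      state: string;
      country: string;
      phone: string; // include country and area codes, e.g. "+1 (206) 555-0100"
    }

    function renderContactBlock(loc: BusinessLocation): string {
      return [
        `<address>`,
        `  ${loc.name}<br>`,
        `  ${loc.street}<br>`,
        `  ${loc.city}, ${loc.state}, ${loc.country}<br>`,
        `  Tel: ${loc.phone}`,
        `</address>`,
      ].join("\n");
    }

    console.log(renderContactBlock({
      name: "Example Coffee Co.",
      street: "123 Pine St",
      city: "Seattle",
      state: "Washington",
      country: "USA",
      phone: "+1 (206) 555-0100",
    }));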

If you’re already doing this, one thing you may not have considered is targeting additional keywords that also represent the location. For example, if your business is located in the Capitol Hill area of Seattle, Washington, you’ll want to list both “Seattle” and “Capitol Hill,” because your customers will likely search for both; you might even include the common misspelling “Capital Hill,” since some customers will spell it that way. Be careful, though: if a page lists only “Capitol Hill” and “Washington,” it might be mistakenly placed back east in the District of Columbia!

Be consistent in language

One problem we see from time to time is inconsistent language usage, especially where user-generated content is concerned. These pages can trip up our detection, especially when other best practices aren’t being followed.

For example, content on MSDN is sometimes needed for external markets like France, but the page hasn’t been translated into French, or the content is user-generated, as in the following image:

[Image: an MSDN page targeted at the French market with untranslated, user-generated content]

Ensure, wherever possible, that you’re speaking the same language in your title tags, description tags, and the rest of your page. Consistency is key.
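
One simple way to enforce that consistency is to draw every visible string for a page (title, meta description, and body alike) from a single per-locale bundle, so a page can’t accidentally mix languages. A minimal TypeScript sketch with invented bundle contents:

    // Hypothetical locale bundles: every user-visible string for a page
    // comes from exactly one bundle, so the title, meta description, and
    // body text always agree on language.
    const bundles = {
      "en-us": {
        title: "Contact us",
        description: "How to reach our support team.",
        body: "Call or write to us any time.",
      },
      "fr-fr": {
        title: "Contactez-nous",
        description: "Comment joindre notre équipe d'assistance.",
        body: "Appelez-nous ou écrivez-nous à tout moment.",
      },
    } as const;

    type Locale = keyof typeof bundles;

    function renderPage(locale: Locale): string {
      const t = bundles[locale]; // one bundle per page: no mixed languages
      return `<html lang="${locale}">
    <head><title>${t.title}</title>
    <meta name="description" content="${t.description}"></head>
    <body><p>${t.body}</p></body>
    </html>`;
    }

    console.log(renderPage("fr-fr"));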

Create a hierarchy of language

When you’re architecting your site, we recommend grouping your localized content by TLD, subdomain, or subfolder. Keep all of the content for a region or language grouped together in a single structure. The following are all examples of good organizational structures for languages, using the sample URL www.domain.com (a small routing sketch follows the list):

  • www.domain.ca
  • ca.domain.com
  • www.domain.com/ca-fr/
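
To make the subfolder pattern concrete, here is a minimal TypeScript sketch of resolving a market from the URL path, so each market lives under its own crawlable prefix. The market list, default, and helper name are our own illustration, not part of the post; the same idea applies to subdomains by deriving the market from the Host header instead of the path.

    // Hypothetical helper: pull a market prefix such as "ca-fr" out of the
    // URL path, falling back to a default market when none is present.
    const SUPPORTED_MARKETS = new Set(["en-us", "ca-en", "ca-fr"]);
    const DEFAULT_MARKET = "en-us";

    function resolveMarket(pathname: string): { market: string; rest: string } {
      const [, first = "", ...restParts] = pathname.split("/");
      if (SUPPORTED_MARKETS.has(first.toLowerCase())) {
        return { market: first.toLowerCase(), rest: "/" + restParts.join("/") };
      }
      return { market: DEFAULT_MARKET, rest: pathname };
    }

    console.log(resolveMarket("/ca-fr/products/widgets")); // { market: "ca-fr", rest: "/products/widgets" }
    console.log(resolveMarket("/about"));                  // { market: "en-us", rest: "/about" }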

Don’t mix content intended for one market with the content of another. This is bad for search and can be a bad experience for the customer.

Note: One thing to consider when planning the hierarchy of a global site is how many URLs you produce. Having too many URLs for each market may dilute the overall relevance of your pages and site, and you may not get customers to link to the most important pages on your site. See part 1 of our series on large-site optimization for more thoughts on this topic.

Common mistakes

Sometimes search engines struggle to deliver your content to the right audience due to problems within your (the publisher’s) control. Some of these common mistakes include:

Using a cookie to store the language setting

Some sites store the language setting as a preference in a cookie but provide no navigational way to reach the content for other markets. The problem with this approach is that search engines don’t support cookies when crawling, so we never see anything but the default language. It can also create a less-than-optimal user experience. For instance, my friend Mishka may be reading a site here in the US and then switch to the German version of the same site. The site drops a cookie on her computer to note the change. If Mishka then emails a link to the site to her mother, who doesn’t know English, the site won’t find the German-language cookie on her mother’s machine, so her mother will see an English page she can’t read.
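
One way to avoid this trap is to make the URL, not the cookie, the source of truth for language, and treat the cookie at most as a hint for an initial redirect. A rough TypeScript sketch; the locale codes and helper names are ours, not from the post:

    // Hypothetical sketch: the locale always lives in the URL, so crawlers
    // and shared links see the right language; the cookie is only a hint.
    function localeFromPath(pathname: string): string | null {
      const match = pathname.match(/^\/(en-us|de-de)(\/|$)/i);
      return match ? match[1].toLowerCase() : null;
    }

    function pickRedirect(pathname: string, cookieLocale?: string): string | null {
      if (localeFromPath(pathname)) return null; // URL already says the language
      const hint = cookieLocale ?? "en-us";      // cookie is a hint, not the content key
      return `/${hint}${pathname}`;              // redirect to an explicit, linkable URL
    }

    console.log(pickRedirect("/pricing", "de-de")); // "/de-de/pricing": shareable and crawlable
    console.log(pickRedirect("/de-de/pricing"));    // null: nothing to do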

JavaScript

Some website owners spread localized content around a site without a clear path to reach it. For example, let’s say your site enables customers to use a JavaScript control, like the one shown in the image below, to load localized content within the page, but it doesn’t change the URL for the different language content. In this case, because the crawler can’t execute JavaScript, it’ll never be able to reach anything except the default language content. This scenario also prevents a user from linking to the localized version of the content.

[Image: a JavaScript-based language selection control]
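
A more crawler-friendly pattern is to keep plain links to each language version in the markup and let JavaScript merely enhance them, so both crawlers and users without scripts can reach every version. A browser-side TypeScript sketch; the element IDs and URLs are invented for illustration:

    // Hypothetical progressive enhancement: the <a> tags below already point
    // at real per-language URLs, so crawlers can follow them; the script only
    // swaps content in place for users whose browsers run JavaScript.
    //
    // <ul id="lang-switch">
    //   <li><a href="/en-us/page" lang="en">English</a></li>
    //   <li><a href="/fr-fr/page" lang="fr">Français</a></li>
    // </ul>
    document.querySelectorAll<HTMLAnchorElement>("#lang-switch a").forEach((link) => {
      link.addEventListener("click", async (event) => {
        event.preventDefault(); // enhanced path: fetch and swap without a full reload
        const response = await fetch(link.href);
        document.body.innerHTML = await response.text(); // crude in-place swap for the sketch
        history.pushState(null, "", link.href); // keep a linkable URL in the address bar
      });
    });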

Scripting against HTTP_ACCEPT_LANGUAGE

The browser sends the Accept-Language HTTP header (often exposed to server scripts as HTTP_ACCEPT_LANGUAGE) with each page request, telling the site which languages the user prefers. If you use a script to detect this setting and change the content accordingly, it’s easy to serve every language of the website under the same URL. However, if that’s the only way the localized content is accessible, the crawler, which doesn’t pass a value in this header, will only ever see the default language. A distinct URL per language is necessary to ensure the content is crawled.
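
If you do script against this header, a safer pattern is to use it only to suggest or redirect to the matching per-language URL, never to swap content in place at one URL. A rough TypeScript sketch of parsing the header; the supported-locale list is invented:

    // Hypothetical sketch: parse an Accept-Language header such as
    // "fr-CA,fr;q=0.9,en;q=0.8" and suggest a locale URL, rather than
    // serving different content at a single URL.
    function bestLocale(header: string, supported: string[]): string | null {
      const ranked = header
        .split(",")
        .map((part) => {
          const [tag, q] = part.trim().split(";q=");
          return { tag: tag.toLowerCase(), q: q ? parseFloat(q) : 1 };
        })
        .sort((a, b) => b.q - a.q); // highest preference first
      for (const { tag } of ranked) {
        const hit = supported.find((s) => s === tag || s.startsWith(tag + "-"));
        if (hit) return hit;
      }
      return null;
    }

    console.log(bestLocale("fr-CA,fr;q=0.9,en;q=0.8", ["en-us", "fr-ca"])); // "fr-ca"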

Market as a parameter

Search engines may sometimes be able to recognize a market setting in a query parameter if it uses a common nomenclature (such as EN-US), but this is still neither optimal nor search-friendly. It’s better to follow the URL hierarchy pattern we described above. The following are examples of standard and non-standard market nomenclature used in URLs:

Standard nomenclature:

  • www.domain.com/default.aspx?mkt=ca-fr

Non-standard nomenclature:

  • www.domain.com/default.aspx?market=fran
  • www.domain.com/default.aspx?market=89372
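
If you’re stuck with legacy parameter-based URLs, one migration path is a permanent (301) redirect from the parameter form onto the subfolder hierarchy described earlier. A hedged TypeScript sketch, with a helper name of our own invention:

    // Hypothetical sketch: map a legacy "?mkt=ca-fr" URL onto the subfolder
    // hierarchy, so each market gets a distinct, crawlable path.
    function marketPathFor(rawUrl: string): string | null {
      const url = new URL(rawUrl);
      const mkt = url.searchParams.get("mkt");
      if (!mkt || !/^[a-z]{2}-[a-z]{2}$/i.test(mkt)) return null; // skip non-standard values
      url.searchParams.delete("mkt");
      url.pathname = `/${mkt.toLowerCase()}${url.pathname}`;
      return url.toString(); // use as the target of a 301 redirect
    }

    console.log(marketPathFor("http://www.domain.com/default.aspx?mkt=ca-fr"));
    // -> "http://www.domain.com/ca-fr/default.aspx"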

Wrapping up

At Live Search, we’re always looking to improve how we detect the location and language of your site’s intended audience, and we’re considering several possibilities, from meta tag standards to tools in Webmaster Center. Until those tools become available, following the recommended practices and avoiding the common mistakes listed above will help us make those determinations today. As we make improvements in this space, we’ll continue to make announcements and to seek your feedback and questions in our forums.

Jeremiah Andrick, Program Manager, Live Search Webmaster Center

Join the conversation

20 comments
  1. Anonymous

    Yahoo! already supports the Content-Language HTTP Header (or HTTP-EQUIV meta tag) – see their official guidelines on it here:

    http://help.yahoo.com/l/us/yahoo/search/siteexplorer/explore/siteexplorer-42.html

    Having search engines use the same methods of identifying language/country (this uses ISO codes so it can do both) is really, really important for webmasters.

    It’s even better when it re-uses existing web standards. HTTP Content-Language (and the HTML "lang" attribute or XHTML’s xml:lang attribute) are really THE correct HTML standards for doing this.

    Note: Whatever you do, please don’t talk about the meta tag name=language – this is an oooold tag which was used by early web browsers before the HTML standards caught up; it is not standardised anywhere and is superseded by the above standards.

    So – HTTP Content-Language, and possibly the "lang" / "xml:lang" attributes, is absolutely, most definitely, the only way to do this. Webmasters everywhere will love you for it.

  2. Anonymous

    A good way to push the aforementioned standards would be to do another joint declaration with the other big search engines (Google/Yahoo!, possibly get Ask involved). This will massively increase the attention it gets, particularly if Google get on board.

    Google already has a way of doing it in Google Webmaster Tools – most webmasters will only do it for Google and not bother doing it for other search engines, even if you provide the same feature in Microsoft’s version of Webmaster Tools. Getting a standard which is included on the shared resource of people’s web pages rather than the walled garden of a search engine (Google) is important.

  3. rickdej

    @Ian Thanks so much for the comments. I think in terms of identification there is a lot that a partnership could do to help the publisher community. Getting a standard together takes a long time because our engines are very different under the hood. We are committed, where possible, to joining with the other engines. I think your feedback on this point is really good. We will let you know if something materializes as a standard.

  4. Anonymous

    Jeremiah

    (unrelated post)

    Can you please comment on the MSN bot’s disregard for robots directives such as ‘crawl delay’ and ‘disallow’?

    There are many related (unanswered) questions on this blog.

    Daily, MSN brings our servers to a crawl. This is a serious issue.

    Help Please.

  5. Anonymous

    Good tips will help improve the site.

  6. Anonymous

    Thanks for all your suggestions and indexing tips for making a good site.

  7. Anonymous

    A rarely explored subject, and you have provided valuable information. Thank you!

  8. Anonymous

    It was suggested that creating specific language subdirectories was a better way to clarify a multilanguage website.

  9. Anonymous

    Well, this is really good.

    I was always looking for a post like this. Sometimes we get in trouble when making multilingual sites.

  10. himadventures

    I agree with slot guy: it will be better to have a neat and clean output.

  11. Quality Directory

    This is a good article that explains the technical limitations that make search engines encounter problems understanding geotargeting of sites. Helpful to me indeed!

  12. miles2go

    This is a less-discussed topic that no one writes about, but it is like the root of a tree. So thank you for writing about it.

  13. Quality Directory

    I have a site that targets many countries. Each country is in a dedicated folder/directory. I have implemented the best practices you wrote about here, but with little result. I'm looking for more ways for the content I produce for each country to be found by the right target audience. More articles on this topic will be much appreciated.

  14. Anonymous

    It's really very helpful for increasing a website's value. The tips really work, but at the same time we need to follow the rules strictly. The update is a real learning experience and shows the creativity of the writer. The steps described need to be put into practice. Thanks, looking forward to more.

  15. andy.haigh

    Why not implement geotargeting like Google has done, so you can just set it via the webmaster tool rather than messing around with HTTP headers?

  16. m0rad

    Good tips

  17. novintabligh

    Thanks for all this information on multi-language sites.

  18. simon.jones.7272

    Got to agree with @andy.haigh. Google's approach is the best. I can see that it is very difficult to analyse a site to find its intended target. I have one site that is written in English, hosted in the USA, with content about the Canary Islands, targeting the UK market (holiday niche) – so how would that get figured out?

    Answer: it's a complex question… so let the webmasters tell you!

    I think all search engines parse the robots.txt file, so why not extend that for these kinds of things?

  19. md99

    I have a retail site which is a co.uk. The only language used is English.

    At the moment we get 15K hits a month, resulting in a significant amount of business. This is mostly a result of being promoted by Bing's main competitor in this field!!!

    This good response is without Bing. I have now added the site to Bing.

    However, because the site is a co.uk, am I going to be downgraded or not even listed by Bing?

    Nothing I have read (yet) indicates that the co.uk will be treated seriously when USA users are searching for the product we sell.
