Uncovering web-based treasure with Sitemaps (SEM 101)

Have you ever noticed how pirate treasure maps are like Sitemaps? While your website may not contain a treasure of gold and silver (unless it’s a metals commodities trading site!), if you have good content, that is certainly treasure to someone who is looking for it. Unfortunately, it’s buried on your website and no one knows what’s there except you! But since you want to share your site’s treasure with others, you need to let them know what you have buried and where to find it. You can wait for search engine crawlers (aka bots) and random traffic to come by to browse, but that will take time and even then, they might not discover everything that you have to offer. Instead, you can help the search bots to dig up your treasured content with a Sitemap.

Now I should pause for a moment to mention that you shouldn’t confuse sitemaps with Sitemaps. You’ve got that, right? Well, just in case that’s as clear as mud, keep this in mind: when referring to sitemap files in text (such as this!), use the lower case word “sitemap” to mean HTML-based files intended for users to browse. They typically contain a list of all the pages on your site.

On the other hand, use the capitalized word “Sitemap” to mean XML-based files designed for use by search engine bots to collect data from webmasters identifying the most important pages and directories within their sites for crawling and indexing. Both types of sitemap files can (and probably should) use all lower case letters in their file name (such as sitemap.xml and sitemap.htm), but capitalize the references to the XML-based one in text to help readers distinguish which type of sitemap you are discussing. This article, coming from the perspective of a search engine, is focusing on Sitemaps, not sitemaps. You’re with me now, right? :-)

A good Sitemap will tell search engine bots about the content stored on a site. That helps the content be seen by the bot and, with any luck (assuming the content is well formed and has value), get into the index. Users who are on a content treasure quest will query search engines with keywords to locate the content they are seeking. If the search engine indexed the content found by the bot, which is more likely when a good Sitemap is present, then that site’s content has a better chance of appearing in the search engine results pages (SERPs). After all, you can’t get onto the SERPs if your pages aren’t indexed!

Structure

A Sitemap file, saved to the root directory of your site, contains references to specific URL locations for pages (or to other Sitemap files on very large sites), often describing the last modified date, the typical change frequency for a page, and the priority the specified page has compared to the other content on your site.

A brief example of the contents of a Sitemap file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>http://www.mysite.com/default.htm</loc>
     <lastmod>2009-03-01</lastmod>
     <changefreq>monthly</changefreq>
     <priority>0.8</priority>
   </url>
   <url>
     <loc>http://www.mysite.com/contacts.htm</loc>
     <changefreq>yearly</changefreq>
     <priority>0.4</priority>
   </url>
</urlset>

The <urlset> tag is standard and points to the current protocol to reference. The <url> and <loc> tags are the minimum required data needed for each page entry. The other tags, <lastmod>, <changefreq>, and <priority>, are optional, additional data. To see the data entry formatting and attributes used for these optional tags, sail on over to Sitemaps XML format for reference information. Note that not every page on your site need be listed in the Sitemap—only the ones containing valuable content for the user.
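
If you would rather not type the XML by hand, the structure above can be generated with a few lines of code. Here is a minimal sketch in Python using only the standard library. The function name build_sitemap and the dictionary keys are illustrative assumptions, not part of any Bing tool. Only <loc> is treated as required, and the optional tags are written only when you supply them.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <urlset> tag in the example above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Build Sitemap XML text from a list of page dictionaries.

    Each dictionary must contain 'loc' and may contain any of the
    optional keys 'lastmod', 'changefreq', and 'priority'.
    (Hypothetical helper written for illustration only.)
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]  # required
        for tag in OPTIONAL_TAGS:
            if tag in page:  # optional, written only when present
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://www.mysite.com/default.htm", "lastmod": "2009-03-01",
         "changefreq": "monthly", "priority": 0.8},
        {"loc": "http://www.mysite.com/contacts.htm",
         "changefreq": "yearly", "priority": 0.4},
    ]))
```

Saving the returned text as sitemap.xml in the root directory of your site gives you a file equivalent to the hand-written example, minus the indentation.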

File formats

Bing supports Sitemap files submitted as XML and gzip files, but not as HTM or HTML files (those would be sitemaps as opposed to Sitemaps, right? Besides, the XML content of a well-formed Sitemap file wouldn’t render correctly in browsers as an HTM file, anyway). If you’ve created a browsable HTML-based sitemap for your end users, they will thank you for the effort, but you can’t recycle it as a Sitemap. You’ll still need to create a separate, XML-based Sitemap file using the tag structure as noted above for submission to Bing.

Size matters not so much anymore

A typical treasure map only has one X to mark the spot. However, your Sitemap can list multiple locations identifying the treasures on your site. It used to be common wisdom in the search engine optimization (SEO) community that there can be too much of a good thing: from a search engine perspective, the most effective size for a Sitemap was thought to be approximately 150 or fewer URLs. Anything more bountiful and the crawler might not take it all in. Well, not so fast, matey!

Per the post Bing enhances support for large Sitemaps made in this blog just a few weeks ago, Bing now supports Sitemap files that contain up to 50,000 references (to either URLs or links to other, child Sitemap files). This development is a boon for webmasters of very large sites. They can now create multiple child Sitemap files, each dedicated to mapping specific areas of their content organization, and store those child Sitemap files in the base directories of those content areas. Then they can link to the child Sitemaps via their primary (aka index) Sitemap file (the one stored in the root directory of a site). One index Sitemap linking to 50,000 child Sitemaps, each of those referencing up to 50,000 URLs, means they can reference up to 2.5 billion URLs through the Sitemap technology, and the Bing crawler, MSNBot, will read it all. Now that’s a lot of treasure!
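
To make the index-and-children arrangement concrete, here is a rough Python sketch that splits a long list of URLs into groups of up to 50,000 and builds an index Sitemap that links to the child files. The <sitemapindex> and <sitemap> tags come from the sitemaps.org protocol rather than from this article. The helper names (chunk, build_sitemap_index) and the example child file URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_REFS = 50_000  # per-file limit described in the article


def chunk(items, size=MAX_REFS):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_sitemap_index(child_sitemap_urls):
    """Build an index Sitemap that links to child Sitemap files.

    (Hypothetical helper; tag names follow the sitemaps.org protocol.)
    """
    if len(child_sitemap_urls) > MAX_REFS:
        raise ValueError("an index Sitemap may hold at most 50,000 references")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for child_url in child_sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = child_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # 120,001 made-up page URLs need three child Sitemaps.
    pages = [f"http://www.mysite.com/page{n}.htm" for n in range(120_001)]
    groups = chunk(pages)
    children = [f"http://www.mysite.com/sitemap{n}.xml"
                for n in range(1, len(groups) + 1)]
    print(len(groups), "child Sitemaps")
    print(build_sitemap_index(children))
    print("Upper bound:", MAX_REFS * MAX_REFS, "URLs")
```

The last line is just the arithmetic from the paragraph above: 50,000 child Sitemaps times 50,000 URLs each comes to 2,500,000,000 URLs.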

Sitemap submission

There are multiple ways for webmasters to submit their Sitemaps to Bing:

  • Ping service. Using your browser’s address bar, you can directly submit your Sitemap to Bing. Type

    http://www.bing.com/webmaster/ping.aspx?sitemap=www.YourURL.com/sitemap.xml

    substituting the full URL to your Sitemap file in place of the YourURL.com example.

  • Webmaster Center tools. You can sign in to Bing’s Webmaster Center tools and use the Sitemaps tool (if you are not already registered to use these free tools, this is a good reason to sign up and see all the other tools available to help you analyze and optimize your site). Simply copy the URL of your Sitemap into the Direct sitemap submission text box, and then click Submit.
  • Robots.txt file reference. If you are using a robots.txt file to tell search engine bots which files and directories not to crawl (and thus keep out of their indexes), you can add a line to that file, typically at the end, that reads as follows:

    Sitemap: http://www.YourURL.com/sitemap.xml

    substituting the full URL to your Sitemap file in place of the YourURL.com example.
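
Two of the submission routes above, the ping URL and the robots.txt line, are easy to script. The Python sketch below only builds the strings; it does not contact Bing. The ping address is the one quoted in the list above. Percent-encoding the sitemap parameter is an assumption on my part (the article shows it typed in plainly), and both function names are hypothetical.

```python
from urllib.parse import quote

# Ping address as quoted in the article.
PING_ENDPOINT = "http://www.bing.com/webmaster/ping.aspx"


def build_ping_url(sitemap_url):
    """Return the ping URL for a Sitemap.

    Assumption: the sitemap parameter is percent-encoded so that its own
    ':' and '/' characters cannot be misread as part of the ping URL.
    """
    return f"{PING_ENDPOINT}?sitemap={quote(sitemap_url, safe='')}"


def add_sitemap_line(robots_txt, sitemap_url):
    """Append a 'Sitemap:' line to robots.txt content if it is not there yet."""
    line = f"Sitemap: {sitemap_url}"
    if line in (existing.strip() for existing in robots_txt.splitlines()):
        return robots_txt  # already referenced, nothing to do
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + line + "\n"


if __name__ == "__main__":
    sitemap = "http://www.YourURL.com/sitemap.xml"
    print(build_ping_url(sitemap))
    print(add_sitemap_line("User-agent: *\nDisallow: /private/", sitemap))
```

Running add_sitemap_line a second time on its own output leaves the file unchanged, so it is safe to call from a publishing script.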

Validation

Before submitting your Sitemap to Bing, we recommend that you run the XML code you’ve written through a Sitemap validation tool. After all, what good is a treasure map if it has errors in it? Do a search for your Sitemap validator of choice and follow the instructions on the page. If errors are found, correct them before you submit the Sitemap file to Bing.
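
As a quick sanity check before reaching for a full validator, something like the following Python sketch can catch the most basic mistakes: XML that is not well formed, a wrong root tag, or a <url> entry with no <loc>. It is not a substitute for a real Sitemap validation tool, since it does not check the optional tags or their value formats, and the function name check_sitemap is a hypothetical one chosen for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap(xml_text):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return ["root element is not <urlset> in the Sitemap namespace"]

    problems = []
    for number, url in enumerate(root.findall(f"{{{SITEMAP_NS}}}url"), start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> entry {number} has no <loc>")
    return problems


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        '  <url><loc>http://www.mysite.com/default.htm</loc></url>\n'
        '  <url><changefreq>yearly</changefreq></url>\n'
        '</urlset>\n'
    )
    for problem in check_sitemap(sample):
        print(problem)
```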

Once you submit your Sitemap to Bing, we will read its contents, which will help us uncover more of the content treasures buried on your site and evaluate them as potential new additions to our index. And with more of your site’s content in the index, instead of users sailing on past your site to other ports of call in their quest for content treasure, they may stop at yours and exclaim, “Shiver me timbers, matey, look at what we have here!”

If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Until next time…

– Rick DeJarnette, Bing Webmaster Center

Join the conversation

38 comments
  1. miles2go

    Its good way for crawl,i had done it but few pages got indexed in bing.

  2. Quality Directory

    Normally I create simple Sitemaps without describing the last modified date, the change frequency, and priority.

  3. Anonymous

    great advise i need to make one for my site again. i had written an asp page to generate sitemaps automatically then you could tweak them. i really got to find that

  4. Anonymous

    Don't forget : you aren't limited to only one Sitemap index… so in the end you can have 50.000*unlimited URLs in your Sitemaps.

  5. Anonymous

    The 3 layer sitemap file to reach 2.5 billion URLs is really exciting indeed. Great work Bing team :-)

  6. spc702

    I'm still a SEO dummy but have been learning a lot lately. I am still having trouble with my Sitemap and sitemap.html. When I submit to google, they say that it has to be ".xml" format but my website provider, Register.com, doesn't let me save under ".xml".

    So now I have both html and xml formats on my sitemap.html page even though bots don't acknowledge it:(

    My website has been making progress in search results so i'm leaving this sitemap issue alone.

  7. Anonymous

    Even with Sitemaps, Bing is rather slow on finding new websites. Search engine results are much better than on Windows Live Search though.

  8. Anonymous

    Like Carter, I think it's great advice and I will make one for one of my sites again.

  9. Anonymous

    The website information is based on Sitemaps because all SEO bots were coming to the Sitemap.

  10. eroswebmonster

    met herre alot stuff wrong with this here

  11. Anonymous

    I have a rather small website, should I use sitemap?

  12. Anonymous

    Thanks for the great article about submitting your sitemap to bing in XML format, I will do this for sure!

  13. Anonymous

    Great Article

  14. Anonymous

    Thanks for the information. Will definitely try it. :-)

  15. Synergy Connections

    Is there a quick way to build a XML sitemap from a HTML one, or to generate a valid XML sitemap using a tool?

    It would save a lot of typing!

  16. Synergy Connections

    Hi All

    For anyone who uses Joomla, I can recommend XMap – it will generate both an HTML and XML sitemap dynamically – so you can create it and forget it!

  17. Anonymous

    its really great article and nice thank you

  18. Anonymous

    i love and sorry i will also recommend every one to see your post and i will surely digg this post

    thank you for the great information

  19. Riky Neal

    At This time in space, I have NO IDEA were I'm at OR what I'm DOING !!!!!

  20. Anonymous

    @synergy connections ltd – You could do a search for 'xml sitemap generators'. There are quite a few that will automatically generate your sitemap for you and update it.

  21. imrul_kayes2003

    good

  22. Anonymous

    Good Info..!

  23. alan_ie

    xml sitemaps with proper styling {xsl stylesheets} are perfectly browsable by end users

    whatever the moronic author might claim

    http://www.alandoherty.net/sitemap.xml

    is just one example of mine

  24. Anonymous

    Thanks for your information, i have read it, very good!

  25. Anonymous

    Thanks for your information, i have read it, very good!

  26. Anonymous

    If you want to add multiple site , Bing will allow to use sitemap index files ?

  27. shoutcaster

    This info  is much helpful. Thanks for sharing…

  28. B. N. Singh

    I am implementing it to get my new blog (www.techno-pulse.com) indexed in Bing :)

  29. Anonymous

    There are multiple ways for webmasters to submit their Sitemaps

    http://www.cinfilm.com

  30. carter.cole

    in a previous comment i mentioned that i had written a sitemap generator that used a file FSO to generate a sitemap of flat sites automatically here is my dev page cartercole.com/…/dev.asp check out the part about sitemap generator… the source is there too

  31. marokus

    But aren't there more specific sitemap structures? Like Sitemaps for different Content (News, Image, Video)?

  32. kareparambilmoh

    Great article…thanks for sharing about sitemap

  33. irfanullah.jan

    Do you support atom feed as sitemap?

  34. irfanullah.jan

    Can I have an extension less XML Sitemap such as "example.com/sitemap" provided that it will have correct content type "text/xml"? I want to create a dynamic one in PHP.
