As we mentioned earlier, in the latest version of MSN Search we’ve added a number of new advanced query operators. These make it easier to find things using MSN Search, and in some cases capture some of the zeitgeist of the Internet.
One of the most requested operators was filetype:, which lets you filter documents by file type. Currently, MSN Search supports html, txt, and pdf, as well as the primary Office document types: doc, rtf, xls, and ppt for Word, Excel, and PowerPoint documents. This is great for finding official forms, which are usually in PDF or DOC format. For example, try 1040 filetype:pdf for the official IRS 1040 form for US taxes, or brazil visa filetype:pdf for the visa application for visiting Brazil, which is coming in handy for those of us who will be traveling to SIGIR in Salvador, Brazil later this year.
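If you build these queries from a script, a small helper can catch unsupported file types before the query is sent. The following is only a sketch: the function name filetype_query and the constant SUPPORTED_FILETYPES are made up for illustration, and the set simply mirrors the list of types given above.

```python
# Hypothetical helper (not part of MSN Search): the set below mirrors the
# file types the post lists as currently supported by filetype:.
SUPPORTED_FILETYPES = {"html", "txt", "pdf", "doc", "rtf", "xls", "ppt"}


def filetype_query(terms, filetype):
    """Return a query string that restricts `terms` to one file type."""
    ext = filetype.lower().lstrip(".")  # accept "PDF" or ".pdf" as well
    if ext not in SUPPORTED_FILETYPES:
        raise ValueError(f"filetype: does not currently support {ext!r}")
    return f"{terms.strip()} filetype:{ext}"


print(filetype_query("1040", "pdf"))
print(filetype_query("brazil visa", ".PDF"))
```

The check is deliberately strict, since a typo in the extension would otherwise produce an empty result page rather than an obvious error.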
Link: and LinkDomain:
We shipped 1.0 with the Link: keyword, which allows you to find pages that link to a single page, à la link:search.msn.com. We’ve added a variation of that, LinkDomain:, which returns pages that link to any page in a given domain. This is a great way for all you bloggers out there to see how many people are linking to you, and which of your pages they’re linking to. For example, to see pages that link into MSN, you just issue the query linkdomain:msn.com.
One of our new, experimental operators is called Contains:. Contains:<foo> searches for pages that have a hyperlink to a file with the extension <foo>. For example, contains:wmv will find results that link to WMV files. You can combine this with other search terms to narrow your search. For example, the infamous news clip of the exploding whale is easily found via exploding whale contains:wmv. This will find any file type that our crawler sees a link to on the Web, so it’s a great way to find files that our crawler doesn’t download and process: audio and video files, images, binaries, and so forth. We’re really just starting to explore the utility of this feature, so we can’t wait to see what you come up with as well!
Blog search in 4 lines of code
One of the tricks to use with Contains: is searching for blogs. As it turns out, most blogs, or at least most blogs worth reading nowadays, have an RSS feed somewhere on them. Contains: is a great way to find pages that link to feeds, which are usually just RSS, XML, RDF, or Atom documents. Interested in finding blogs on African Cichlids? Try: African cichlids contains:(rss xml rdf atom). Or perhaps you’re a Steelers fan. It’s a quick hack, but we’ve found that it works surprisingly well!
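To make the trick reusable, here is a minimal Python sketch that builds the same kind of query for any topic. The names blog_query and FEED_EXTENSIONS are our own for this example, and the extension list is just the one used in the query above.

```python
# Hypothetical names for illustration; the extensions are the ones used
# in the example query in the post.
FEED_EXTENSIONS = ("rss", "xml", "rdf", "atom")


def blog_query(topic, extensions=FEED_EXTENSIONS):
    """Pair a topic with a grouped Contains: clause for feed links."""
    group = " ".join(extensions)
    return f"{topic.strip()} contains:({group})"


print(blog_query("African cichlids"))
# African cichlids contains:(rss xml rdf atom)
```

Passing a shorter tuple, such as ("rss", "atom"), narrows the match to pages that link to those feed types only.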
InURL:, InAnchor:, InTitle:, InBody:
Ever have those times where you’re looking for something you saw once upon a time but can’t for the life of you remember exactly what it was? Perhaps some page where you clicked on a link titled simply “trebuchet” and it had a list of things like “gog” and “magog” but you can’t track it down? These keywords are for you. For example, inAnchor:trebuchet inBody:gog inBody:magog will take you directly to the site you want.
Various limitations with these operators:
There are currently a few limitations with them, especially the inUrl: and inAnchor: operators. As a commenter on a previous post noticed, inUrl: doesn’t work like the Google operator; it follows the same logic as the rest of these operators. So, inurl:trebuchet will find documents containing “trebuchet” somewhere in the domain or path of the URL; however, inurl:www.trebuchet.com/models doesn’t work yet (although we will be considering that as a feature for a future release!). Also note that these operators don’t take multi-word phrases (yet). For example, to find pages that use “Darth Vader” as the anchor text, you’ll need to use inAnchor:Darth inAnchor:Vader.
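Until phrases are supported, the per-word expansion is easy to automate. This is a small sketch with a made-up helper name, expand_operator; it only splits on whitespace and does nothing about punctuation inside the phrase.

```python
def expand_operator(operator, phrase):
    """Repeat a single-word operator once for each word in the phrase.

    Hypothetical helper for illustration: it splits on whitespace only.
    """
    return " ".join(f"{operator}:{word}" for word in phrase.split())


print(expand_operator("inAnchor", "Darth Vader"))
# inAnchor:Darth inAnchor:Vader
```

Keep in mind that the expanded query matches pages where both words appear in anchor text, not necessarily next to each other or in that order.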
Finding pages that link to a certain page with certain terms:
Danny Sullivan over at SearchEngineWatch.com asked us how to use these operators to find pages that link to a target page using specific anchor text. Continuing with his example, both George Bush and Michael Moore have a number of people who link to them using the term “miserable failure.” But who has more links? The query inanchor:miserable inanchor:failure link:www.whitehouse.gov/president/gwbbio.html won’t work. This query returns documents that link to George Bush’s bio page and that have some OTHER page linking to them using the terms “miserable” and “failure.” That’s why you only get a handful of pages, including other presidents’ bios. In short, InAnchor: doesn’t work too well with Link: and LinkDomain:, and we’ll be doing something about that in a future release. In the meantime, you’ll want to use InBody:, which also matches text found in the links on the page. So, inbody:miserable inbody:failure link:www.whitehouse.gov/president/gwbbio.html and inbody:miserable inbody:failure link:www.michaelmoore.com are the way to go. It’s not perfect, but it’ll get you most of the way there.
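If you want to run the same comparison for other phrases and targets, the workaround can be generated with a few lines of Python. This is a sketch under our own assumptions: the function name linked_with_text is invented for the example, and it simply emits the InBody: plus Link: form described above.

```python
def linked_with_text(phrase, target):
    """Build the InBody: + Link: workaround query for one target page.

    Hypothetical helper: each word of the phrase becomes its own inbody:
    term, since these operators do not accept multi-word phrases.
    """
    terms = " ".join(f"inbody:{word}" for word in phrase.split())
    return f"{terms} link:{target}"


# The two targets from the example above.
targets = [
    "www.whitehouse.gov/president/gwbbio.html",
    "www.michaelmoore.com",
]
for target in targets:
    print(linked_with_text("miserable failure", target))
```

Because InBody: matches any text on the linking page, the counts you get from these queries are an approximation of anchor-text links, not an exact tally.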
Program Manager, Relevance