Eggs, bacon, spam, spam, and spam (SEM 101)

What is spam? One could argue that spam is a multi-faceted thing. The word itself has many definitions. For example, it can be defined as a tinned food product made of processed, spiced ham and pork slathered in a gelatinous glaze (it’s apparently very popular in Hawai’i, don’t you know?). However, spam is also often used to reference a very popular comedy sketch written and performed on Monty Python’s Flying Circus (in it, a waitress recites a menu in which the canned meat is a featured ingredient in almost every dish, which drives a woman customer to exclaim, “I don’t like spam!”). Finally, the unwanted advertising email that congests the Internet on a daily basis is also colloquially known as spam (per one study from 2008, more than 78% of all Internet email traffic on a given day, over 200 billion messages, is unwanted).

It seems that we’ve established a lexicological pattern that most folks don’t like the various incarnations of spam (Hawai’i notwithstanding). While any of those preceding versions of spam may occasionally affect the lives of the webmaster community (especially those who love gelatinous-coated, processed meat products, absurd humor, or inappropriate and unsolicited advertising), the version of spam that concerns webmasters most is web spam.

Web spam

So what is web spam? Web spam is unwanted web content that uses overtly manipulative techniques in an effort to fraudulently attain an undeservedly high ranking in search engines. As a webmaster, your work goals include building your brand, instilling customer trust and loyalty, and in the end, getting conversions. But by using web spam techniques, you are ultimately doing yourself more harm than good.

To illustrate why this is a problem, let us stand for a moment in the shoes of a regular Joe (or Jo), who wants to use a search engine to find information on a topic of interest. He or she browses to their search engine of choice (might I suggest Bing? :-), types a keyword or two to direct the search, and then gets the results in a nicely formatted page with lots of relevant choices (I did suggest Bing, after all!). After reviewing the list of top ranked choices in the search results page, he or she clicks on a link, expecting to find a page related to their topic of interest, containing a lot of useful and interesting content. But what if he or she instead ends up on a page that is, at best, tangentially related to their topic or, more likely, on a page filled with unrelated and unwanted content, such as links for casino gambling, illicit pharmaceuticals, quack physical performance enhancers, counterfeit products, fake education degrees, or other dicey, inappropriate material? No one would be happy to be tricked into seeing this garbage.

Averting the potential for such a bad search experience is why Bing works so hard to eliminate web spam from our index. We want to help customers find the best results, give them a great experience, and thereby instill confidence in our service for future searches. Web spammers, on the other hand, only want to hoodwink the search engines to get artificially high rankings for undeserved (and often non-relevant) content, and many times they also attempt to scam the customers they snare in their trap.

Web spam vs. junk

In fact, this gaming of the system is what separates web spam from junk pages. Like junk, web spam content typically provides little to no organic value to searchers. But junk pages are just that – useless content. So what kind of pages are considered junk? The following are just some of the types of pages that you can think of as junk:

  • Custom 404 error message pages that erroneously return HTTP status code 200 OK
  • “The page you are looking for has been moved” message pages that do not use redirects
  • Pages with little or no content

As long as those junk pages clearly and legitimately represent what they are to both the public and to search engines, that’s fine. Junk is not a problem for search engines! The definition of web spam disregards the quality or type of content a page provides (or doesn’t provide, as the case may be). Therefore, the designation of a web page by Bing as web spam hinges on whether there is an effort made to manipulate search engine rankings, and if so, to what degree.
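
For webmasters who would rather not serve the first two kinds of junk page listed above, the usual remedy is to send an HTTP status code that matches what the page says. Here is a minimal sketch in Python of that idea. It assumes a tiny site held in two dictionaries and assumes the move is permanent (hence a 301); the names PAGES, MOVED, and respond are made up for illustration and are not part of any Bing tool or API.

```python
from http import HTTPStatus

# Hypothetical pages that actually exist on the site (illustration only).
PAGES = {"/": "Welcome!", "/pricing": "Our prices"}

# Hypothetical map of retired URLs to their new locations (illustration only).
MOVED = {"/old-pricing": "/pricing"}


def respond(path):
    """Return (status, headers, body) for a requested path."""
    if path in PAGES:
        return HTTPStatus.OK, {}, PAGES[path]
    if path in MOVED:
        # Send a real redirect instead of a 200 page that merely says "moved".
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}, ""
    # A friendly custom error page is fine, as long as the status is 404.
    return HTTPStatus.NOT_FOUND, {}, "Sorry, we could not find that page."
```

Calling respond("/old-pricing") yields a 301 with a Location header pointing at the new URL, while an unknown path still gets the custom error text but with a 404 status rather than 200 OK.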

Web spam types & repercussions

We currently classify spam based on two types of signals: page-level and link-level. Page-level web spam consists of deceitful, on-page search engine optimization (SEO) techniques employed in an attempt to artificially inflate page rank. Link-level web spam uses fraudulent linking strategies for the same purpose. For a page to be labeled as web spam by Bing, at least one of these techniques must be in use.

Webmasters and SEOs considering the use of such techniques need to understand that search engines, which are busy crawling and indexing the content of the Web every day, are exposed to every sort of exploit imaginable. We see it all. We see the pages that want to be associated with one topic when they really are about another. And since we know how much these deceptive sites frustrate our customers (hey, we use search as much as anyone, so we can sympathize!), we actively work to detect web spam. And once it is detected, we penalize those sites with actions commensurate with the egregiousness of their offenses, ranging from rank neutralization (intentionally lowering their organic page rank) to permanent expulsion from the index.

The story continues…

We will continue this discussion with additional, in-depth posts about the definitions and details of both page-level and link-level web spam. We’ll also cover how to request re-evaluation of your site’s web spam designation by Bing support staff (in cases of mistaken identification or web page revisions that resolve the problem). Stay tuned!

If you have any questions, comments, or suggestions, feel free to post them in our Ranking Feedback and Discussion forum. See you back here soon…

— Rick DeJarnette, Bing Webmaster Center