The liability of loathsome, link-level web spam (SEM 101)

When I was a kid in high school, I used to go to the public library and do initial research in the Encyclopedia Britannica (yes, the bound book editions. I also remember black & white television with vacuum tubes and rotary telephones! Sheesh, I’m getting old!). I would look up my keyword in the index volume to find out which of the main volumes had the content I sought.

But imagine this: when I opened up the referenced main volume to the page specified, I always found the content I wanted. I never once went to the content page referenced in the index and found a page full of advertisements, come-ons for dubious physical enhancement pharmaceuticals, or any irrelevant, unwanted garbage like that. That’s how Internet search is supposed to work, too.

Search engine as master index

However, unlike the Encyclopedia Britannica, which maintained sole control over the information it published (thus making its index a really good bet for finding the content you want), the fast and loose world of the Internet is open to all comers, for better or for worse. The upside of that openness is that information of all types, from highly important to trivial (and of every level of value, from well-researched reports to skewed opinions to deceptive trash), can be found, but you must know where to look. This is where a search engine’s role as master indexer comes into play. Services like Bing use their own resources to scan the Web for content and organize their findings into a useful index for users.

But since no one entity has control over the content placed on the Web, the useful and informative website is joined by the unscrupulous huckster, who spends a huge amount of effort to deceive the search engine index in order to bring unsuspecting web searchers to their irrelevant website. This deception is the core of web spam.

Bing and other search engine service providers work diligently to detect web spam and keep tainted results out of our search engine results pages (SERPs). It’s a tough battle, and it requires a great deal of work to keep our SERPs useful and legitimate for search customers.

We’ve already discussed the basic definition of web spam and one of the two major implementations, page-level web spam, in previous blog articles. We’ll wrap up this web spam series with a discussion of the other major type, link-level web spam. And finally, we’ll discuss what a webmaster can do to restore their website’s listing (removing penalties) with Bing once the detected web spam has been removed.

Definition of link-level web spam

Link-level web spam uses web link deceptions in an attempt to artificially inflate the page rank of a specific page or site. Savvy webmasters know that earning high quality, relevant inbound links from authoritative sites can have a very positive influence on the search engine’s rank of the linked site – we recently published a blog post on this subject titled, Link building for smart webmasters (no dummies here). This is good search engine optimization (SEO). Some less savvy and/or more unscrupulous folks believe they can simply substitute high quantity for the “high quality, relevant” part of the equation, replace “authoritative sites” with junk or irrelevant sites, and still achieve the same goal. Sadly for them, this is not the case.

The intent of the link-level web spammer is to create huge numbers of inbound links (typically from unrelated, low quality sites) to gain illegitimate page rank and thereby fool web searchers into visiting their sites. Luckily, Bing and the other search engines can assess the quality and authority of a particular website.

Sites employing link-level techniques also often employ page-level web spam techniques to make their sites appear to be relevant to a commonly searched keyword when they are not. The use of link-level web spam techniques will cause a search engine to examine your site more deeply, and if it’s determined to be using web spam techniques, your site could be penalized.

As we noted in our discussion of page-level web spam, some of these techniques can have valid uses at their core; the intention behind their use is the distinguishing factor. When we detect deceptive intent as we crawl the Web, we identify those pages as web spam and penalize them as appropriate, ranging from neutralization (which levels the playing field for other sites offering content on the same subject) to expulsion from the index. As you can imagine, for an online-based business, these are serious consequences, so it pays to know what NOT to do when you optimize your site for search (or hire a consultant to do the same).

Post web spam

Definition: Post web spam is a form of user-generated content (UGC) in which outbound links are posted on other websites, such as in guest book pages, forums, blog comments, message boards, and referrer logs.

Problem: The destination links in post web spam are usually unrelated, topic-wise, to the page containing the UGC. Often these posts include multiple links. For sites that rely on post web spam for inbound links, it is not unusual for a sizable percentage of all their inbound links to come from post web spam.

What we look for: Several techniques for implementing post web spam are commonly used, including:

  • Add backlinks to all UGC content. When users go onto websites that allow UGC to be created, those who use post web spam include backlink URLs to their sites, even if they don’t have anything to do with the comment or, more significantly, the theme of the UGC-sponsoring site.
  • Automation. Spammers often use automated techniques to repeatedly submit the same UGC post containing short, generic text and a clickable URL to their sites in every UGC-sponsoring page possible.
  • Keyword stuffing. Post web spam text is often keyword stuffed. Check out our page-level web spam article titled The pernicious perfidy of page-level web spam for more information on this.
  • Massive repetition. Large numbers of irrelevant, poor-quality inbound links come from pages such as online guest books, forums, and blog comments.
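For webmasters who host UGC, the automation and massive repetition patterns above can be screened for before posts go live. The Python sketch below is a hypothetical moderation filter, not a description of how Bing detects post web spam: it flags posts that contain more links than a chosen limit, or whose text (ignoring case and spacing) has already been submitted. The function names and both thresholds are assumptions to tune for your own site.

```python
import re
from collections import Counter

# Hypothetical thresholds; tune them for your own site.
MAX_LINKS_PER_POST = 2
MAX_COPIES = 1  # how many times identical text may appear before repeats are flagged

URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially varied copies match."""
    return " ".join(text.lower().split())


def flag_suspicious_posts(posts: list[str]) -> list[int]:
    """Return the indexes of posts that look like automated post web spam."""
    copies: Counter = Counter()
    flagged = []
    for i, post in enumerate(posts):
        key = normalize(post)
        copies[key] += 1
        too_many_links = len(URL_PATTERN.findall(post)) > MAX_LINKS_PER_POST
        repeated = copies[key] > MAX_COPIES
        if too_many_links or repeated:
            flagged.append(i)
    return flagged
```

For example, given a batch containing a normal comment, a comment with three URLs, and two copies of the same one-link comment that differ only in capitalization and spacing, the function flags the three-link post and the second copy. Flagged posts can be held for manual review rather than deleted outright, since a legitimate commenter may occasionally include a link or two.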

What post web spammers often don’t realize is that many UGC-oriented sites automatically add the attribute rel="nofollow" to any links in UGC content. As such, no inbound link credit is passed when search engines crawl and index these pages.

From a webmaster’s point of view, however, we encourage you to actively and regularly clean up (or better yet, prevent) UGC-based web spam on your site. If there is too much junk or web spam content on a page, it could reflect poorly on the overall quality of your page, even if you apply rel="nofollow" to its links. For that matter, it is very important for any site that allows UGC to actively monitor its security. Hosting malware can also get a site penalized, and you don’t want that! For more information on malware, see our blog article series called The merciless malignancy of malware, Part 1, Part 2, Part 3, and Part 4.
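If your UGC platform doesn’t add rel="nofollow" for you, you can add it yourself when rendering user content. Here’s a minimal Python sketch, assuming the UGC is stored as an HTML fragment, that uses the standard library’s html.parser to re-emit the markup with nofollow added to every link while keeping any existing rel values. The names NofollowRewriter and add_nofollow are hypothetical. Note that this only adjusts link attributes: it is not an HTML sanitizer, does nothing to block malicious markup, and silently drops HTML comments.

```python
import html
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, adding rel="nofollow" to every <a> tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def _format_attrs(self, tag: str, attrs) -> str:
        attrs = list(attrs)
        if tag == "a":
            # Merge "nofollow" into any existing rel tokens instead of overwriting them.
            rel = next((v for k, v in attrs if k == "rel"), None) or ""
            tokens = rel.split()
            if "nofollow" not in tokens:
                tokens.append("nofollow")
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", " ".join(tokens)))
        # Attribute values arrive unescaped, so escape them again on output.
        return "".join(
            f" {name}" if value is None else f' {name}="{html.escape(value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._format_attrs(tag, attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._format_attrs(tag, attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so re-escape it.
        self.out.append(html.escape(data, quote=False))


def add_nofollow(fragment: str) -> str:
    """Return the fragment with rel="nofollow" on all of its links."""
    parser = NofollowRewriter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

For instance, a comment containing <a href="http://spam.example">this</a> comes back as <a href="http://spam.example" rel="nofollow">this</a>, and a link that already carries rel="external" becomes rel="external nofollow".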

Link farming

Definition: A link farm is a large collection of websites that exist for the sole purpose of providing massive numbers of links to targeted websites, ostensibly to improve the appearance of their organic, online popularity.

Problem: Link farming is often employed to promote one website using many other websites, or it can be a commercial enterprise in which the link farm sells its unscrupulous (and worthless) outbound linking services to less SEO-savvy webmasters.

What we look for: Link farming is often implemented using the following techniques:

  • Large, sudden surge of new inbound links. When dozens or hundreds of inbound links suddenly appear for a new or a previously small website, such a big change can indicate link farm web spam activity. The relevance of the outbound linking sites will be a key factor in whether or not such a sudden change warrants further investigation.
  • Consistent similarities between outbound linking sites. If a large number of the inbound links for a site come from sites that are very similar in design, structure, and other key characteristics, this can lead to deeper scrutiny of a website for web spam.
  • Poor linking standards. A link farm will often have a large number of unrelated links on the page, or will have related links to many sites that employ other spam methods. The pages themselves are designed to maximize the number of links on them, favoring outbound links over original content.
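Bing doesn’t publish the details of how it evaluates inbound link growth, but the “sudden surge” signal described above is easy to illustrate. The following Python sketch is a hypothetical heuristic (the function name, window size, multiplier, and minimum count are all assumptions): it flags any day whose count of newly discovered inbound links is both large in absolute terms and far above the average of the preceding days.

```python
from statistics import mean


def find_link_surges(daily_new_links: list[int], window: int = 7,
                     factor: float = 5.0, min_links: int = 20) -> list[int]:
    """Return the indexes of days that look like a sudden inbound-link surge.

    A day is flagged when its count of new inbound links is at least
    `min_links` AND more than `factor` times the mean of the preceding
    `window` days. The first `window` days only serve as a baseline.
    All three parameters are illustrative assumptions.
    """
    surges = []
    for day in range(window, len(daily_new_links)):
        baseline = mean(daily_new_links[day - window:day])
        count = daily_new_links[day]
        # max(..., 1.0) avoids comparing against a zero baseline.
        if count >= min_links and count > factor * max(baseline, 1.0):
            surges.append(day)
    return surges
```

For example, find_link_surges([2, 3, 1, 2, 4, 2, 3, 250, 5]) returns [7], flagging the day on which 250 new links appeared. A surge by itself is only a reason to look closer; as noted above, the relevance of the linking sites still decides whether it’s web spam.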

When link farms are identified, those sites are penalized, which negates the value of the links they contain. In addition, the pages they link to are more likely to be heavily scrutinized for other forms of web spam.

See our earlier blog articles for more information on what makes a good link versus a bad link.

Link exchanges

Definition: Unlike link farms, which target a few selected sites, link exchanges are organized groups of websites that provide one another with reciprocal inbound and outbound links, ostensibly to benefit every website in the exchange.

Problem: Web spam-oriented link exchanges typically involve unrelated websites reciprocally exchanging links en masse to inflate their rank. Such links offer no value to human visitors, which makes them candidates for being considered web spam.

While earning inbound links is part of legitimate SEO activity, as we’ve stated before, Bing values link quality over link quantity. Inbound links from sites unrelated to the theme of your site, typical of most link exchanges, will be of little to no value to you for improving your page rank.

What we look for: Link exchanges usually include the following activities:

  • Starts out as email spam. Link exchanges often start out as spam emails sent from webmasters of unrelated sites asking other webmasters if they would like to improve their ranking by exchanging links.
  • Excessive links. Link exchanges (reciprocal links) between unrelated sites, especially when done to excess, can be indicators of web spam, and a participating website might be more heavily scrutinized for other web spam problems.
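To make “reciprocal links between unrelated sites” concrete, here’s a hypothetical Python sketch that counts, for each site, how many reciprocal link partners it has outside its own topic. The function name, the topic labels, and the idea of reducing relevance to a single label per site are all simplifying assumptions for illustration; search engines judge relevance in far richer ways.

```python
from collections import Counter

# A link is a (source_site, target_site) pair.
Link = tuple[str, str]


def unrelated_reciprocal_counts(links: set[Link],
                                topics: dict[str, str]) -> Counter:
    """Count, per site, reciprocal link partners whose topic label differs.

    Sites missing from `topics` are skipped because their relevance is unknown.
    """
    counts: Counter = Counter()
    for src, dst in links:
        # Visit each reciprocal pair once (src < dst), and only if both directions exist.
        if src < dst and (dst, src) in links:
            topic_a, topic_b = topics.get(src), topics.get(dst)
            if topic_a is not None and topic_b is not None and topic_a != topic_b:
                counts[src] += 1
                counts[dst] += 1
    return counts
```

Mutual links between, say, a bed and breakfast and a winery that share a topic label are not counted at all, while a site that swaps links with many unrelated sites accumulates a high count, which is the kind of excess the bullet above describes.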

Note that reciprocal linking is not an automatic red flag. Some websites within a particular niche will link to others when doing so provides relevant value to their customers. For example, think of a bed and breakfast that links out to local wineries and a winery that links out to local bed and breakfasts: these are interrelated businesses in the same region, so the links are naturally relevant for site visitors.

But as usual, too much of a good thing can be bad. And when there is no relevance between linked sites, link exchanges can quickly degrade to the level of web spam (especially when the number of unrelated links is deemed excessive).

Penalties and restitution

Mistakes happen. An entrepreneurial do-it-yourselfer optimizes a website based on bad (spammy) advice from the Web. A Mom-and-Pop-shop website owner naively hires an unscrupulous website consultant. Heck, it’s even possible that a search engine might mistakenly label an innocent site as web spam. So what do you do?

If you made a mistake on your site and your rank has been neutralized, the solution is easy. Web spam neutralization is handled automatically with Bing. If you are using web spam techniques on your website and you want to remove the site’s web spam neutralization penalty, eliminate the web spam violations and then republish your website. Once the Bing crawler, MSNBot, recrawls your site, if the web spam violations have been removed, the neutralization will be automatically resolved in the index.

But what if your site has been purged from the Bing index? That requires some manual intervention.

Request reconsideration for your site

If you search for your site in the Bing index with the advanced search keyword phrase (plus your URL, of course!) and nothing turns up, your site is not in the index. If this is a sudden change and you know you’ve used some unscrupulous web spam techniques, you’ll need help to get back into the index.

First of all, fix all of the web spam violations on your site. Not just one or two, but all of them. Then, once you’ve republished a corrected version of your website, contact Bing support to request reconsideration of your website’s penalty. Here’s how:

  1. Go to Bing E-mail Support and fill out the form completely.
  2. Select Content Inclusion Request from the drop-down list. A new drop-down will appear underneath.
  3. From the new drop-down list, select Reinclusion request.
  4. Write a clear and detailed explanation of what you have done to resolve the problem in the next text box. (You can prepare this in advance, and then copy and paste the text into the form.)
  5. Type the security code from the presented image into the text box below.
  6. Once the form is completed, click Submit.

A member of the Bing support team will quickly review your request and schedule your site to be recrawled. If the crawler determines that all of the violations have indeed been resolved, then your site is eligible to be added back into the index. But be patient – this process doesn’t happen overnight (which is why it’s a wise idea to avoid such web spam penalties in the first place).

For more information on Bing penalties and restitution, see the blog article Getting out of the penalty box.

If you have any questions, comments, or suggestions, feel free to post them in our Ranking Feedback and Discussion forum. Later…

— Rick DeJarnette, Bing Webmaster Center

P.S. It was suggested to me that I list the other articles in this web spam series for those who might be interested in reading the entire set, so here goes (in order of publication):