


Insurance against potential penalties

September 3, 2004

Used correctly, the Robots.txt protocol offers a webmaster or web site owner real protection. Web domain names are plentiful on the Internet today; there is a multitude of sites on just about any subject anybody can think of.

Most sites offer good content that is of value to most people and can help with just about any query. However, as in the real world, what you see is not always what you get.

There are a lot of sites out there that are spamming the engines. Spam is best defined as search engine results that have nothing to do with the keywords or key phrases used in the search. Visit any good SEO forum today and most daily spam threads point to hidden text, keyword stuffing in the meta tags, doorway pages and cloaking. Thanks to newer and more powerful search engine algorithms, the domain networks that spam the engines are increasingly being penalized or banned altogether.

The inherent risk of getting a web site banned for spam increases proportionately if it appears to have duplicate listings or duplicate content. Rank for $ales does not recommend machine-generated pages, because such pages have a tendency to generate spam. Most of those so-called "page generators" were not designed to be search engine-friendly, and no attention was given to the engines when they were built.

One major drawback of these "machines" is that once a page is "optimized" for a single keyword or key phrase, first-level and sometimes second-level keywords tend to flood the results with listings that will almost certainly look like 100% spam. Stay away from any of those so-called "automated page generators".

A good optimization process starts with content that is completely written by a human! That way, you can be certain that each page of your site will end up being absolutely unique.

How do search engines deal with duplicate content?
Modern crawler-based search engines now have sophisticated and powerful algorithms specifically designed to catch sites that spam the engines, especially those that use duplicate domains. To be sure, some duplicate-domain situations are perfectly legitimate. However, as the following example will clearly demonstrate, that is not always the case.
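The engines do not publish their algorithms, but the general idea behind duplicate detection, comparing overlapping word sequences ("shingles") between pages, can be sketched in a few lines of Python. The function names and sample text below are purely illustrative, not any engine's actual method.

```python
def shingles(text, k=3):
    """Break text into overlapping k-word sequences ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(page_a, page_b, k=3):
    """Jaccard similarity of two pages' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(page_a, k), shingles(page_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two identical pages score 1.0; unrelated pages score near 0.0.
original = "widgets for sale at the best prices on the web today"
copy = "widgets for sale at the best prices on the web today"
different = "our company history dates back to the late nineteen nineties"

print(similarity(original, copy))       # → 1.0 (exact duplicate)
print(similarity(original, different))  # → 0.0 (unrelated content)
```

A pair of sites serving the same content folder would score at or near 1.0 on any measure like this, which is why exact duplicates are so easy for the engines to flag.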

Take this practical example, in which three identical web sites, all owned and operated by the same company, make evident use of duplicate content. Google, AltaVista and most other crawler-based search engines have noticed and indexed all three domains. In this scenario, the right thing to do is to use individual IP addresses and implement a server redirect command (a 301 redirect).
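In practice a 301 is configured on the web server itself (Apache, IIS, and so on), but its behavior can be sketched with Python's standard http.server module. The domain below is a placeholder for the primary site, not a real configuration.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder for the primary domain the duplicate sites should point at.
PRIMARY = "http://www.example.com"

class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every request with a 301 pointing at the primary domain."""

    def do_GET(self):
        self.send_response(301)  # 301 = Moved Permanently
        self.send_header("Location", PRIMARY + self.path)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

# Serve the "duplicate" site on an ephemeral local port.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following the redirect."""
    def redirect_request(self, *args, **kwargs):
        return None

status, location = None, None
try:
    urllib.request.build_opener(NoRedirect).open(
        f"http://127.0.0.1:{port}/page.html")
except urllib.error.HTTPError as err:
    status, location = err.code, err.headers["Location"]

print(status, location)  # → 301 http://www.example.com/page.html
server.shutdown()
```

The crawler sees the 301 status and the Location header, and learns that the page has permanently moved to the primary domain, exactly the signal the engines want.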

An alternative would be to at least provide unique folders or sub-directories and use the Robots.txt exclusion protocol to disallow two of the three affected domains.

That way the search engines wouldn't index the two duplicate sites. In such cases, the Robots.txt exclusion protocol should always be used; it is in fact your best "insurance" against getting your site penalized or banned. Since that was not done in the example above, we will look at the duplicate content and assess where the risk of a penalty is highest. We will describe the indexing of the three sites as site one (the main, primary domain), site two and, finally, site three.
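The robots.txt a duplicate domain would serve is only two lines, and its effect can be verified with Python's standard urllib.robotparser module. The example.com domains are placeholders.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as it would be served from each duplicate domain's root.
# "Disallow: /" tells every well-behaved crawler to index nothing.
DUPLICATE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# The primary domain's file, by contrast, disallows nothing.
PRIMARY_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

def crawler_allowed(robots_txt, url, agent="*"):
    """Parse a robots.txt body and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

print(crawler_allowed(DUPLICATE_ROBOTS_TXT,
                      "http://duplicate.example.com/page.html"))  # → False
print(crawler_allowed(PRIMARY_ROBOTS_TXT,
                      "http://www.example.com/page.html"))        # → True
```

With the duplicate domains disallowed, only the primary site remains crawlable, which removes the duplicate listings at the source.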


The four major crawler-based engines analyzed were Google, Teoma, Fast and AltaVista. All three domain names point to the same IP address, which made it simpler to use Fast's Internet Protocol filter to confirm that there were no more than three affected domains in this example.

However, all three web sites point to the same IP address AND the same content folder! That makes them exact duplicates, raising duplicate content flags in all four engines analyzed.

All three sites share the same Robots.txt file, and nothing in the hosting arrangement or in that file's syntax does anything effective about the duplicate content problem. Major spider-based search engines, which today rely heavily on hypertext to compute relevancy and importance, are very good at discovering and dealing with sites that have duplicate content issues.

As a direct result, a webmaster runs a large risk of a duplicate content penalty in these engines, because their algorithms make it a simple task to analyse, sort out and finally reject duplicate content web sites.

If a "spam technician" discovers duplicate listings, chances are very good they will take action against the offending sites. The chances increase further when a person, often a competitor, files a complaint that a certain site is "spamdexing" the engines. To be sure, pages created from duplicate content can improperly "populate" a search query, and the end result is that one site unfairly dominates most search results.

Marketing analysis and PPC "landing" pages
In order to better analyse specific online marketing campaigns or surveys, some companies at times run duplicate sites or operate PPC (Pay-per-Click) landing pages. In such cases, do not neglect to use the Robots.txt exclusion protocol to manage your duplicate sites: disallow spiders from crawling them by putting the right syntax in the Robots.txt file.

Your index count will certainly decrease, but that is the right thing to do, and you are actually doing the search engines a service. In such a case, a webmaster need not worry about impending penalties from the engines.

If these businesses or their marketing departments are running marketing tests or surveys, there is usually more than one domain that could appear in the engines' results pages. In such cases, I strongly recommend rewriting all of the content and making certain that no real duplicate content gets indexed.

One way to achieve that is to use some form of meta refresh tag or JavaScript solution to direct visitors to the most recent versions of pages while their webmasters get the Robots.txt exclusion protocol written correctly.

The JavaScript would effectively indicate where the redirect is intended to go, ensuring the final document ends up in its proper place. A "301 server redirect" command is always the best thing to use in these cases and constitutes the best insurance against penalties, as it informs the search engines that the affected document(s) have moved permanently.
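As a stopgap, a meta refresh with a JavaScript fallback might look like the sketch below. The destination URL is a placeholder, and a server-side 301 remains the preferred method.

```html
<html>
<head>
  <!-- Send visitors to the page's current location after 0 seconds -->
  <meta http-equiv="refresh" content="0; url=http://www.example.com/current-page.html">
  <script type="text/javascript">
    // Fallback for browsers that ignore the meta refresh
    window.location.replace("http://www.example.com/current-page.html");
  </script>
</head>
<body>
  <p>This page has moved to
     <a href="http://www.example.com/current-page.html">its new location</a>.</p>
</body>
</html>
```

The plain link in the body covers visitors with both mechanisms disabled, and gives crawlers a path to the current page.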

(Updated from my original February 2000 article).

Article written by Serge Thibodeau,
President & CEO,
Rank for $ales
Copyright (c) Serge Thibodeau 2000

