When Google arrived on the scene in the late 1990s, they came in with a new idea of how to rank pages. Until then, search engines had ranked each page according to what was in the page - its content - but it was easy for people to manipulate a page's content and move it up the rankings. Google's new idea was to rank pages largely by what was in the links that pointed to them - the clickable link text - which made it a little more difficult for page owners to manipulate a page's rankings.
Changing the focus from what is in a page to what other websites and pages say about a page (the link text) produced much more relevant search results than the other engines were able to produce at the time.
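To make the contrast concrete, here is a minimal, hypothetical sketch of the two approaches. The pages, links, and scoring below are invented purely for illustration - Google's actual ranking system is far more complex and is not public - but it shows why anchor text is harder for a page owner to stuff than on-page content.

```python
# Hypothetical illustration only: on-page scoring vs. anchor-text scoring.
# The data and the scoring rules are made up; this is not Google's algorithm.

pages = {
    "pageA": "cheap widgets cheap widgets cheap widgets",   # keyword-stuffed page
    "pageB": "a detailed guide to choosing widgets",
}

# Inbound links from other sites: (target page, clickable link text) pairs.
inbound_links = [
    ("pageB", "widget buying guide"),
    ("pageB", "guide to widgets"),
    ("pageA", "my friend's site"),
]

def on_page_score(page, query):
    """Old-style scoring: count query words in the page's own content."""
    words = pages[page].split()
    return sum(words.count(term) for term in query.split())

def anchor_text_score(page, query):
    """Link-based scoring: count query words in the anchor text of inbound links."""
    score = 0
    for target, anchor in inbound_links:
        if target == page:
            score += sum(anchor.split().count(term) for term in query.split())
    return score

query = "widgets guide"
for page in pages:
    print(page, on_page_score(page, query), anchor_text_score(page, query))
# pageA wins on its own stuffed content; pageB wins on what other sites say about it.
```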
The idea worked very well, but it could only work well as long as it was never actually used in the real world. As soon as people realised that Google were largely basing their rankings on link text, webmasters and search engine optimizers started to find ways of manipulating the links and link text, and therefore the rankings. From that point on, Google's results deteriorated, and their fight against link manipulations has continued. We've had link exchange schemes for a long time now, and they are all about improving the rankings in Google - and in the other engines that copied Google's idea.
In the first few months of this year (2006), Google rolled out a new infrastructure for their servers. The infrastructure update was called "Big Daddy". As the update was completed, people started to notice that Google was dropping their sites' pages from the index - their pages were being dumped. Many sites that had been fully indexed for a long time were having their pages removed from Google's index, which caused traffic to deteriorate and business to be lost. It caused a great deal of frustration, because Google kept quiet about what was happening. Speculation about the cause was rife, but nobody outside Google knew exactly why the pages were being dropped.
Then, on 16th May 2006, Matt Cutts, a senior Google software engineer, finally explained something about what was going on. He said that the dropping of pages was caused by the improved crawling and indexing functions in the new Big Daddy infrastructure, and he gave some examples of sites that had had their pages dropped.
Here is what Matt said about one of the sites:
Someone sent in a health care directory domain. It seems like a fine site, and it's not linking to anything junky. But it only has six links to the entire domain. With that few links, I can believe that out toward the edge of the crawl, we would index fewer pages.
And about the same site, he went on to say:
A few more relevant links would help us know to crawl more pages from your site.
Because the site hasn't attracted enough relevant links to it, it won't have all of its pages included in Google's index, in spite of the fact that, in Matt's words, "it seems like a fine site". He also said the same about another of the examples that he gave.
Let me repeat one of the things that he said about that site. "A few more relevant links would help us know to crawl more pages from your site." What??? They know that the site is there! They know that the site has more pages that they haven't crawled and indexed! They don't need any additional help to know to crawl more pages from the site! If the site has "fine" pages then index them, dammit. That's what a search engine is supposed to do. That's what Google's users expect them to do.
Google never did crawl all sites equally. The amount of PageRank in a site has always affected how often a site is crawled. But they've now added the number of links pointing to a site to the criteria, and for the first time they are dumping a site's pages OUT of the index if the site doesn't have a good enough score. What sense is there in dumping perfectly good and useful pages out of the index? If they are in, leave them in. Why remove them? What difference does it make if a site has only one link pointing to it or a thousand links pointing to it? Does having only one link make it a bad site that people would rather not see? If it does, why index ANY of its pages? Nothing makes any sort of sense.
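The behaviour Matt Cutts described can be pictured with a toy model like the one below. The threshold and the whole "crawl budget" formula are assumptions made up for illustration - nobody outside Google knows the real criteria - but it captures the complaint: a site's page count in the index is capped by its inbound links rather than by the quality of its pages.

```python
# Toy model of link-based crawl/index limits, as I understand the description.
# The numbers and the formula are invented for illustration only.

def pages_indexed(total_pages, inbound_links):
    """Hypothetical: each inbound link 'buys' crawl budget for 50 pages,
    and any pages beyond that budget are left out of the index."""
    budget = inbound_links * 50
    return min(total_pages, budget)

# A "fine site" with, say, 2,000 pages but only six inbound links:
print(pages_indexed(total_pages=2000, inbound_links=6))    # only 300 pages indexed
# The same site after attracting plenty of links:
print(pages_indexed(total_pages=2000, inbound_links=100))  # all 2,000 pages indexed
```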
So we now have the situation where Google intentionally leaves "fine" and useful pages out of their index, simply because the sites haven't attracted enough links to them. It is grossly unfair to website owners, especially to the owners of small websites, most of whom won't even know that they are being treated so unfairly, and it short-changes Google's users, since they are being deprived of the opportunity to find many useful pages and resources.
So what now? Google has always talked against doing things to websites and pages solely because search engines exist. But what can website owners do? Those who aren't aware of what's happening to their sites simply lose - end of story. Those who are aware of it are forced into doing something solely because search engines exist. They are forced to contrive unnatural links to their sites - something that Google is actually fighting against - just so that Google will treat them fairly.
Incidentally, link exchanges are no good, because Matt also said that too many reciprocal links cause the same negative effect: the site isn't crawled as often, and fewer of its pages are indexed.
It's a penalty. There is no other way to see it. If a site is put on the Web, and the owner doesn't go in for search engine manipulation by doing unnatural link-building, the site gets penalised by not having all of its pages indexed. It can't be seen as anything other than a penalty.
Is that the way to run a decent search engine? Not in my opinion, it isn't. Do Google's users want them to leave useful pages and resources out of the index, just because they haven't got enough links pointing to them? I don't think so. As a Google user, I certainly don't want to be short-changed like that. It is sheer madness to do it. The only winners are those who manipulate Google by contriving unnatural links to their sites. The filthy linking rich get richer, and the link-poor get poorer - and are pushed by Google towards spam methods.
Google's new crawling/indexing system is lunacy. It is grossly unfair to many websites that have never even tried to manipulate the engine by building unnatural links to their sites, and it is very bad for Google's users, who are intentionally deprived of the opportunity to find many useful pages and resources. Google people always talk about improving the user's experience, but now they are intentionally depriving their users. It is sheer madness!
What's wrong with Google indexing decent pages simply because they are there? Doesn't Google want to index all the good pages for their users any more? It's what a search engine is supposed to do, it's what Google's users expect it to do, and it's what Google's users trust it to do, but it's not what Google is doing.
At the time of writing, the dropping of pages is continuing with a vengeance, and more and more perfectly good sites are being affected.

Source: http://www.webworkshop.net