The Google Goal Of Indexing 100 Billion Web
Pages
Author: Danny Wirken

Google's Goal of Quality Search

The paper 'The Anatomy of a Large-Scale Hypertextual Web Search
Engine' makes it evident that Google's goal has always been to
be one of the best search engines in terms of the quality of
its results. Its authors, Sergey Brin and Lawrence Page, knew
that to achieve this, Google needed to store information
efficiently and cost-effectively and to have excellent
crawling, indexing, and sorting techniques. Google aimed not
only to give quality results but also to produce them as fast
as possible. Google started as a high-quality search engine and
continues to be the best search engine today. It has stayed
true to its original intent: to be a search engine that not
only crawls and indexes the web efficiently but also produces
more satisfying results than other existing search engines.

To stay true to its goal of providing the best search results,
Google knew right from the start that it had to be designed so
that the search engine could keep up with the web's growth.
According to Brin and Page, "In designing Google we have
considered both the rate of growth of the Web and technological
changes. Google is designed to scale well to extremely large
data sets. It makes efficient use of storage space to store the
index". They knew that they needed a great deal of space to
store an ever-growing index.

Google's index, which started out at 24 million web pages, was
large for its time and has grown to around 25 billion web
pages, still keeping Google ahead of its competitors. However,
Google is a company that doesn't settle for just beating the
competition. It truly aims to give its users the best service
there is, and for a search engine that means giving users
access to all, or at least most, of the quality information
that is available on the web.

Google's New System for Indexing More Pages

As mentioned earlier, Google aims to give access to even more
information and has been devoting much time and effort to
realizing this goal. It seems that the new patent entitled
'Multiple Index Based Information Retrieval System', filed by
Google employee Anna Patterson, might be the answer to the
problem. The patent, filed in January 2005 and published in May
2006, shows that Google might actually be aiming to expand its
index to as many as 100 billion web pages or even more.

According to the patent, conventional information retrieval
systems, more commonly known as search engines, are able to
index only a small part of the documents available on the
Internet. According to estimates, the number of web pages on
the Internet as of last year was around 200 billion; however,
Patterson claimed that even the best search engine (that is,
Google) was able to index only 6 to 8 billion web pages. The
disparity between the number of indexed pages and existing
pages clearly signaled a need for a new breed of information
retrieval system. Conventional information retrieval systems
simply weren't capable of indexing enough web pages to give
users access to a large enough share of the information
currently available on the web.
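
To put that disparity in numbers, the short Python sketch below
works out the share of the web that 6 to 8 billion indexed
pages would cover. It uses only the figures quoted above; the
200 billion total is itself an estimate, and the function name
coverage_percent is made up for this illustration.

```python
# Figures as quoted in the article; the 200 billion total is an estimate.
ESTIMATED_WEB_PAGES = 200_000_000_000
INDEXED_LOW = 6_000_000_000
INDEXED_HIGH = 8_000_000_000


def coverage_percent(indexed: int, total: int) -> float:
    """Return the share of `total` pages covered by `indexed`, in percent."""
    return 100.0 * indexed / total


low = coverage_percent(INDEXED_LOW, ESTIMATED_WEB_PAGES)
high = coverage_percent(INDEXED_HIGH, ESTIMATED_WEB_PAGES)
print(f"Coverage: {low:.0f}% to {high:.0f}%")  # Coverage: 3% to 4%
```

On these figures, a conventional index reached only about 3 to
4 percent of the estimated web.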

The Multiple Index Based Information Retrieval System, however,
is up to the challenge and is Google's answer to the problem.
Two characteristics of the new system make it stand out from
conventional systems. One is that it has the "capability to
index an extremely large number of documents, on the order of a
hundred billion or more". The other is its
capability to "index multiple versions or instances of
documents for archiving…enabling a user to search for documents
within a specific range of dates, and allowing date or version
related relevance information to be used in evaluating
documents in response to a search query and in organizing
search results." With the new system developed by Patterson,
Google now has the ability to expand its index size to
unbelievable proportions as well as improve document analysis
and processing, document annotation, and even the process of
ranking according to contained and anchor phrases.
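
The idea of indexing several dated versions of a document and
then searching within a range of dates can be illustrated with
a toy Python sketch. This is not the design described in the
patent, whose internals are not covered here; the class and
method names (VersionedIndex, add, search) and the sample data
are assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Posting:
    """One occurrence of a term in one dated version of a document."""
    doc_id: str
    version_date: date


class VersionedIndex:
    """Toy inverted index with one posting per (term, document version).

    Hypothetical illustration only; not the patent's actual structure.
    """

    def __init__(self) -> None:
        self._postings = defaultdict(list)  # term -> list of Posting

    def add(self, doc_id: str, version_date: date, text: str) -> None:
        # Index each distinct lowercase word of this document version.
        for term in set(text.lower().split()):
            self._postings[term].append(Posting(doc_id, version_date))

    def search(self, term: str, start: date, end: date) -> list:
        # Keep only versions dated within [start, end], newest first.
        hits = [p for p in self._postings.get(term.lower(), [])
                if start <= p.version_date <= end]
        return sorted(hits, key=lambda p: p.version_date, reverse=True)


index = VersionedIndex()
index.add("example.com/home", date(2005, 1, 10), "Google index growth")
index.add("example.com/home", date(2006, 5, 2), "Google patent index")

hits = index.search("index", date(2006, 1, 1), date(2006, 12, 31))
print([(p.doc_id, p.version_date.isoformat()) for p in hits])
# [('example.com/home', '2006-05-02')]
```

Both versions of the page contain the word "index", but only
the 2006 version falls inside the requested date range, so it
is the only one returned.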

History of Google's Index Size

Google started out with an index size of around 24 million web
pages in 1996. By August of 2000, Google had managed to
grow their index more than fortyfold, to approximately one
billion web pages. In September 2003, Google's front page
boasted an index of 3.3 billion web pages. Microdoc, however,
revealed
that the actual number of web pages Google had indexed during
that time was more than five billion web pages already. In
their article 'Google Understates the Size of Its Database',
they emphasized that Google not only specialized in simplicity
but also in understating their power and complexity. Google was
still managing to stay ahead of its competitors and continued to
surprise everyone with what it had up its sleeve.

As Google's index continued to grow, the number on their front
page grew impressively large as well before it plateaued at
eight billion web pages. This was around the time that
Patterson filed the new patent. Then in 2005, with
controversies in index size growing, Google decided to stop
counting in front of the public and simply claimed that their
index size was three times larger than the nearest competitor's
index size. Google also maintained that it was not just the size
of indexed pages that was important but how relevant the results
they returned were. Then in September of 2005, as part of
Google's 7th anniversary, Anna Patterson, the same software
engineer who filed the patent on the Multiple Index Based
Information Retrieval System, posted an entry on Google's
official blog claiming that the index was now 1,000 times
larger than the original index. This pegged the index size at
around 24 billion web pages, about a fourth of Google's goal of
indexing 100 billion web pages. It seems, then, that Google
must have started using the new system in mid-2005. With the
new system in place, we can only wait and see how fast Google
will reach the goal of 100 billion web pages in its index. Most
likely, though, once Google has reached that goal it will set
an even higher one in order to keep providing quality service.
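
The arithmetic behind the 24 billion figure can be checked in a
few lines of Python. The sketch assumes the original index size
of 24 million pages and the 1,000-fold growth claim quoted
above; the variable names are chosen only for this example.

```python
ORIGINAL_INDEX = 24_000_000   # pages in the original index (1996)
GROWTH_FACTOR = 1_000         # "1,000 times larger", per the blog claim
GOAL = 100_000_000_000        # the 100 billion page goal

implied = ORIGINAL_INDEX * GROWTH_FACTOR
share = implied / GOAL
print(f"{implied:,} pages, {share:.0%} of goal")
# 24,000,000,000 pages, 24% of goal
```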


About The Author: http://www.theinternetone.net