Search Engines and How They Work

May 24, 2008 by admin  
Filed under Search Engines

Search Engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various Search Engines work, but they all perform three basic tasks:

• They search the Internet - or select pieces of the Internet - based on important words,

• They keep an index of the words they find, and where they find them, and

• They allow users to look for words or combinations of words found in that index.

Early Search Engines held an index of a few hundred thousand pages and documents, and received maybe one or two thousand inquiries each day. Today, a top Search Engine will index hundreds of millions of pages, and respond to tens of millions of queries per day.

Before a Search Engine can tell you where a file or document is, that file or document must first be found. To find information on the hundreds of millions of Web pages that exist, a Search Engine employs special software robots, called spiders, to build lists of the words found on Web sites.

When a spider is building its lists, the process is called web crawling.

In order to build and maintain a useful list of words, a Search Engine’s spiders have to look at a lot of pages. How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
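
The travel pattern described above can be sketched as a breadth-first walk over links. This is only an illustration, not how any particular Search Engine does it: the TOY_WEB dictionary, its example URLs, and the crawl function are made up for this sketch, and no real network requests are made.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
TOY_WEB = {
    "http://popular.example/": (
        "search engines index the web",
        ["http://popular.example/a", "http://other.example/"],
    ),
    "http://popular.example/a": (
        "spiders follow links",
        ["http://popular.example/", "http://missing.example/"],
    ),
    "http://other.example/": ("an index maps words to pages", []),
}


def crawl(seeds, web, max_pages=100):
    """Visit the seed pages first, then the pages they link to, and so on."""
    queue = deque(seeds)
    seen = set(seeds)
    words_by_url = {}
    while queue and len(words_by_url) < max_pages:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to read
        text, links = web[url]
        words_by_url[url] = text.lower().split()  # list of words on the page
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return words_by_url


pages = crawl(["http://popular.example/"], TOY_WEB)
for url, words in pages.items():
    print(url, words)
```

Starting from one popular page, the spider reaches every page linked from it, skips the dead link, and never visits the same page twice.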

Once the spiders have completed the task of finding information on Web pages, the Search Engine must store the information in a way that makes it useful. There are two key components involved in making the gathered data accessible to users:

• The information stored with the data, and

• The method by which the information is indexed.

In the simplest case, a Search Engine could just store the word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page, whether it was used once or many times, or whether the page contained links to other pages containing the word. In other words, there would be no way of building the ranking list that tries to present the most useful pages at the top of the search results.
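
As a rough illustration of that simplest case, the sketch below stores nothing but the word and the URLs where it was found. The sample pages and the build_simple_index name are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical crawl output: URL -> list of words found on that page.
SAMPLE_PAGES = {
    "http://a.example/": ["search", "engines", "build", "an", "index"],
    "http://b.example/": ["an", "index", "index", "index"],
}


def build_simple_index(words_by_url):
    """Map each word to the set of URLs where it was found, and nothing more."""
    index = defaultdict(set)
    for url, words in words_by_url.items():
        for word in words:
            index[word].add(url)
    return index


simple_index = build_simple_index(SAMPLE_PAGES)
print(sorted(simple_index["index"]))
```

Both pages come back for "index", but the index cannot say that the second page uses the word three times, which is exactly the limitation described above.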

To make for more useful results, most Search Engines store more than just the word and URL. A Search Engine might store the number of times that the word appears on a page. The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the META tags or in the title of the page. Each commercial Search Engine has a different formula for assigning weight to the words in its index. This is one of the reasons that a search for the same word on different Search Engines will produce different lists, with the pages presented in different orders.
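
One way to picture such a weighting scheme is sketched below. The field names and the weight values in FIELD_WEIGHTS are invented for illustration; as noted above, each commercial Search Engine uses its own formula.

```python
from collections import Counter

# Assumed weights for illustration only: a word counts for more in the
# title or a sub-heading than in the body text.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "meta": 2.0, "link": 2.0, "body": 1.0}


def score_page(fields):
    """Return {word: (occurrence count, total weight)} for a single page.

    `fields` maps a field name such as "title" or "body" to its text.
    """
    counts = Counter()
    weights = Counter()
    for field, text in fields.items():
        for word in text.lower().split():
            counts[word] += 1
            weights[word] += FIELD_WEIGHTS.get(field, 1.0)
    return {word: (counts[word], weights[word]) for word in counts}


page = {
    "title": "Hash Tables",
    "body": "a hash table stores entries in buckets",
}
entry = score_page(page)
print(entry["hash"])   # appears twice: once in the title, once in the body
print(entry["table"])  # appears once, in the body only
```

Under these assumed weights, "hash" scores higher than "table" on this page because it also appears in the title.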

An index has a single purpose: it allows information to be found as quickly as possible. There are quite a few ways for an index to be built, but one of the most effective ways is to build a hash table. In hashing, a formula is applied to attach a numerical value to each word.

The formula is designed to evenly distribute the entries across a predetermined number of divisions. This numerical distribution is different from the distribution of words across the alphabet, and that is the key to a hash table’s effectiveness.
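
A minimal sketch of the idea follows. The hashing formula here is a simple textbook-style one chosen for illustration, not the formula any particular Search Engine uses, and NUM_BUCKETS and the function names are assumptions.

```python
NUM_BUCKETS = 8  # the "predetermined number of divisions" (value assumed)


def bucket_for(word, num_buckets=NUM_BUCKETS):
    """Turn a word into a number, then into a bucket position."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % 2**32
    return value % num_buckets


def build_table(words, num_buckets=NUM_BUCKETS):
    """Place each distinct word in the bucket chosen by the formula."""
    table = [[] for _ in range(num_buckets)]
    for word in words:
        bucket = table[bucket_for(word, num_buckets)]
        if word not in bucket:
            bucket.append(word)
    return table


def contains(table, word):
    """Look in one bucket only, instead of scanning every word."""
    return word in table[bucket_for(word, len(table))]


# Many of these words start with "s", so an alphabetical layout would
# crowd them into one section; the formula ignores alphabetical order.
WORDS = ["search", "spider", "site", "server", "score", "index", "hash", "web"]
table = build_table(WORDS)
for position, bucket in enumerate(table):
    print(position, bucket)
print(contains(table, "spider"), contains(table, "zebra"))
```

A lookup computes the bucket position directly from the word, so only that one bucket has to be examined.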

When a person requests a search on a keyword or phrase, the Search Engine software searches the index for relevant information. The software then provides a report back to the searcher with the most relevant web pages listed first.
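
The lookup step might look something like the sketch below. It assumes an index that maps each word to the pages containing it together with a weight; the sample entries, the example URLs, and the rule of simply adding weights are all made up for illustration.

```python
# Hypothetical weighted index: word -> {URL: weight of the word on that page}.
INDEX = {
    "hash": {"http://a.example/": 6.0, "http://b.example/": 1.0},
    "table": {
        "http://a.example/": 1.0,
        "http://b.example/": 3.0,
        "http://c.example/": 2.0,
    },
}


def search(query, index):
    """Return URLs containing every query word, most relevant first."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Keep only the pages that contain all of the query words.
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)
    # Relevance here is just the sum of the per-word weights (an assumption).
    scored = [(sum(p[url] for p in postings), url) for url in matching]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


print(search("hash table", INDEX))
print(search("table", INDEX))
```

For the query "hash table", only the two pages containing both words are reported, with the higher-weighted page listed first.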
