Internet Tutorials

The World of Search Engines

A search engine is a searchable database of Internet files collected by a computer program called a crawler, robot, worm, or spider. An index is created from the collected files, drawing on elements such as title, full text, date last modified, URL, and language. Results are ranked by relevance, and the ranking method varies among search engines.
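To make the idea of an index more concrete, here is a minimal sketch in Python. It assumes a crawler has already collected three pages; the URLs, titles, and text are made up for illustration, and the function names (tokenize, build_index, search) are hypothetical. Real search engines are far more elaborate, so treat this only as a toy model of "collect files, build an index, look things up."

```python
import re
from collections import defaultdict

# Hypothetical pages a crawler might have collected (made-up URLs and text).
PAGES = [
    {"url": "http://example.com/a", "title": "Vegan diet basics",
     "text": "A vegan diet excludes animal products."},
    {"url": "http://example.com/b", "title": "Garden tips",
     "text": "Grow vegetables for a healthy diet."},
    {"url": "http://example.com/c", "title": "Search engines",
     "text": "A crawler collects files for the index."},
]

def tokenize(s):
    """Lowercase the string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", s.lower())

def build_index(pages):
    """Map each word to the set of URLs whose title or text contains it."""
    index = defaultdict(set)
    for page in pages:
        for word in tokenize(page["title"] + " " + page["text"]):
            index[word].add(page["url"])
    return index

def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

index = build_index(PAGES)
print(search(index, "vegan diet"))  # only page "a" contains both words
print(search(index, "diet"))        # pages "a" and "b" contain "diet"
```

Note that this sketch only finds matching pages; it does not rank them. It also matches whole words only, so "vegetables" does not match a search for "vegan".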

In essence, a search engine consists of three components:

This tutorial will cover four types of search engines: general, meta, concept categorizing, and vertical. We'll take these one at a time.

First, a few tips

~ Google isn't the only search engine on the Web! There are other excellent search engines that deserve to be explored. New search engines are appearing all the time. Because Google has become so dominant, it can be easy to overlook useful alternatives. Check out this site's search engine page to explore some of them.

~ Search engines don't index all the documents on the Web. Far from it. Here are some examples of the type of content (often referred to as the deep web) that usually does not appear in your search engine results:

tip! See the tutorial on The Deep Web for more information on the content generally not found on search engines.

As with most things, there are exceptions to these rules. Some search engines do retrieve a limited amount of content from the deep Web. For example, take a look at this search result from Google on "vegan diet". Notice that the results include videos from YouTube - which, conveniently, Google owns.

vegan diet

~ Because a search can retrieve a very large number of pages, good relevancy ranking is important. Search engines use various criteria to assign a relevancy score to each search result and present your results in order of that score.
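The idea of scoring results and presenting them in order can be sketched as follows. This is a toy example, not how any particular search engine works: the scoring rule (count how often the query words appear, and count title matches three times as heavily) is an assumption chosen for illustration, and the pages, URLs, and function names are hypothetical.

```python
import re

# Hypothetical collected pages (made-up URLs and text).
PAGES = [
    {"url": "http://example.com/a", "title": "Vegan diet basics",
     "text": "A vegan diet excludes animal products."},
    {"url": "http://example.com/b", "title": "Garden tips",
     "text": "Grow vegetables for a healthy diet."},
    {"url": "http://example.com/c", "title": "Cooking",
     "text": "Vegan recipes, vegan snacks, vegan desserts."},
]

def tokenize(s):
    """Lowercase the string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", s.lower())

def score(page, query, title_weight=3):
    """Toy relevancy score: occurrences of query words, title hits weighted extra."""
    title_words = tokenize(page["title"])
    text_words = tokenize(page["text"])
    total = 0
    for word in set(tokenize(query)):
        total += title_weight * title_words.count(word) + text_words.count(word)
    return total

def rank(pages, query):
    """Return (score, url) pairs for matching pages, highest score first."""
    scored = [(score(p, query), p["url"]) for p in pages]
    scored = [item for item in scored if item[0] > 0]
    # Sort by descending score; break ties alphabetically by URL.
    return sorted(scored, key=lambda item: (-item[0], item[1]))

for points, url in rank(PAGES, "vegan diet"):
    print(points, url)
```

With these made-up pages, page "a" comes first because both query words appear in its title as well as its text. Changing title_weight changes the order, which is one small illustration of why the same search can rank differently on different search engines.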

~ In this era of personalization, search results can be different for different people using the same search engine. For example, Google personalizes your search results based on sites you've selected from previous search results. This feature is called Web History, and you can opt out of it if you wish. This is an important trend to watch as the Web experience becomes more individualized to your preferences and needs.

~ Real-time search is important to the search experience on the Web. General search engines such as Bing and social search engines such as OneRiot are bringing the real-time stream to everyone. If it's happening now on the Web, you can search for it.

~ It is helpful to understand that not all aspects of a search engine's technology are revealed to the public. In the world of commercial search engines, trade secrets abound. Help files tend to be general in nature when explaining how the technology works.

~ Don't expect search engines to work perfectly. Sometimes they just don't. If your results look strange, try a different search or a different search engine.

~ And finally... Beware of search results! Some search engines load the top of their results pages with paid listings. These are sites whose owners have paid for high placement. In other words, they are advertisements. Not all search engines do this, and some are clearer than others about what has been paid for and what has not. A good overview of this phenomenon can be found in the 2007 article, "Buying Your Way In: Search Engine Advertising Chart" by Danny Sullivan of Search Engine Watch. If you're interested, read the story.

Now, on to a discussion of various types of search engines.
