Internet Tutorials

The World of Search Engines

A search engine is a searchable database of Internet files collected by a computer program called a crawler, robot, worm, or spider. An index is created from the collected files, drawing on elements such as title, full text, date last modified, URL, and language. Results are ranked by relevance, and the ranking method varies among search engines.
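To make the idea of an index more concrete, here is a minimal Python sketch of an inverted index built from a couple of collected pages. The sample pages, the example.com URLs, and the `build_index` helper are all hypothetical and exist only for illustration; real search engines use far more elaborate and largely undisclosed methods.

```python
from collections import defaultdict

# Hypothetical pages a crawler might have collected (illustrative data only).
PAGES = [
    {
        "url": "http://example.com/a",
        "title": "Vegan diet basics",
        "text": "A vegan diet excludes animal products.",
    },
    {
        "url": "http://example.com/b",
        "title": "Picasso biography",
        "text": "Pablo Picasso was a Spanish painter.",
    },
]


def build_index(pages):
    """Map each lowercase word to the set of URLs whose title or text contains it."""
    index = defaultdict(set)
    for page in pages:
        words = (page["title"] + " " + page["text"]).lower().split()
        for word in words:
            # Strip simple trailing/leading punctuation so "products." matches "products".
            index[word.strip(".,!?")].add(page["url"])
    return index


INDEX = build_index(PAGES)
```

Looking up a word such as `INDEX["vegan"]` then returns the set of URLs that mention it, which is the basic lookup a search box relies on.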

In essence, a search engine consists of three components:

This tutorial will cover four types of search engines: general, meta, concept categorizing, and vertical. We'll take these one at a time.

First, let's look at the search engine scene

~ Search engines don't index all the documents on the Web. Far from it. Here are some examples of the type of content (often referred to as the deep web) that usually does not appear in your search engine results:

tip! See the tutorial on The Deep Web for more information on the content generally not found on search engines.

~ On the other hand, some search engines do retrieve a limited amount of content from the deep Web. For example, take a look at this search result from Google on "vegan diet". Notice that the results include videos from YouTube - which, conveniently, Google owns.

[Screenshot: Google search results for "vegan diet"]

~ Because of the potentially large number of pages that can be retrieved by a search, good relevancy ranking is important. Search engines use various criteria to construct a relevancy rating of each search result and will present your results in this order.
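As a rough illustration of relevancy ranking, the sketch below scores each document by how often the query words appear in it and sorts the results by that score. This is only one simple criterion chosen for the example. The documents and the `rank` function are hypothetical, and commercial search engines combine many more signals that are not publicly documented.

```python
from collections import Counter

# Hypothetical documents (illustrative data only).
DOCS = {
    "page1": "vegan diet recipes and vegan diet tips",
    "page2": "a balanced diet for athletes",
    "page3": "history of modern painting",
}


def rank(query, docs):
    """Return (doc_id, score) pairs, highest score first.

    The score is the total number of times the query words occur in the document.
    Documents that contain none of the query words are left out.
    """
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


RESULTS = rank("vegan diet", DOCS)
```

Here `page1` ranks first because it mentions both query words repeatedly, `page2` follows because it mentions only "diet", and `page3` is not returned at all.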

~ In addition to relevancy ranking, supplementary material may appear with your search results to help you focus on your desired topic and learn more about it. Some search engines use semantics to present suggested topics, related data, meanings, and attributes relating to your search. This is moving search in significant new directions. The screenshot on the right shows useful information retrieved in a Google search about the artist Pablo Picasso.

~ In this era of personalization, search results can be different for different people using the same search engine. For example, Google personalizes your search results based on sites you've selected from previous search results. This feature is called Web History, and you can opt out of it if you wish. This is an important trend to watch as the Web experience becomes more individualized to your preferences and needs.

~ Real-time search is important to the search experience on the Web. General search engines such as Bing and social search engines such as FriendFeed Search are bringing the real-time stream to everyone. If it's happening now on the Web, you can search for it.

~ It is helpful to understand that not all aspects of a search engine's technology are revealed to the public. In the world of commercial search engines, trade secrets abound. Help files tend to be general in nature when explaining how the technology works.

~ Don't expect search engines to work perfectly. Sometimes they just don't. If your results look strange, try a different search or a different search engine.

~ Google isn't the only search engine on the Web! Sure, Google is great, but there are other excellent search engines that deserve to be explored. Because Google has become so dominant, it can be easy to overlook useful alternatives. Check out this site's search engine page to explore some of them.

~ And finally... Beware of search results! Some search engines load the top of their results pages with paid listings. These are sites whose owners have paid for high placement. In other words, they are advertisements. Not all search engines do this, and some are clearer than others about what has been paid for and what has not. A good overview of this phenomenon can be found in the 2007 article, "Buying Your Way In: Search Engine Advertising Chart" by Danny Sullivan of Search Engine Watch. If you're interested, read the story.

Now, on to a discussion of various types of search engines.
