Checklist of Internet Research Tips

  1. The Internet is a self-publishing medium. It is not a library of evaluated publications selected by professionals. Rather, the Internet is a bulletin board containing everything from the definitive to the spurious. Everything you find must be analyzed for its appropriateness for research use. For guidelines on how to do this, see Evaluating Web Content.
  2. Before you select a search tool, always think about your topic and what you are trying to find. For a few ideas on this strategy, see Getting Started: Selecting a Tool for Your Search. Once you begin your research, be sure to try out a handful of sites. Don't rely on a single site or type of site.
  3. Don't just Google everything! Google is great, but there are other useful tools on the Web, too. Google has become so popular that many people use this tool exclusively, and miss out on others that might be more useful for their particular search. Think again about the approach suggested in point #2 before starting your next search.
  4. Three major resources for locating Internet materials are the subject directory, the search engine, and content on the deep Web. These are useful for different types of queries. Be sure you understand the differences.

SUBJECT DIRECTORY

For general, research-oriented queries that involve exploring a topic, and when you want to view sites often recommended by experts, use a subject directory.

Definition: A subject directory is a service that offers a collection of links to Internet resources submitted by site creators or evaluators and organized into subject categories. Directory services use selection criteria for choosing links to include, though the selectivity varies among services. Most directories are searchable.

When using directories, keep in mind that:

INFOMINE, from the University of California, is a good example of an academic subject directory. Yahoo is a good example of a commercial portal--but its directory should never be used for serious research. A more complete list of both types of directories may be found on the page Internet Subject Directories.

SEARCH ENGINES

For targeted, multi-concept, and sometimes general queries, use a search engine.

Definition: A search engine is a searchable database of Internet files collected by a computer program (variously called a wanderer, crawler, robot, worm, or spider). An index is built from elements of the collected files, such as title, full text, size, and URL. There are no selection criteria for the collection of files, though evaluation can be applied to the ranking of results.

A search engine might well be called a search engine service or a search service. As such, it consists of three components:

Google is a famous example of a search engine. A more complete list may be found on the page Internet Search Engines.
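
The definition above (a program collects files by following links, and an index is then built from what was collected) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny in-memory "Web" in PAGES, the example URLs, and the function names crawl and build_index. Real search services are far more elaborate.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("bird migration routes", ["a.example/maps"]),
    "a.example/maps": ("migration maps and charts", []),
    "b.example/orphan": ("bird feeders", []),  # no page links here
}

def crawl(seed):
    """Follow links breadth-first from seed; return URLs in visit order."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls):
    """Map each word to the set of collected URLs whose text contains it."""
    index = {}
    for url in urls:
        for word in PAGES[url][0].lower().split():
            index.setdefault(word, set()).add(url)
    return index

collected = crawl("a.example/home")
index = build_index(collected)
print(sorted(index["migration"]))
```

In this sketch the page b.example/orphan never enters the index, because the crawler only reaches pages that some collected page links to.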

DEEP WEB

For targeted queries, when you are looking for non-textual information, use the DEEP WEB.

Definition: The deep Web consists of information stored in searchable databases mounted on the Web. Information stored in these databases is accessible by user query. These databases usually search a targeted topic or aspect of a topic, though entire Web sites may be contained within a database. Search engine spiders cannot or will not index this information.

The deep Web also consists of multimedia and image files, and files created in non-standard file types such as Portable Document Format (PDF). Many search services offer separate search options for locating these files. AlltheWeb, AltaVista and MSN Search are just a few examples of services that offer specialized media searches, while Google integrates searches of PDF and other non-HTML files into its general search service.

When dealing with the deep Web, keep in mind that:

  1. Yahoo is one of the most popular sites on the Web. It is one of the Web's largest commercial portals. When you search Yahoo, you get results from the general Web and also from its own directory. The Yahoo directory is not a reliable or adequate research tool and should not be used for this purpose. Beware of the drawbacks of the Yahoo directory:
  1. It is very helpful to understand the principles of Boolean search logic when using a search engine on the Web. This search logic is manifested in three distinct ways on Web search engines. Review Boolean Searching on the Internet.
  2. Other search strategies are also useful to examine in order to make accurate use of Web search engines. Be sure to check these out.
  3. When you enter more than one word in a Web search engine, the space between the words has a logical meaning that directly affects your results. This is known as the default syntax. For example:

In Google, a search on the words

birds     migration

means that you will get back documents that contain both the words birds and migration. This is because the space between the words defaults to the Boolean AND. Most search tools nowadays default to AND logic.

If you wish to use Boolean OR logic to search for something like

"global warming"     "greenhouse effect"

you will need to select OR logic from a search template or use an advanced search option that gives you additional choices.
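
The difference between the default AND and an explicit OR can be shown with a small sketch. The document collection DOCS and the functions matches and search are hypothetical, and the sketch matches single words only; it does not handle quoted phrases such as "global warming".

```python
# Hypothetical mini-collection: document name -> text.
DOCS = {
    "doc1": "birds begin their migration in autumn",
    "doc2": "birds of north america",
    "doc3": "migration patterns of whales",
}

def matches(text, terms, mode="AND"):
    """Return True if text satisfies the terms under AND or OR logic."""
    words = set(text.lower().split())
    hits = [term in words for term in terms]
    return all(hits) if mode == "AND" else any(hits)

def search(query, mode="AND"):
    """Split the query on whitespace and return matching document names."""
    terms = query.lower().split()
    return sorted(name for name, text in DOCS.items()
                  if matches(text, terms, mode))

print(search("birds     migration"))        # space treated as AND
print(search("birds     migration", "OR"))  # OR chosen explicitly
```

With AND logic only doc1 is returned, since it is the only document containing both words; with OR logic all three documents match.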

  1. Among Web search engines, a de facto search language has emerged, especially for basic search (i.e., main screen) interfaces. When in doubt, use the following syntax:
  1. Search engines offer numerous features that help you home in on what you want. For a review of some of these features, and the search engines that support them, see How to Choose a Search Engine or Directory.
  2. Search engines return results in a schematic order. Most search engines use various criteria to construct a term relevancy rating of each hit and will present your search results in this order. Criteria can include: search terms in the title, URL, first heading, HTML META tag; number of times search terms appear in the document; search terms appearing early in the document; search terms appearing close together; etc. This is known as "on the page" ranking and applies to many long-standing first generation search services. Few search engines still use this technique exclusively, but most incorporate it into the ranking of their results.
  3. One of the most interesting developments in search engine technology is the organization of search results by peer ranking, concept, site and domain, rather than by term relevancy. This type of ranking looks at "off the page" information to determine the order of your search results. Search engines that employ this alternative may be thought of as second generation search services. For example:

A more detailed look at second generation search services may be found in the tutorial Second Generation Searching on the Web.
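
As a rough illustration of the "on the page" ranking described above, the sketch below scores documents on three of the listed criteria: search terms in the title, the number of times a term appears, and terms appearing early in the text. The weights (3.0 and 1.0), the ten-word "early" window, the sample documents, and the function name on_page_score are all assumptions chosen for illustration; actual search services do not publish their formulas.

```python
def on_page_score(terms, title, body):
    """Score one document using a few 'on the page' criteria."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0.0
    for term in terms:
        if term in title_words:
            score += 3.0                 # term appears in the title
        score += body_words.count(term)  # number of times it appears
        if term in body_words[:10]:
            score += 1.0                 # term appears early in the text
    return score

# Hypothetical documents: name -> (title, body).
DOCS = {
    "doc1": ("Bird migration", "migration of birds follows seasonal routes"),
    "doc2": ("Garden notes", "we saw birds today"),
}

terms = ["birds", "migration"]
ranked = sorted(DOCS, key=lambda d: on_page_score(terms, *DOCS[d]),
                reverse=True)
print(ranked)
```

Here doc1 ranks first because it mentions both terms and carries one of them in its title. A second generation service would also weigh "off the page" information that this sketch ignores.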

  1. Take advantage of second generation tools that return results in a horizontal presentation. Most search tools return results in one long, vertical list. In contrast to this, there is a group of search tools that use concept processing to return results in a horizontal organization. With these tools, you can first review concept categories retrieved by your search before examining the results within particular categories. This can make it easier to zero in on the aspects of your topic that interest you. Examples of these tools include Query Server and Clusty.
  2. Don't be impressed--or even necessarily worried--by a large number of hits in response to a well-formulated search. Often multiple pages are returned from a single site because they all contain your search terms. AltaVista and AlltheWeb are among the engines that avoid this with a technique called results grouping, whereby all the results from one site are clustered together into one result. You are then given the opportunity to view all the retrieved pages from that site if you choose. With these engines, you may get a smaller number of results from a search, but each result comes from a different site.
  3. If you have too many search results, or results that are not relevant:
  1. If you have too few search results:
  1. Meta search engines simultaneously search multiple search engines. They are also referred to as parallel search engines, multithreaded search engines, or mega search engines. These are useful when:

Most meta engines return a single list of results, often with the duplicate hits removed. The engine retrieves only a limited number of documents from each of the individual engines it searches, cutting off retrieval at a certain point as the search is processed. The cut-off may be determined by the number of documents retrieved, or by the amount of time the meta engine spends at the other sites. Some of these services give the user a certain degree of control over these factors. All of this has two implications:

The better meta search engines remove duplicate files and give you some information along with the document title. To see a list of meta search engines, visit Internet Search Engines.
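
The merging behavior described above (a single list, duplicates removed, a cut-off on how many documents are taken from each engine) can be sketched as follows. The engine names, the URLs, the function meta_search, and the per_engine_limit parameter are hypothetical; a real meta engine queries live services and may cut off by elapsed time instead of by count.

```python
def meta_search(engine_results, per_engine_limit=2):
    """Merge ranked URL lists from several engines, dropping duplicates.

    engine_results maps an engine name to its ranked list of URLs.
    Only the first per_engine_limit hits from each engine are kept.
    """
    merged, seen = [], set()
    for engine, urls in engine_results.items():
        for url in urls[:per_engine_limit]:  # cut-off by document count
            if url not in seen:              # skip duplicate hits
                seen.add(url)
                merged.append((url, engine))
    return merged

# Hypothetical ranked result lists from two made-up engines.
results = {
    "engine_a": ["x.example/1", "y.example/2", "z.example/3"],
    "engine_b": ["y.example/2", "w.example/4", "z.example/3"],
}
for url, engine in meta_search(results):
    print(url, "first seen via", engine)
```

In this sketch y.example/2 appears only once even though both engines return it, and z.example/3 is never shown because it falls below the cut-off in both lists.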

  1. Keep in mind that search engines do not index all the documents available on the Web. For example, most search engines cannot index files on password-protected sites, files behind firewalls, or files that the host server has configured to be left alone. Still other Web pages may not be picked up if no other pages link to them, and are therefore missed by a search engine spider as it crawls from one page to the next. Search engines rarely contain the most recent documents posted to the Internet; do not look for yesterday's news on a search engine.
  2. Finally, watch for converging content. Many well-known sites now contain information from an array of sources. This can increase the usefulness of search sites, but can also create confusion about the source of the information. For example:

Updated: 6 February 2008
