The Deep Web

The deep Web has gotten a lot of press in recent years. The Web is becoming a complex entity that contains information from a variety of source types. It is much more than fixed Web pages. In fact, the part of the Web that is not fixed, and is served dynamically "on the fly," is far larger than the fixed documents that many associate with the Web. Some people incorrectly refer to this content as the "invisible Web," for reasons that will be explained below.

When we refer to the deep Web, we are usually talking about the following:

The phenomenon of databases on the Web has been talked about for years, before the terms "invisible Web" or "deep Web" were coined. People sometimes referred to them as specialty databases, subject-specific databases, virtual libraries, and other similar terms. As Web technology develops and greater amounts of information are mounted on the Web, these databases take on primary importance as information finding tools.

The concept of the deep Web is becoming more complex as search engines such as Google have found ways to integrate deep Web content into their centralized search function. This includes everything from airline flights to documents in Word format. However, even a search engine as innovative as Google provides access to only a very small part of the deep Web.

Terminology

Why is this content referred to as the "invisible Web"? This is because the content of databases rarely shows up in a search engine result. Search engine spiders cannot or will not go inside database tables and extract the data. Database content is therefore "invisible" to them.
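The point about spiders can be illustrated with a minimal sketch. It assumes a hypothetical toy site (the `STATIC_PAGES` and `DATABASE` contents, the `crawl` and `search` functions, and all paths are invented for illustration) and a deliberately simple spider that follows hyperlinks but never submits a form. Real search engine crawlers are far more sophisticated, so treat this as a picture of the idea, not a description of any actual engine.

```python
from html.parser import HTMLParser

# Hypothetical toy site: two static pages plus a search form backed by a database.
STATIC_PAGES = {
    "/": '<a href="/about">About</a> <form action="/search"><input name="q"></form>',
    "/about": '<a href="/">Home</a>',
}

# Records served only in response to a query; no static page links to them.
DATABASE = {
    "whales": "Record: migration patterns of humpback whales",
    "volcanoes": "Record: eruption history of Mount Etna",
}


class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags, as a simple spider would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start="/"):
    """Follow hyperlinks only; forms are never filled in or submitted."""
    seen, queue = set(), [start]
    while queue:
        path = queue.pop()
        if path in seen or path not in STATIC_PAGES:
            continue
        seen.add(path)
        parser = LinkExtractor()
        parser.feed(STATIC_PAGES[path])
        queue.extend(parser.links)
    return seen


def search(query):
    """What a person gets by typing a query into the site's search form."""
    return DATABASE.get(query.lower(), "No results")


surface = crawl()
print(sorted(surface))   # pages the spider reached by following links
print(search("whales"))  # a record reachable only through a user query
```

The spider ends up with the two static pages and nothing else. The database records exist and are easy to retrieve, but only for someone who types a query into the form.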

However, the term "invisible Web" is a poor choice for these reasons:

  1. The term is very search engine-centric. It assumes that the only way to find information on the Web is to consult a search engine. If the information cannot be found on a search engine, you're out of luck. This is simply not the case.
  2. There is no such thing as recorded information that is invisible. Some information may be more of a challenge to find than other information, but this is not the same as invisibility.
  3. Informational databases have been available for years. Many of us are familiar with a library's collection of Web-based e-journals and databases. We use online catalogs, which are databases of a library's holdings. No one has ever called this information a part of the "invisible library." These are simply databases whose content is available through user query. Like a library, the Web contains information of different types that is stored and retrieved in different ways.
  4. The content of search engines on the Web is itself stored in databases and available only through user query. Shouldn't we call this invisible, too? We're labelling as invisible something that is available only through user query (the invisible Web) because it isn't accessible from within something else that is also available only through user query (search engines). The logic of this terminology just doesn't hold up.

A company called BrightPlanet has coined the term "deep Web" to describe the phenomenon of searchable databases on the Web. (The static Web is referred to as the "surface Web.") This term is a much better choice, since database content is visible with the appropriate search and retrieval technology.

A Few Tips for Dealing with the Deep Web

When dealing with the deep Web, keep these points in mind:

Sources of Deep Web Content

As noted above, deep Web sites can be located in subject directories and search engines. In addition, deep Web content is available on search engine sites as featured content such as news, video, images, etc.

If you're interested in this topic, take a look at Deep Web Technologies. This company has developed a few databases, including

The number of deep Web sources is endless. The Online Education Database maintains a nice sample list of deep Web resources.

The Future of the Deep Web

The lines between search engine content and the deep Web have begun to blur as search services provide access to part or all of once-restricted content. These services offer free search of the content of books and scholarly papers. Google Book Search, Google Scholar, Live Search Academic and other up-and-coming services are examples of this phenomenon.

Generally speaking, if a book is out of copyright, you can view the text in its entirety. The issue of full text availability is complex, however: Google, for example, often restricts access to the full text of out-of-copyright books when publishers with which it has agreements are selling them. Access to scholarly papers is also tricky. Some papers are posted on preprint or postprint archives, in Open Access journals, or on personal Web sites. When these show up in search engine results, you can read the full text. In other cases, the search is free but you must pay to access the content.

In essence, an increasing amount of deep Web content, especially scholarly content, is opening up to free search. As more and more publishers and libraries make agreements with the big search engines, more content will be searchable from central locations. Access to this content is a mixed bag. It may be that the future of the deep Web will be defined less by the opportunity for search than by access fees or other types of authentication.

Updated: 5 July 2007
