Getting The Most Out Of Search Engines

by Gerhard Ruf, gruf@xmission.com

© copyright 1996 - 2004
updated 1 October 2004

 
  1. Overview
        After searching various genealogical databases and collections online, it is often beneficial to search in less well-known sites that also list genealogical information. These sites include any web site that uses static web pages, like most personal web sites.
     
  2. When I search the Internet for individual names, I use the following sites in approximately this order:
    1. familysearch.org
    2. rootsweb.com (including mailing lists and message boards)
    3. ancestry.com
    4. major search engines (typically Google and/or AltaVista)

     
  3. Types of search engines / search sites
    1. Site Specific Search Tools (most major sites have them)
      • each has different rules
    2. Directory or Subject Searches (yahoo.com and/or dmoz.org)
      • included web sites are evaluated by editors
      • least coverage - less than 1% of web
    3. Every Word (General) Search Engines (Google is the most popular)
      • largest coverage - less than 10% of web
    4. Multiple Search Engines/Sites (dogpile.com and/or metacrawler.com)
      • search multiple every word search engines
      • have fewer results than the General Search Engines

     
  4. How do Every Word (General) Search Engines work?
    1. Users or web masters notify the search engine of new pages.
    2. Search engines have 'worms', 'robots' or 'spiders' that go out and follow all the links (connections) they can find.
    3. They maintain an index of the contents of every page they find.
    4. They periodically check out old connections to make sure they still exist.
    5. A recent survey showed that the roughly 3-4 billion web pages indexed by the largest search engines include less than 15% of the estimated pages in existence.
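
The steps above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the four pages and their links are made up, and the names PAGES, crawl and build_index are hypothetical. It shows why a page that nothing links to (and that nobody submitted) never reaches the index.

```python
from collections import defaultdict

# Hypothetical miniature "web": page address -> (page text, links on the page).
PAGES = {
    "home.html": ("ruf family genealogy home", ["smith.html", "ruf.html"]),
    "smith.html": ("john smith born in alabama", ["home.html"]),
    "ruf.html": ("ruf family from germany", []),
    "orphan.html": ("smith family bible records", []),  # nothing links here
}

def crawl(start):
    """Follow links outward from a starting page, as a spider does."""
    seen, queue = [], [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue  # already visited, or the link is dead
        seen.append(url)
        queue.extend(PAGES[url][1])
    return seen

def build_index(urls):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in PAGES[url][0].split():
            index[word].add(url)
    return index

found = crawl("home.html")
index = build_index(found)
print(found)                   # pages the spider reached
print(sorted(index["smith"]))  # pages listed under the word 'smith'
```

In this made-up example orphan.html mentions smith but is never found, which is one reason a search engine covers only part of the web.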

     
  5. How do search engines rank their results? (Relevancy)
    1. Each search engine has its own prioritizing or weighting schemes.
    2. They typically include
      • The content of the page's <TITLE> tag (the text shown in the browser's title bar)
      • Nearness of search word(s) to the top of the page
      • Search word(s) frequency on the page, except at the end of the page
      • The first word in a search is ranked more important than a later word.
      • Keywords in META tags may still be used by some search engines.
      • How many other sites link to this page (unique to Google).
    3. Some search engines allow sites to pay for a higher ranking
    4. Therefore, it is important to use more than one search engine.
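
To make the weighting idea concrete, here is a toy scoring function in Python. The weights (5 points for a title match, and so on) are arbitrary assumptions chosen for illustration, and the function name score and the sample pages are hypothetical. Real search engines keep their formulas private, and this sketch ignores paid ranking, link counts and the end-of-page exception.

```python
def score(title, body, terms):
    """Toy relevancy score; every weight here is an arbitrary assumption."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    total = 0.0
    for rank, term in enumerate(terms):
        term = term.lower()
        term_score = 0.0
        if term in title_words:
            term_score += 5.0                     # word appears in the title
        if term in body_words:
            position = body_words.index(term)
            term_score += 2.0 / (1 + position)    # nearer the top scores higher
            term_score += body_words.count(term)  # frequency on the page
        total += term_score / (1 + rank)          # earlier search words count more
    return total

pages = {
    "a.html": ("Smith Family Genealogy",
               "smith records for the smith family of alabama"),
    "b.html": ("Alabama History",
               "a history of alabama that mentions one smith"),
}
terms = ["smith", "alabama"]
ranked = sorted(pages, key=lambda url: score(*pages[url], terms), reverse=True)
print(ranked)
```

Swapping the order of the search words changes the scores, which is why it can pay to put the most important word first.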

     
  6. Which General Search Engine(s) should I use?
     
  7. The basic search features applicable to most search engines - search engine math
    (Using Google and AltaVista as examples)
    1. Upper case and lower case letters in search words
      • Using only lower case characters makes the search case insensitive.
      • Mixing upper and lower case makes the search for that word case sensitive.
            'mcdonald' finds mcdonald, McDonald, MCDONALD, etc.
            'McDonald' only finds McDonald
      • Most search engines no longer differentiate case.
            exception: AltaVista IS case sensitive within quotes and Advanced Search.
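
The case rule described above (for engines that still differentiate case) can be written as a short Python function. The name term_matches is hypothetical and the sketch compares whole words only.

```python
def term_matches(term, word):
    """An all-lower-case term ignores case; a term containing any
    capital letter must match the word exactly."""
    if term == term.lower():
        return term == word.lower()
    return term == word

words = ["mcdonald", "McDonald", "MCDONALD"]
print([w for w in words if term_matches("mcdonald", w)])  # all three spellings
print([w for w in words if term_matches("McDonald", w)])  # only 'McDonald'
```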

       
    2. Use multiple words to limit the search.
      • Pages with all the words will be ranked at the top of the results page.
        (In Google and AltaVista only pages with all the words are displayed.)

       
    3. Use your browser's search feature to find one of the search terms in a particular page
      • In Internet Explorer use Edit/Find(on This Page) or <CTRL>f
      • In Netscape use Edit/Find in Page or <CTRL>f
      • If you don't find a search term, the page may have been changed since it was indexed, or the 'term' is 'hidden' on the page.

       
    4. Forcing the inclusion and exclusion of search terms
      • use a + in front of a search term to make it mandatory
            The word or phrase must be somewhere on the page.
            Google requires a + before single-letter words and common words such as the, of, and, or; otherwise they are ignored.
      • use a - in front of a word to exclude the search term
            The word or phrase may not be anywhere on the page.
            example: +germany +genealogy -history
                +kodak +john +alabama -camera -film
      • no space between the + or - and the word
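
A minimal Python sketch of how + and - could be applied to a page. The function names parse_query and page_matches are hypothetical, and the matching is on whole words only; real engines differ in the details.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-) and optional words."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional

def page_matches(text, query):
    """True when the page has every + word and none of the - words."""
    words = set(text.lower().split())
    required, excluded, _ = parse_query(query)
    has_required = all(w in words for w in required)
    has_excluded = any(w in words for w in excluded)
    return has_required and not has_excluded

query = "+germany +genealogy -history"
print(page_matches("Germany genealogy records online", query))     # kept
print(page_matches("Germany genealogy and local history", query))  # dropped
```

Note that "+ germany" (with a space) would be read as a lone + followed by an ordinary word, which is why no space is allowed after the sign.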

       
    5. Using wildcards (*) (stemming) (Not supported by Google)
          gold* will find gold, golden, goldy, etc.
          german* will find german, germans, germany, etc.
          genealog* will find genealogy, genealogist, genealogical, etc.
          useful for finding spelling variations in the ending of a name
          typically may not be used in the middle or beginning of the word
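
A trailing wildcard amounts to a "starts with" test. The Python sketch below assumes the * appears only at the end of the pattern, as described above; the name wildcard_matches is hypothetical.

```python
def wildcard_matches(pattern, word):
    """Match a trailing-* pattern such as 'genealog*' against one word."""
    if pattern.endswith("*"):
        return word.lower().startswith(pattern[:-1].lower())
    return word.lower() == pattern.lower()

words = ["genealogy", "genealogist", "genealogical", "general", "Gold", "golden"]
print([w for w in words if wildcard_matches("genealog*", w)])
print([w for w in words if wildcard_matches("gold*", w)])
```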
       
    6. Searching for phrases
          Phrases are defined with quotation marks
          A few search engines assume multiple words are phrases
          examples: "smith, john"
              "john a. smith"
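
The difference between a phrase and separate words can be seen in this Python sketch. It uses the standard shlex module to keep quoted phrases together and a plain substring test for matching, which is a simplification; the names split_query and phrase_matches are hypothetical. An unmatched quotation mark or an apostrophe in the query would make shlex raise an error.

```python
import shlex

def split_query(query):
    """Split a query into terms, keeping each quoted phrase together."""
    return shlex.split(query)

def phrase_matches(text, query):
    """True when every term or phrase occurs somewhere in the text."""
    text = text.lower()
    return all(term.lower() in text for term in split_query(query))

query = '"john a. smith" alabama'
print(split_query(query))
print(phrase_matches("John A. Smith was born in Alabama", query))
print(phrase_matches("John Smith and A. Jones of Alabama", query))
```

The second page contains every individual word but not the exact phrase, so it does not match.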

     
  8. Each search engine has an advanced search page with additional features
    1. Features on the Google Advanced search page include
      • Most of the above features
      • language, file format, date, where on page, specific domains
    2. Features on the AltaVista Advanced search page
      • Most of the above features
      • date, file type, location, boolean searching

     
  9. Additional specialized search commands for the basic search page
    1. Search within the web page title
          Google: intitle, allintitle
          AltaVista & AllTheWeb: title
          example: intitle:text
    2. At a web site (limit search results to a particular web site)
          Google: site
          AltaVista: host, domain
          example: host:buy.edu, domain:uk
    3. URL search (finds pages whose own address (URL) contains the term)
          Google: inurl, allinurl
          AltaVista: url
          example: inurl:usgenweb
    4. dictionary
          Google only: define
          example: define:scurvy
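
These commands are just text typed into the search box, so a query that uses them can be assembled mechanically. The Python sketch below is an illustration only: build_query is a hypothetical helper, the search words are made up, and the parameter name q in the encoded form is an assumption about how a search page receives its query.

```python
from urllib.parse import urlencode

def build_query(words, **operators):
    """Join plain words with command:value pairs, e.g. intitle:text."""
    parts = list(words)
    parts += [f"{name}:{value}" for name, value in operators.items()]
    return " ".join(parts)

query = build_query(["ruf", "genealogy"], intitle="family", site="rootsweb.com")
print(query)
# The same query encoded for use in a web address ('q' is an assumed name).
print(urlencode({"q": query}))
```

There is no space between the command, the colon and the value, just as with + and -.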

     
  10. Advanced Searching with Boolean Operators (commands)
     
  11. Special Features common to Google and AltaVista
     
  12. Learn More About Using and Evaluating Search Engines at:

Summary

    Search engines provide a powerful method to find specific information on the ever-changing World Wide Web. Learning to use a search engine effectively will decrease the amount of time you spend "surfing" for what you are interested in.

