Getting The Most Out Of Search Engines
© copyright 1996 - 2004
updated 1 October 2004
- Overview
After searching various genealogical databases
and collections online, it often pays to search
less well-known sites that also contain genealogical information.
These include any web site built from static web pages,
such as most personal web sites.
- When I use the Internet to search for individual names, I check the
following sites in approximately this order:
- familysearch.org
- rootsweb.com
(including mailing lists and message boards)
- ancestry.com
- major search engines (typically Google and/or AltaVista)
- Types of search engines / search sites
- Site-Specific Search Tools (most major sites have them)
- Directory or Subject Searches (yahoo.com and/or dmoz.com)
- included web sites are evaluated by editors
- least coverage: less than 1% of the web
- Every Word (General) Search Engines (Google is the most popular)
- largest coverage, yet still less than 10% of the web
- Multiple Search Engines/Sites (dogpile.com and/or metacrawler.com)
- search several every-word search engines at once
- return fewer results than the general search engines
- How do Every Word (General) Search Engines work?
- Users or webmasters notify the search engine of new pages.
- Search engines have 'worms', 'robots' or 'spiders' that go out
and follow all the links (connections) they can find.
- They maintain an index of the contents of every page they find.
- They periodically revisit known links to make sure the pages
still exist.
- A recent survey showed that the roughly 3-4 billion web pages
indexed by the largest search engines are less than 15%
of the estimated pages in existence.
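The crawl-and-index process above can be sketched as a toy Python model. It is only an illustration under simplifying assumptions, not how any real search engine is built: the `PAGES` dictionary is a hypothetical miniature web, `crawl` follows links breadth-first the way a spider does, and `build_index` records every word on every page it reached.

```python
from collections import deque

# Hypothetical miniature "web": page name -> its text and the pages it links to.
PAGES = {
    "a.html": {"text": "Ruf family genealogy", "links": ["b.html", "c.html"]},
    "b.html": {"text": "Kodak family history", "links": ["a.html"]},
    "c.html": {"text": "Germany genealogy records", "links": ["d.html", "gone.html"]},
    "d.html": {"text": "Ruf records in Germany", "links": []},
    "orphan.html": {"text": "A page nothing links to", "links": []},
}

def crawl(start):
    """Follow links breadth-first from `start`, the way a 'spider' does.
    Links to pages that no longer exist (like gone.html) are skipped."""
    if start not in PAGES:
        return set()
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        for link in PAGES[url]["links"]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def build_index(urls):
    """Map every lower-cased word to the set of pages containing it."""
    index = {}
    for url in urls:
        for word in PAGES[url]["text"].lower().split():
            index.setdefault(word, set()).add(url)
    return index

found = crawl("a.html")
index = build_index(found)
print(sorted(found))               # orphan.html is never reached
print(sorted(index["genealogy"]))  # ['a.html', 'c.html']
```

Note that `orphan.html` never enters the index because no link leads to it, which is why notifying a search engine of new pages matters.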
- How do search engines rank their results? (Relevancy)
- Each search engine has its own prioritizing or weighting scheme.
- They typically consider
- The contents of the page's <TITLE> tag (shown in the browser's title bar)
- Nearness of search word(s) to the top of the page
- How often the search word(s) appear on the page, except at the end of the page
- The first word in a search is weighted more heavily than later words.
- Keywords in META tags may still be used by some search engines.
- How many other sites link to this page (unique to Google).
- Some search engines allow sites to pay for a higher ranking.
- Therefore, it is important to use more than one search engine.
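To make the ranking factors above concrete, here is a toy relevancy score in Python. The weights and the `score` function are invented for illustration; every real engine uses its own unpublished formula. For simplicity this sketch counts every occurrence of a word (it ignores the "except at the end of the page" rule) and leaves out META keywords and paid placement.

```python
def score(page, query_words, inbound_links):
    """Toy relevancy score; the weights are invented for illustration."""
    title = page["title"].lower().split()
    body = page["body"].lower().split()
    total = 0.0
    for position, word in enumerate(w.lower() for w in query_words):
        weight = 1.0 / (position + 1)                # earlier query words count more
        if word in title:
            total += 3.0 * weight                    # word appears in the <TITLE>
        if word in body:
            total += weight / (1 + body.index(word)) # nearer the top scores higher
            total += 0.5 * weight * body.count(word) # how often it appears
    return total + 0.1 * inbound_links               # how many other pages link here

# Hypothetical pages and link counts.
SAMPLE = {
    "p1": {"title": "Kodak Genealogy", "body": "kodak family genealogy and history"},
    "p2": {"title": "Camera Tips", "body": "tips for your kodak camera and film genealogy"},
}
LINKS = {"p1": 2, "p2": 5}

query = ["genealogy", "kodak"]
ranked = sorted(SAMPLE, key=lambda p: score(SAMPLE[p], query, LINKS[p]), reverse=True)
print(ranked)  # ['p1', 'p2']
```

Changing any weight reorders results differently, which is one reason two engines can rank the same pages quite differently.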
- Which General Search Engine(s) should I use?
- The most popular search engines according to Nielsen NetRatings (May 2004) are
listed below, with the number of results I got for three searches on 1 Oct 2004.
| Site | Nielsen rank | genealogy | "family history" | map | Web address |
|---|---|---|---|---|---|
| Google | #1 (42%) | 12,900,000 | 2,420,000 | 190,000,000 | http://www.google.com/ |
| Yahoo | #2 (38%) | 20,300,660 | 8,060,726 | 275,000,000 | http://www.yahoo.com/ |
| MSN Search (provided by Yahoo) | #3 (27%) | 4,360,145 | 1,643,634 | 56,587,804 | http://www.msnsearch.com/ |
| AOL (provided by Google) | #4 (14%) | 344,001 | 68,667 | 5,220,001 | http://www.aol.com/ |
| AltaVista (provided by Yahoo) | #13 (3%) | 20,300,660 | 8,060,726 | 275,000,000 | http://www.altavista.com/ |
- Basic search features common to most search engines ("search engine math")
(using Google and AltaVista as examples)
- Upper case and lower case letters in search words
- Using only lower case characters makes the search case insensitive.
- Mixing upper and lower case makes the search for that word case
sensitive.
'mcdonald' finds mcdonald, McDonald, MCDONALD, etc.
'McDonald' finds only McDonald
- Most search engines no longer distinguish case.
Exception: AltaVista IS case sensitive within quotation marks
and in Advanced Search.
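The older case rule above (an all-lower-case word matches any capitalization; a word containing capitals must match exactly) can be written as a small Python function. `word_matches` is a hypothetical name used only for this sketch, and as noted above most engines no longer apply this rule.

```python
def word_matches(query_word, page_word):
    """Older case rule: an all-lower-case query word matches any capitalization;
    a query word containing capital letters must match exactly."""
    if query_word == query_word.lower():
        return query_word == page_word.lower()  # case insensitive
    return query_word == page_word              # case sensitive

candidates = ["mcdonald", "McDonald", "MCDONALD"]
print([w for w in candidates if word_matches("mcdonald", w)])  # all three
print([w for w in candidates if word_matches("McDonald", w)])  # ['McDonald']
```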
- Use multiple words to limit the search.
- Pages with all the words will be ranked at the top of the results page.
(In Google and AltaVista, only pages with all the words are displayed.)
- Use your browser's search feature to find a search term on a particular page
- In Internet Explorer use Edit/Find (on This Page) or <CTRL>f
- In Netscape use Edit/Find in Page or <CTRL>f
- If you can't find a search term, the page may have changed since it
was indexed, or the term may be 'hidden' on the page.
- Forcing the inclusion and exclusion of search terms
- Use a + in front of a search term to make it mandatory.
The word or phrase must appear somewhere on the page.
Google requires a + before single letters and very common words
such as the, of, and, or ... otherwise they are ignored.
- Use a - in front of a search term to exclude it.
The word or phrase must not appear anywhere on the page.
examples: +germany +genealogy -history
+kodak +john +alabama -camera -film
- Do not put a space between the + or - and the word.
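The + and - rules above can be sketched in Python: split the query into required, excluded and optional words, then keep a page only if it has every required word and none of the excluded ones. The function names are hypothetical, quoted phrases are not handled, and stripping punctuation from page text is a simplifying assumption.

```python
import re

def words_in(text):
    """Lower-case words on a page, with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def parse_query(query):
    """Split a query into required (+word), excluded (-word) and optional words."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional

def page_matches(text, query):
    """True if the page has every required word and no excluded word."""
    required, excluded, _ = parse_query(query)
    words = words_in(text)
    return required <= words and not (excluded & words)

query = "+kodak +john +alabama -camera -film"
print(page_matches("John Kodak, born 1850 in Alabama", query))       # True
print(page_matches("John bought a Kodak camera in Alabama", query))  # False: has 'camera'
print(page_matches("Kodak family of Alabama", query))                # False: no 'john'
```

In this sketch, typing `+ kodak` with a space leaves the lone `+` as an ordinary token, which mirrors why the space must be omitted.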
- Using wildcards (*) (stemming) (Not supported by Google)
gold* will find gold, golden, goldy, etc.
german* will find german, germans, germany, etc.
genealog* will find genealogy, genealogist, genealogical, etc.
useful for finding spelling variations at the end of a name
usually cannot be used at the beginning or in the middle of a word
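Trailing-wildcard matching as described above amounts to a "starts with" test. The following is a minimal Python sketch; `wildcard_match` is a hypothetical helper, and only a single trailing `*` is supported, matching the limitation noted above.

```python
def wildcard_match(pattern, word):
    """Trailing * means 'any ending'; without a * the word must match exactly."""
    pattern, word = pattern.lower(), word.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern

WORDS = ["gold", "golden", "goldy", "german", "germans", "germany",
         "genealogy", "genealogist", "genealogical", "history"]
print([w for w in WORDS if wildcard_match("genealog*", w)])
# ['genealogy', 'genealogist', 'genealogical']
```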
- Searching for phrases
Enclose phrases in quotation marks
A few search engines treat multiple words as a phrase by default
examples: "smith, john"
"john a. smith"
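A phrase search requires the words to appear next to each other and in the same order. The Python sketch below checks for that; `has_phrase` is a hypothetical helper, and ignoring punctuation (so `"smith, john"` matches "Smith, John") is an assumption about how engines treated commas and periods.

```python
import re

def tokens(text):
    """Lower-case words with punctuation stripped (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(page_text, phrase):
    """True if the phrase's words appear consecutively, in order, on the page."""
    words, target = tokens(page_text), tokens(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(has_phrase("Records of John A. Smith, 1850", "john a. smith"))  # True
print(has_phrase("Smith, John A.", "john a. smith"))                  # False
print(has_phrase("Smith, John A.", "smith, john"))                    # True
```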
- Each search engine has an advanced search page with additional features
- Features on the Google Advanced Search page include
- Most of the above features
- language, file format, date, where on page, specific domains
- Features on the AltaVista Advanced Search page include
- Most of the above features
- date, file type, location, Boolean searching
- Additional specialized search commands for the basic search page
- Search commands are separated from the search term by a colon (:)
- Search within the web page title
Google: intitle, allintitle
AltaVista & AllTheWeb: title
example: intitle:text
- Within a web site (limits search results to a particular web site)
Google: site
AltaVista: host, domain
example: host:byu.edu, domain:uk
- URL search (finds pages whose own web address (URL) contains the term)
Google: inurl, allinurl
AltaVista: url
example: inurl:usgenweb
- dictionary
Google only: define
example: define:scurvy
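Because each command is just a word followed by a colon and a value, a query can be assembled as text and turned into a search address. This Python sketch assumes Google's long-standing `http://www.google.com/search?q=` address and the Google command names listed above; `google_query` is a hypothetical helper, and AltaVista would need `host:` instead of `site:`.

```python
from urllib.parse import urlencode

def google_query(*terms, **commands):
    """Join plain terms with command:value pairs, e.g. site='rootsweb.com'."""
    parts = list(terms) + [f"{name}:{value}" for name, value in commands.items()]
    return " ".join(parts)

q = google_query("ruf", "genealogy", site="rootsweb.com", intitle="germany")
url = "http://www.google.com/search?" + urlencode({"q": q})
print(q)    # ruf genealogy site:rootsweb.com intitle:germany
print(url)  # spaces become + and each colon becomes %3A
```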
- Advanced Searching with Boolean Operators (commands)
- use AND (&), OR (|, the vertical bar), AND NOT (!) and NEAR (~) with words and phrases
- operators are not case sensitive, but words and phrases are
- words typed together without an operator are treated as a phrase (no quotation marks required)
- Google supports only the OR operator
- AltaVista supports these operators only on the Advanced Search screen
- Boolean Search Terms (Operators) and what they do
AND
similar to + in regular searching
example: family history AND purrington
OR
similar to not using any notation in regular searching
AND NOT
similar to - in regular searching
example: (genealogy AND kodak) AND NOT film AND NOT camera*
NEAR
no equivalent in regular searching
finds words or phrases within 10 words of each other (at AltaVista)
example: gerhard NEAR ruf (finds the name with or without middle names)
()
use parentheses to group operations
example: (ruf OR ruff OR roufe OR rouffe) NEAR (gerhard OR gerhardt)
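The Boolean operators above can be modeled in Python: OR asks whether a page contains any of a group of words, AND and AND NOT combine those tests, and NEAR compares word positions. This is a sketch under stated assumptions: the helper names are hypothetical, punctuation is ignored, and "within 10 words" is taken to mean the two words' positions differ by at most 10.

```python
import re

def tokens(text):
    """Lower-case words with punctuation stripped (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def any_of(text, words):
    """OR: true if the page contains at least one of the words."""
    return bool(set(tokens(text)) & set(words))

def near(text, left, right, distance=10):
    """NEAR: some word from `left` within `distance` words of some word from `right`."""
    words = tokens(text)
    lpos = [i for i, w in enumerate(words) if w in left]
    rpos = [i for i, w in enumerate(words) if w in right]
    return any(abs(i - j) <= distance for i in lpos for j in rpos)

SURNAMES = {"ruf", "ruff", "roufe", "rouffe"}
GIVEN = {"gerhard", "gerhardt"}

page1 = "Gerhard Friedrich Wilhelm Ruff was born in 1850."
page2 = "Gerhard Schmidt married Anna. " + "filler " * 20 + "Later a Ruf family moved in."

# (ruf OR ruff OR roufe OR rouffe) NEAR (gerhard OR gerhardt)
print(near(page1, SURNAMES, GIVEN))  # True: the names are 3 words apart
print(near(page2, SURNAMES, GIVEN))  # False: more than 10 words apart

# (genealogy AND kodak) AND NOT film
page3 = "Kodak family genealogy"
print(any_of(page3, {"genealogy"}) and any_of(page3, {"kodak"}) and not any_of(page3, {"film"}))  # True
```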
- Special Features common to Google and AltaVista
- Protect yourself from finding pornographic or other objectionable sites.
Google: go to Preferences, make selection in Safe Search Filtering
AltaVista: go to Settings, select Family Friendly Filter (offers password protection)
- Toolbar (adds search features to the browser; works only in Internet Explorer)
Google: go to Services & Tools, select Google Toolbar (also blocks popups)
AltaVista: go to Toolbar (includes translation)
- Translation of web sites
Google: go to Language Tools
AltaVista: go to Translate, or use AltaVista Toolbar
- Learn More About Using and Evaluating Search Engines at:
Summary
Search engines provide a powerful way to find specific
information on the ever-changing World Wide Web. Learning to use a search engine
effectively will reduce the time you spend "surfing" for what you are
interested in.