Introduction to Search Engines
You've heard the term "Search Engines"
so many times, but do you know what it
really means? Or better yet, do you know how to
use the search engines to make your Web site more
visible to potential customers? First, you must
understand how search engines work before you
can decide how to use them to your advantage.
Search engines are a primary way that people
find information on the Internet. (Search engines
are sometimes likened to the yellow or white pages
of the Web.) Users request information from search
engines and in return they receive a list of possible
URLs that match their request. There are many
kinds of search engines that provide Web site
information based on many different kinds of
criteria. A Web site with a good search engine listing may
see a dramatic increase in traffic.
Everyone wants that good listing. Unfortunately, many
Web sites rank poorly in search engines
or may not be listed at all because they fail
to consider how search engines work.
How Search Engines Work
The term "search engine" is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in radically different ways.
Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found.
If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.
Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
"Hybrid Search Engines" Or Mixed Results
In the web's early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listings over another. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it does also present crawler-based results (as provided by Inktomi), especially for more obscure queries.
The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.
Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.
Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.
Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.
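The three parts described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the in-memory TOY_WEB dictionary and the function names spider, build_index and search are all hypothetical, and the "ranking" here is a plain alphabetical sort.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "/home": ("fox gardens landscaping services", ["/about", "/contact"]),
    "/about": ("oriental landscaping and gardening", ["/home"]),
    "/contact": ("call fox gardens today", []),
    "/orphan": ("nobody links to this page", []),
}


def spider(start_url, web):
    """Part 1: visit a page, read it, and follow its links (breadth-first)."""
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Part 2: map each word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Part 3: return URLs containing every query word, sorted alphabetically."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


pages = spider("/home", TOY_WEB)
index = build_index(pages)
print(search(index, "fox gardens"))
```

Note that "/orphan" never reaches the index because no crawled page links to it, which is why a page can exist on the web and still be invisible to searchers.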
How Search Engines Rank Web Pages
Search for anything using your favorite crawler-based search engine. Nearly instantly, the search engine will sort through the millions of pages it knows about and present you with ones that match your topic. The matches will even be ranked, so that the most relevant ones come first.
Of course, the search engines don't always get it right. Non-relevant pages make it through, and sometimes it may take a little more digging to find what you are looking for. But, by and large, search engines do an amazing job.
As WebCrawler founder Brian Pinkerton puts it, "Imagine walking up to a librarian and saying, 'travel.' They’re going to look at you with a blank face."
OK -- a librarian's not really going to stare at you with a vacant expression. Instead, they're going to ask you questions to better understand what you are looking for.
Unfortunately, search engines don't have the ability to ask a few questions to focus your search, as a librarian can. They also can't rely on judgment and past experience to rank web pages, in the way humans can.
So, how do crawler-based search engines go about determining relevancy, when confronted with hundreds of millions of web pages to sort through? They follow a set of rules, known as an algorithm. Exactly how a particular search engine's algorithm works is a closely-kept trade secret. However, all major search engines follow the general rules below.
Location, Location, Location...and Frequency
One of the main rules in a ranking algorithm involves the location and frequency of keywords on a web page. Call it the location/frequency method, for short.
Remember the librarian mentioned above? They need to find books to match your request of "travel," so it makes sense that they first look at books with travel in the title. Search engines operate the same way. Pages with the search terms appearing in the HTML title tag are often assumed to be more relevant than others to the topic.
Search engines will also check to see if the search keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. They assume that any page relevant to the topic will mention those words right from the beginning.
Frequency is the other major factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words in a web page. Those with a higher frequency are often deemed more relevant than other web pages.
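As a rough sketch of the location/frequency idea, the Python function below scores a page for a query. Real ranking algorithms are trade secrets, so the function name, the weights (3.0 for the title, 1.0 for the opening words) and the 20-word cutoff are arbitrary assumptions chosen only to illustrate the principle.

```python
def location_frequency_score(title, body, query):
    """Score a page for a query; the weights are arbitrary illustrations."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_words = body.lower().split()  # naive split; punctuation is not stripped
    if not terms or not body_words:
        return 0.0

    score = 0.0
    for term in terms:
        # Location: a term in the title counts the most.
        if term in title_words:
            score += 3.0
        # Location: a term among the first 20 body words counts extra.
        if term in body_words[:20]:
            score += 1.0
        # Frequency: share of body words that are this term.
        score += body_words.count(term) / len(body_words)
    return score


guide = location_frequency_score(
    "Travel Guide", "travel tips for cheap travel abroad", "travel")
blog = location_frequency_score(
    "My Blog", "today I wrote about cooking and a bit of travel", "travel")
print(guide > blog)
```

The page with "travel" in its title and a higher share of "travel" in its text scores above the page that mentions the word once in passing.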
Major Search Engines
Why are these considered to be "major" search engines? Because they are either well-known or well-used.
For webmasters, the major search engines are the most important places to be listed, because they can potentially generate so much traffic.
For searchers, well-known, commercially-backed search engines generally mean more dependable results. These search engines are more likely to be well-maintained and upgraded when necessary, to keep pace with the growing web.
All search engines have the basic parts described
above, but there are differences in how these
parts are tuned. That is why the same search on
different search engines often produces different
results.
How do I get my site considered for a listing with a search engine?
Depending on the search engine, there are
two common ways it can discover your site:
- You submit a request by hand for your Web site to
be included or reviewed in its index.
- Your site is referenced by another Web site that is
already listed in the search engine's index.
Search engines discover your site when they crawl
or spider the Web. This means that they automatically
read Web page text and then follow most of the
normal kinds of HTML links. They rely on Web site
text to record the nature and content of the site.
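To make "following normal HTML links" concrete, here is a minimal sketch using Python's standard html.parser module. The LinkExtractor class, the extract_links helper and the sample markup are illustrative assumptions; a real spider also handles relative URLs, redirects and much more.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, the kind of link a spider can follow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


sample = ('<p>See <a href="/services.html">services</a> and '
          '<A HREF="/contact.html">contact</A>.</p>')
print(extract_links(sample))
```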
What can I do to be included in a search engine's index?
Although there are many different types of
search engines, there are ways to refine your
Web site so that it is picked up or ranked well
by search engines in general. These key factors
include:
- Choosing your keywords carefully.
- Thoroughly applying the keywords in the text
of the site.
- Testing the site's rankings and updating
the site often.
- Never trying to fool (spam) the search engines.
- Providing quality content, which nothing replaces.
- Submitting your site to the proper engines.
How do I choose my keywords?
Many automated search engines determine the
relevancy of your Web page based on keywords used
in the Web site text. Keywords are the two or
three words that you use to identify your site
to users. They are also the words in the HTML
tags --specifically the meta tags-- used in your document.
Meta tags apply to an entire document, and there
are many of them. The ones most useful for search
engines are the description and keywords tags.
The description tag displays a description of
your page when it is relevant to a search. The
keywords tag provides keywords for the search
engine to associate with your page. For example,
here is a simple Web page header:
<HEAD>
<TITLE> Fox Gardens </TITLE>
<META name="description" content="An oriental landscaping company">
<META name="keywords" content="landscaping,
oriental landscaping, oriental landscaping
services, landscape, oriental gardening,
gardening services">
</HEAD>
A search engine that displays
your site would show:
Fox Gardens - An oriental landscaping company
The words you included in the keywords tag are
a subset of the words you might choose to use.
In this example, landscaping, oriental landscaping,
and oriental landscaping services are samples
of keywords. Be specific as you select them. Choosing
the keyword landscaping alone, for example, does
not narrow the search enough for you to be well
ranked if the user searches on oriental landscaping.
Choosing keywords is critical--one word can completely
change the relevance ranking of your site.
To find out the best keywords to use, go to
your desired search site and search on the terms
you think your desired audience will use.
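To see what a crawler could pull out of a page header like the Fox Gardens example, here is a small sketch using Python's standard html.parser module. The HeadReader class and read_head helper are hypothetical names, and the description value in the sample page is an assumption taken from the listing text shown earlier; how a given search engine actually reads these tags is not documented here.

```python
from html.parser import HTMLParser


class HeadReader(HTMLParser):
    """Collect the <title> text and named <meta> tags from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_head(html):
    """Return (title, description, list of keywords) for a page."""
    reader = HeadReader()
    reader.feed(html)
    raw = reader.meta.get("keywords", "")
    keywords = [k.strip() for k in raw.split(",") if k.strip()]
    return reader.title.strip(), reader.meta.get("description", ""), keywords


page = """<HEAD>
<TITLE> Fox Gardens </TITLE>
<META name="description" content="An oriental landscaping company">
<META name="keywords" content="landscaping, oriental landscaping, gardening services">
</HEAD>"""

title, description, keywords = read_head(page)
print(f"{title} - {description}")
print(keywords)
```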
How do I apply the keywords to my site?
The list below summarizes many of the known
tips on using keywords to boost the relevancy
of your Web page to search engines.
- The most important place to use your keywords
is in the title of your Web page. Some search
engines don't search the text of the file, nor
do they read meta tags.
- Provide relevant information and keywords
on the first page of your document.
- If your title or page is a graphic, make sure
that you place the title in the tag, since
search engines cannot interpret words in graphics
nor do they follow graphic links. Also use keyword
meta tags for pages with sparse text. Not all
engines review the content in these tags, but
this increases the chances of your keywords
being caught by the engine.
- Apply the keywords in keyword meta-tags, including
variations on the words, misspellings, plurals,
and any other combinations. For example, use
photograph, photography, photo, etc.
- Use lower case for meta tag keywords. While
some search engines are case sensitive, this
will produce the most results without tripping
a search engine's spam sensor.
- Repeat the keywords in your content frequently
and in various ways. Many of the engines weight
the importance of your keywords based upon the
frequency of their use. It is especially important
that the first paragraphs of your document contain
your title's keywords.
- Use a clear description meta tag. (The description
meta tag is used by search engines to describe
your site in the listing.) For example: Rare
old photographs for sale.
- Some engines ignore the comment tag, others
include it. So, using your keywords in your
comments may be useful. Avoid repeatedly inserting
your keywords in comments, however, as that
would be considered spamming or trying to fool
the search engine.
- Search engines create relevancy ratings based
on how early keywords occur in your pages. If you
use tables, your keywords may appear later in
your documents rather than earlier. Use of frames
also impacts search engine ratings.
- Keywords that are embedded in JavaScript
are usually ignored by search engines. Graphics
and image maps are ignored by search engines.
- Graphic only links can't be followed by the
search engine, so make sure you've provided
an alternative route for the engine. Some sites
submit a simple text site map to search engines
to deal with this issue.
- Symbols and dynamic content are not followed
when a site is spidered.
- If you'd like detailed information on how
each search engine uses keywords, refer to the
SearchEngineWatch website (www.searchenginewatch.com).
- Include your site identification URL as the
last keyword in your meta tags.
- Quality counts! The quality and nature of
the content you provide makes all of the difference
to the audience you draw to your site. Some
Web sites are discovered and listed by search
engines before the engines have reviewed their
submission requests. They are discovered because
other sites with high-quality content link to
them.
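Several of the tips above come down to checking where your keywords appear. The sketch below reports, for each keyword, whether it shows up in the page title and in the first paragraph. The function name keyword_placement and the sample strings are hypothetical, and the check uses simple case-insensitive substring matching, which is an assumption of this sketch rather than how any search engine matches terms.

```python
def keyword_placement(title, first_paragraph, keywords):
    """Report, per keyword, whether it appears in the title and opening text."""
    title_l = title.lower()
    para_l = first_paragraph.lower()
    report = {}
    for kw in keywords:
        kw_l = kw.lower()
        report[kw] = {
            "in_title": kw_l in title_l,                # substring match
            "in_first_paragraph": kw_l in para_l,       # substring match
        }
    return report


report = keyword_placement(
    "Rare Old Photographs for Sale",
    "We sell rare old photographs and vintage prints.",
    ["photographs", "vintage", "postcards"],
)
for keyword, found in report.items():
    print(keyword, found)
```

A keyword that is missing from both places, such as "postcards" here, is a candidate for either reworking the page text or dropping the keyword.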
How do I test my site's ranking?
The amount of time that it takes to index your
site depends on the search engine. The quickest
response time for your site to be indexed is about
two days. Two days is extremely fast when you
consider the enormous number of Websites that
are submitted to search engines every day. With many
of the other engines, your site can take 2-4 weeks to appear,
if it appears at all. Some of the search
engines offer you the opportunity to pay to have
your site considered more quickly than the standard
time period (called express submissions).
Search engines routinely update and change their
criteria for listing. They also regularly discard
sites from their index, so your listings require constant
monitoring. Keywords for your industry also change
often. We recommend that you check for the most
popular or appropriate keywords for your site
and resubmit about twice a month. This can benefit
your site since fresh submissions are often higher
in ranking than old ones. We suggest that you
monitor every week or two and resubmit your pages
at minimum every time you make significant changes
to your site.
Can I fool search engines? (spamming)
Web designers have developed many ways to
try to fool search engines into ranking their
sites well and including them in their site index.
Some of these methods include repeating keywords
over and over again in the meta tags for the site
(called stuffing), using colored text on the same
color background for key words and content, hiding
text using cgi, using very tiny fonts, etc. These
tactics are designed to bombard the search engine
and deceive it about the quality and nature
of the content.
These actions can seriously degrade the value
of search engines. Search engines are savvy to
these tactics and if they catch you engaging in
these practices they will disallow your submissions
to their index.
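Before submitting, you can check your own keywords tag for accidental stuffing. The sketch below flags any keyword repeated more than a chosen number of times. The function name repeated_keywords and the max_repeats threshold of 2 are assumptions for illustration only; search engines do not publish the limits their spam checks use.

```python
from collections import Counter


def repeated_keywords(meta_keywords, max_repeats=2):
    """Return keywords that occur more than max_repeats times in a keywords tag."""
    words = [w.strip().lower() for w in meta_keywords.split(",") if w.strip()]
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n > max_repeats}


stuffed = "photo, photo, Photo, photo, prints"
clean = "landscaping, oriental landscaping, gardening services"
print(repeated_keywords(stuffed))
print(repeated_keywords(clean))
```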
Where do I submit my site?
The search engines below are all excellent choices to start with.
Google
http://www.google.com
Voted four times Most Outstanding Search Engine by Search Engine Watch readers, Google has a well-deserved reputation as the top choice for those searching the web. The crawler-based service provides both comprehensive coverage of the web along with great relevancy. It's highly recommended as a first stop in your hunt for whatever you are looking for.
Google provides the option to find more than web pages, however. Using the options at the top of the search box on the Google home page, you can easily seek out images from across the web, find discussions that are taking place on Usenet newsgroups, locate news information or perform product searching. Using the More link provides access to human-compiled information from the Open Directory (see below), catalog searching and other services.
Google is also known for the wide range of features it offers, such as cached links that let you "resurrect" dead pages or see older versions of recently changed ones. It offers excellent spell checking, easy access to dictionary definitions, integration of stock quotes, street maps, telephone numbers and more. See Google's help page for an entire rundown on some of these features. The Google Toolbar has also won a popular following for the easy access it provides to Google and its features directly from the Internet Explorer browser.
In addition to Google's unpaid editorial results, the company also operates its own advertising programs. The cost-per-click AdWords program places ads on Google as well as some of Google's partners. Similarly, Google is also a provider of unpaid editorial results to some other search engines. For a list of major partnerships, see the Search Providers Chart.
Google was originally a Stanford University project by students Larry Page and Sergey Brin called BackRub. By 1998, the name had been changed to Google, and the project jumped off campus and became the private company Google. The company went public in August 2004.
Yahoo
http://www.yahoo.com
Launched in 1994, Yahoo is the web's oldest "directory," a place where human editors organize Websites into categories. However, in October 2002, Yahoo made a giant shift to crawler-based listings for its main results. These came from Google until February 2004. Now, Yahoo uses its own search technology. Learn more in this recent review from our SearchDay newsletter, which also provides some updated submission details.
In addition to excellent search results, you can use tabs above the search box on the Yahoo home page to seek images, Yellow Page listings or use Yahoo's excellent shopping search engine. Or visit the Yahoo Search home page, where even more specialized search options are offered.
The Yahoo Directory still survives. You'll notice "category" links below some of the sites listed in response to a keyword search. When offered, these will take you to a list of Websites that have been reviewed and approved by a human editor.
It's also possible to do a pure search of just the human-compiled Yahoo Directory, which is how the old or "classic" Yahoo used to work. To do this, search from the Yahoo Directory home page, as opposed to the regular Yahoo.com home page. Then you'll get both directory category links ("Related Directory Categories") and "Directory Results," which are the top web site matches drawn from all categories of the Yahoo Directory.
Sites pay a fee to be included in the Yahoo Directory's commercial listings, though they must meet editor approval before being accepted. Non-commercial content is accepted for free. Yahoo's content acquisition program (CAP) also offers paid inclusion, where sites can pay to be included in Yahoo's crawler-based results. This doesn't guarantee ranking, Yahoo promises. The CAP also brings in content from non-profit organizations for free.
Like Google, Yahoo sells paid placement advertising links that appear on its own site and which are distributed to others. Yahoo purchased Overture in October 2003.
Overture was called GoTo until late 2001. More about it can be found on the Paid Listings Search Engines page. Overture purchased AllTheWeb (see below) in March 2003 and acquired AltaVista (see below) in April 2003. Yahoo now owns both, gained through its purchase of Overture.
Technology from AltaVista and AllTheWeb was combined with that of Inktomi, a crawler-based search engine that grew out of UC Berkeley and then launched as its own company in 1996, to make the current Yahoo crawler. Yahoo purchased Inktomi in March 2003.
Ask
http://www.ask.com
Ask Jeeves initially gained fame in 1998 and 1999 as being the "natural language" search engine that let you search by asking questions and responded with what seemed to be the right answer to everything.
In reality, technology wasn't what made Ask Jeeves perform so well. Behind the scenes, the company at one point had about 100 editors who monitored search logs. They then went out onto the web and located what seemed to be the best sites to match the most popular queries.
In 1999, Ask acquired Direct Hit, which had developed the world's first "click popularity" search technology. Then, in 2001, Ask acquired Teoma's unique index and search relevancy technology. Teoma was based upon the clustering concept of subject-specific popularity.
Today, Ask depends on crawler-based technology to provide results to its users. These results come from the Teoma algorithm, now known as ExpertRank.
Strongly Consider
The search engines below are other good choices to consider when searching the web.
AllTheWeb.com
http://www.alltheweb.com
Powered by Yahoo, you may find AllTheWeb a lighter, more customizable and pleasant "pure search" experience than you get at Yahoo itself. The focus is on web search, but news, picture, video, MP3 and FTP search are also offered.
AllTheWeb.com was previously owned by a company called FAST and used as a showcase for that company's web search technology. That's why you may sometimes hear AllTheWeb.com also referred to as FAST or FAST Search. However, the search engine was purchased by search provider Overture (see above) in late April 2003, then later became Yahoo's property when Yahoo bought Overture. It no longer has a connection with FAST.
AOL Search
http://www.aol.com
AOL Search provides users with editorial listings that come from Google's crawler-based index. Indeed, the same search on Google and AOL Search will come up with very similar matches. So, why would you use AOL Search? Primarily because you are an AOL user. The "internal" version of AOL Search provides links to content only available within the AOL online service. In this way, you can search AOL and the entire web at the same time. The "external" version lacks these links. Why wouldn't you use AOL Search? If you like Google, many of Google's features such as "cached" pages are not offered by AOL Search.
HotBot
http://www.hotbot.com
HotBot provides easy access to the web's three major crawler-based search engines: Yahoo, Google and Teoma. Unlike a meta search engine, it cannot blend the results from all of these crawlers together. Nevertheless, it's a fast, easy way to get different web search "opinions" in one place.
HotBot's "choose a search engine" interface was introduced in December 2002. However, HotBot has a long history as a search brand before this date.
When HotBot debuted in May 1996, it gained a strong following among serious searchers for the quality and comprehensiveness of its crawler-based results, which were provided by Inktomi at the time. It also caught the attention of experienced web users and techies, especially for the unusual colors and interface it continues to sport today.
HotBot gained more notoriety when it switched over to using Direct Hit's "clickthrough" results for its main listings in 1999. Direct Hit was then one of the "hot" search engines that had recently appeared. Unfortunately, the quality of Direct Hit's results couldn't match those of another "hot" player that had debuted at the same time, Google. HotBot's popularity began to drop.
Even worse, HotBot also suffered by being owned by Lycos (now Terra Lycos). Lycos had acquired HotBot when it purchased Wired Digital in October 1998. Lycos failed to make search a priority on its flagship Lycos site as well as HotBot through much of 1999 and 2000, as it focused instead on adding "portal" features. The company refocused on search in late 2001, making significant improvements to the Lycos site and, as noted, reworking the HotBot site at the end of 2002.
While search engines are still a primary method
for drawing traffic, don't forget traditional
media, newsgroup postings, web directories, and
alternative forms that can sometimes be far more
effective than search engines. You can also use
personal communications to lists for your industry/area;
this kind of approach produces less traffic but
the audience is pre-qualified.
Search Engine Optimization (SEO)
Search engine optimization (SEO) is the process of improving the volume and quality of traffic to a web site from search engines via "natural" ("organic" or "algorithmic") search results. Usually, the earlier a site is presented in the search results, or the higher it "ranks", the more searchers will visit that site. SEO can also target different kinds of search, including image search, local search, and industry-specific vertical search engines.
As a marketing strategy for increasing a site's relevance, SEO considers how search algorithms work and what people search for. SEO efforts may involve a site's coding, presentation, and structure, as well as fixing problems that could prevent search engine indexing programs from fully spidering a site. Other, more noticeable efforts may include adding unique content to a site, ensuring that content is easily indexed by search engine robots, and making the site more appealing to users. Another class of techniques, known as "black hat" SEO or spamdexing, uses methods such as link farms and keyword stuffing that tend to harm search engine user experience. Search engines look for sites that employ these techniques and may remove their listings.
The acronym "SEO" can also refer to "search engine optimizers", a term adopted by an industry of consultants who carry out optimization projects on behalf of clients, and by employees who perform SEO services in-house. Search engine optimizers may offer SEO as a stand-alone service or as a part of a broader marketing campaign. Because effective SEO may require changes to the HTML source code of a site, SEO tactics may be incorporated into web site development and design. The term "search engine friendly" may be used to describe web site designs, menus, content management systems and shopping carts that are easy to optimize.
Content courtesy of SearchEngineWatch, one of the most comprehensive Web sites providing detailed information on existing search engines.