Know Your Search Engine: Part I
To Find The Right Results, Ask The Right Questions

From the Oct. 2007 Issue

As a point of information for those whose eyes have been closed to Internet
technology over the past few years, “pushing” information is losing
ground to “pulling” information. In short, companies used to deploy
e-mail newsletters and traditional websites that pushed information out to as
many people as they possibly could in hopes of reaching their intended audiences.
Meanwhile, people would simply visit their list of favorite websites every day to see
whether anything had changed or been updated.

Information pushing still goes on far too often, of course: spam and
other unsolicited commercial e-mail (UCE) will probably never stop. But savvy
Internet marketers and information providers have been moving toward technologies
like RSS that allow users to pull updated content from websites, and only on the
subjects they want. If any of you need more information on RSS and newsfeed
syndication, ask your local 10-year-old. Or you can go online and search for
the answer. As much as these feeds have helped us, they cannot replace
the bread and butter of the Internet: using a search to answer an immediate
question or query.

Soon after the first commercial websites started appearing in 1994, two smart guys
from Stanford realized that a directory of sorts would be beneficial, so they
started Yahoo!, which went public in 1996, two-and-a-half years before Google
was incorporated. Many other search engines and portals soon appeared on
the Internet, including MSN, HotBot, Excite and AltaVista. AOL, of course, tried
launching its own kind of proprietary Internet, which eventually failed, and
then tried to rebrand itself as a search portal, which is failing, too. Currently,
Google has about 64 percent of the search market, with Yahoo! at 22 percent,
MSN’s Live Search at 7 percent and Ask.com at about 3 percent. (Figures are
for July 2007, from Hitwise, an Experian company.)

How Search Engines Work
Just as with any product, each search engine brand has its own strong, loyal
following. And even though search engines differ somewhat in how they
compile and sort through data (their algorithms), they all offer much
the same general search functions and return broadly similar results.
But the results are not exactly the same. One reason is that, although the engines
work in much the same way, each one gathers and refreshes its data on its own schedule.

A user goes to a search engine because he or she expects it to know where
everything on the Internet is so that it can point the searcher in the right
direction. But how do the search engines find the information in the first place?
Well, all major search portals rely on two primary methods for finding data:
humans and spiders. People and businesses can submit their websites to the major
search engines, which will then scan the site’s content, list it in appropriate
directories and make it available in search results.

But the most common method by which search engines get their information is
by using “spiders.” A spider is basically a small program
that visits all of the websites it knows about, then follows the
links on those websites, then the links on the linked websites, almost ad infinitum.
Along the way, the spider documents the content of each website, the prominence
of certain words and phrases, various information contained in the background
code of the website (meta tags, etc.), the source of the website and other information.
This data is compiled into an index, which the search engine uses to weigh how likely
a page is to match a query when you perform a search.
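To make the spider-and-index idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how any particular search engine is built: the “web” is a hypothetical in-memory dictionary of made-up `.example` pages instead of real downloads, and the index simply records how many times each word appears on each page. The `seen` set is what keeps the “almost ad infinitum” crawl from looping forever when pages link back to one another.

```python
from collections import Counter, defaultdict, deque

# Hypothetical miniature "web": each URL maps to (page text, outgoing links).
# A real spider would download each page over HTTP and parse its HTML for links.
WEB = {
    "a.example": ("oklahoma tax law guide", ["b.example", "c.example"]),
    "b.example": ("corporate tax credits in north carolina", ["c.example", "x.example"]),
    "c.example": ("college football defense statistics", ["a.example", "d.example"]),
    "d.example": ("oklahoma football history", []),
}


def crawl(seeds, web, max_pages=100):
    """Breadth-first crawl from the seed URLs, visiting each page at most once."""
    seen = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url not in web:  # dead link (e.g. x.example): nothing to fetch
            continue
        visited.append(url)
        _text, links = web[url]
        for link in links:
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return visited


def build_index(urls, web):
    """Inverted index: word -> Counter of {url: times the word appears there}."""
    index = defaultdict(Counter)
    for url in urls:
        text, _links = web[url]
        for word in text.lower().split():
            index[word][url] += 1
    return index


pages = crawl(["a.example"], WEB)
index = build_index(pages, WEB)
print(pages)                     # ['a.example', 'b.example', 'c.example', 'd.example']
print(dict(index["oklahoma"]))   # {'a.example': 1, 'd.example': 1}
```

Starting from a single known page, the spider reaches every linked page once and quietly skips the dead link. Because each engine runs this process on its own timetable, two indexes built at different moments can disagree, which is one reason results differ from engine to engine.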

Search Engine Optimization Techniques
Many search engines used to rely on the frequency of a word or phrase on a
particular website as one of the key ranking factors. So if you searched for Oklahoma
Tax Law, a site that used that phrase 200 times would probably come up first
(or close to it) in the results. But that method was quickly found to be
flawed, because a software company or other vendor could simply include those
words hundreds of times in inconspicuous, neutral-colored text at the bottom of a web page.
Some people still try this, but the good search engines no longer weight this
factor as heavily; they also give credit to domains that are generally
more trustworthy (such as dot-gov websites, which are run by state or U.S. federal
agencies). Dot-gov is one of the few suffixes with real restrictions on who may register
it (dot-edu and dot-mil are others), so most of the other common suffixes are pretty much
up for grabs to anybody for any purpose. Never assume that a dot-info, dot-net or even
a dot-org website is trustworthy based on its domain alone.
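The stuffing problem, and one way to blunt it, can be shown with a toy scoring function. This Python sketch is purely illustrative: the page names are hypothetical, and the repetition cap and dot-gov bonus in `damped_score` are assumed weights chosen for the example, not any search engine’s actual formula.

```python
# Hypothetical pages for the query "oklahoma tax law": an agency page that
# uses the phrase twice and a vendor page stuffed with 200 repetitions.
PAGES = {
    "agency.example.gov/tax-law": (
        "oklahoma tax law overview: official guidance on oklahoma tax law"
    ),
    "cheap-software.example": "buy our software " + "oklahoma tax law " * 200,
}


def phrase_count(text, phrase):
    """Count non-overlapping occurrences of the phrase (case-insensitive)."""
    return text.lower().count(phrase.lower())


def naive_score(url, text, phrase):
    """Old-style ranking: every repetition of the phrase adds a point."""
    return phrase_count(text, phrase)


def damped_score(url, text, phrase, cap=3, gov_bonus=5):
    """Illustrative fix with assumed weights: repetitions beyond `cap`
    earn nothing, and a dot-gov host gets a fixed trust bonus."""
    score = min(phrase_count(text, phrase), cap)
    host = url.split("/")[0]
    if host.endswith(".gov"):
        score += gov_bonus
    return score


def rank(score_fn, phrase):
    """Return page URLs ordered from highest to lowest score."""
    return sorted(PAGES, key=lambda url: score_fn(url, PAGES[url], phrase), reverse=True)


query = "oklahoma tax law"
print(rank(naive_score, query))   # ['cheap-software.example', 'agency.example.gov/tax-law']
print(rank(damped_score, query))  # ['agency.example.gov/tax-law', 'cheap-software.example']
```

Under the naive score, the page that repeats the phrase 200 times wins; once extra repetitions stop counting and the dot-gov page gets a trust bonus, the agency page moves to the top.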

Linking is also emerging as one of the key factors: websites about a certain
topic that are cross-linked with, and referenced by, many other websites on
similar topics are more likely to be legitimate and to have high-value content.
While the major search engines do not make all of their criteria public (so
that the bad guys won’t abuse them), website developers are generally able
to maximize their sites’ visibility in search results by using a combination of all
of these techniques. This carries positives and negatives, of course: when legitimate
web developers do this, the results you want are more likely to turn up quickly,
but the bad guys are doing it too, which continues to fill your search results
with hundreds or even thousands of unrelated hits.
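Link-based ranking can be sketched with a simplified version of PageRank, the link-analysis method Google’s founders published in 1998. The Python below is a toy under stated assumptions: the four sites and their links are hypothetical, and the 0.85 damping factor is the value commonly cited from that early work. Today’s engines combine links with many other signals they do not disclose.

```python
# Hypothetical sites and the sites each one links to.
LINKS = {
    "tax-guide.example": ["cpa-blog.example", "state-faq.example"],
    "cpa-blog.example": ["state-faq.example"],
    "state-faq.example": ["tax-guide.example"],
    "link-farm.example": ["tax-guide.example"],  # links out, but nobody links in
}


def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration; the scores always sum to 1."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page passes its score along its links; a page with no
            # known out-links shares its score with every page instead.
            targets = [t for t in outlinks if t in links] or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


scores = pagerank(LINKS)
for site in sorted(scores, key=scores.get, reverse=True):
    print(f"{site:20} {scores[site]:.3f}")
```

`link-farm.example` links to a popular site but receives no links itself, so it ends up with only the minimum baseline score; in this model, pointing at well-regarded sites does not by itself make a page look authoritative.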

Sponsored Listings
It didn’t take long for the major search engines to discover the gold
mine upon which they were sitting … and vendors soon came courting them
with the hope of buying placement in search engine listings. At first, even
the now-trustworthy search portals made it difficult for users to distinguish
between legitimate search results and those paid-placement listings. Users voiced
their dismay, and soon the more reputable search engines started to display
these sponsored results more distinctly.

The Secret to Quick & Effective Online Searching
It helps to know how search engines list and categorize information when performing
searches, but the best way to limit your search results to a manageable list
of items that are truly related to your search is to know how to search. Regardless
of the search engine you use, simply typing keywords into the search field will
usually be fruitless (or perhaps overly bountiful), resulting in far too many
hits of ambiguous quality. In my next column, I will outline some useful search
techniques that will help you filter these mountainous search results down to the
data you’re seeking, whether it’s information on corporate
tax credits in North Carolina or the name of the football team that led
the NCAA in total defense in 2003. (By the way, the answer is the Oklahoma Sooners,
who allowed only 255.6 yards of offense per game.)
