Know Your Search Engine: Part I
To Find The Right Results, Ask The Right Questions

From the Oct. 2007 Issue

As a point of information to those whose eyes have been closed to Internet technology over the past few years, “pushing” information is losing ground to “pulling” information. In short, companies used to deploy e-mail newsletters and traditional websites that pushed information out to as many people as they possibly could in hopes of it reaching their intended audiences. Or people would simply visit their list of favorite websites every day to see if something had changed or had been updated.

Information pushing still goes on too frequently, of course: spam and other unsolicited commercial e-mail (UCE) will probably never stop. But savvy Internet marketers and information providers have been moving toward technologies like RSS that allow users to pull updated content from websites only on the subjects they want. If any of you need more information on RSS and newsfeed syndication, ask your local 10-year-old. Or you can go online and search for the answer, because as much as these feeds have helped us, they cannot replace the bread and butter of the Internet: resolving an immediate question by searching.

Soon after the first commercial websites started appearing in 1994, two smart guys from Stanford realized that a directory of sorts would be beneficial, so they started Yahoo!, which went public in 1996, two-and-a-half years before Google would be incorporated. Many other search engines and portals soon appeared on the Internet, including MSN, HotBot, Excite and AltaVista. AOL, of course, tried launching its own kind of proprietary Internet, which eventually failed, and then tried to rebrand as a search portal, which is failing, too. Currently, Google has about 64 percent of the search market, with Yahoo! at 22 percent, MSN’s Live Search at 7 percent and at about 3 percent. (Information for July 2007 from HitWise, an Experian company.)

How Search Engines Work
Just as with any product, each search engine brand has its own strong, loyal following. And even though there are some differences in how search engines compile and sort through data (their algorithms), they all offer close to the same general search functions and return close to the same results. But the results are not exactly the same. One reason is that although the engines work in much the same way, each gathers and updates its data on its own schedule.

A user goes to a search engine because he or she expects it to know where everything on the Internet is so that it can point the searcher in the right direction. But how do the search engines find the information in the first place? Well, all major search portals rely on two primary methods for finding data: humans and spiders. People and businesses can submit their websites to the major search engines, which will then scan the site’s content, list it in appropriate directories and make it available in search results.

But the most common method by which search engines get their information is by using “spiders.” A spider is basically a little program that goes out and visits all of the websites it knows, and then goes to the links on those websites, and the links on those websites, almost ad infinitum. Along the way, the spiders document the content of the websites, the prominence of certain words and phrases, various information contained in the background code of the website (meta tags, etc.), the source of the website and other information. This data is compiled into an index, which the search engine uses to weigh how likely a page is to satisfy a query.
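For readers who like to see the moving parts, here is a minimal sketch of the spider idea in Python. It is not how any particular search engine works. The four-page "web" (TOY_WEB), the example addresses and the crawl function are all made up for illustration, and nothing here touches the real Internet. The spider starts from a page it knows, follows links to pages it has not seen, and records how often each word appears on each page.

```python
from collections import deque, defaultdict

# Hypothetical miniature "web": each address maps to (page text, outgoing links).
TOY_WEB = {
    "a.example": ("oklahoma tax law overview", ["b.example", "c.example"]),
    "b.example": ("tax credits and tax forms", ["c.example"]),
    "c.example": ("college football statistics", ["a.example"]),
    "d.example": ("an unlinked page nobody points to", []),
}

def crawl(seed_urls, web=TOY_WEB):
    """Visit every page reachable from the seeds and build a word index."""
    index = defaultdict(dict)      # word -> {url: number of occurrences}
    queue = deque(seed_urls)       # pages waiting to be visited
    seen = set(seed_urls)          # pages already queued or visited
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Document the content: count each word on this page.
        for word in text.split():
            index[word][url] = index[word].get(url, 0) + 1
        # Follow the links to pages the spider has not seen yet.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index, seen

index, visited = crawl(["a.example"])
```

In this toy example the spider never reaches d.example, because no page it visits links there. That is one reason site owners also submit their sites to the search engines directly.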

Search Engine Optimization Techniques
Many search engines used to rely upon the frequency of a word or phrase on a particular website as one of the key ranking factors. So if you searched for Oklahoma Tax Law, a site that used that phrase 200 times would probably come up first (or close to it) in the results. But that method was quickly found to be flawed because a software company or other vendor could simply include those words hundreds of times in neutral-colored text that blends into the background at the bottom of a web page. Some people still try this, but the good search engines no longer weight this factor as strongly, instead also giving credit to domains that are generally more trustworthy (such as dot-gov websites that are run by a state or U.S. federal agency). Dot-gov is one of the few suffixes with real registration restrictions (dot-edu and dot-mil are others), so most of the other domain suffixes are pretty much up for grabs to anybody for any purpose. Never assume that a dot-info, dot-net or even a dot-org website is trustworthy based on its domain alone.
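The keyword-stuffing problem is easy to demonstrate. The sketch below ranks three made-up pages by how often they contain a phrase, then ranks them again with the raw count capped and a bonus for dot-gov addresses. The page texts, the cap of 5 and the 2.0 multiplier are invented for illustration; real search engines do not publish their formulas.

```python
def phrase_count(page_text, phrase):
    """Count non-overlapping, case-insensitive occurrences of a phrase."""
    return page_text.lower().count(phrase.lower())

# Hypothetical pages; the "stuffed" one repeats the phrase 200 times.
pages = {
    "honest.example": "A guide to Oklahoma tax law for small businesses.",
    "stuffed.example": "Buy our software! " + "Oklahoma tax law " * 200,
    "tax.example.gov": "Oklahoma tax law summary. Oklahoma tax law forms. "
                       "Oklahoma tax law FAQ.",
}
query = "Oklahoma tax law"

# Old approach: the page with the most repetitions wins.
raw_ranking = sorted(
    pages, key=lambda url: phrase_count(pages[url], query), reverse=True
)

def weighted_score(url, cap=5, gov_bonus=2.0):
    """Cap the raw count and favor dot-gov addresses (assumed numbers)."""
    count = min(phrase_count(pages[url], query), cap)
    trust = gov_bonus if url.endswith(".gov") else 1.0
    return count * trust

weighted_ranking = sorted(pages, key=weighted_score, reverse=True)
```

With raw counts, the stuffed page comes first. With the cap and the trust bonus, the dot-gov page moves ahead of it, although the stuffed page still outranks the honest one, which is why frequency alone is a poor signal.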

Linking is also emerging as one of the key factors, in that websites about a certain topic that are cross-linked and referenced to and by many other websites about similar topics are more likely to be legitimate and have high-value content. While the major search engines do not make all of their criteria public (so that the bad guys won’t abuse them), website developers are generally able to maximize their appearance in search results by using a combination of all of these techniques. This carries positives and negatives, of course: With legitimate web developers doing this, the results you want are more likely to be found quickly, but the bad guys are doing it too, which continues to fill your search results with hundreds or even thousands of unrelated listings.
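The linking idea can be sketched just as simply. The example below counts how many other pages in a small, made-up set link to each page, ignoring links a page makes to itself. The site names and the inbound_counts function are hypothetical, and a plain link count is a deliberate simplification. The search engines' actual link analysis is more elaborate and not fully public.

```python
from collections import Counter

# Hypothetical link map: each page lists the pages it links to.
LINKS = {
    "taxlaw.example": ["statecode.example", "cpa.example"],
    "cpa.example": ["taxlaw.example", "statecode.example"],
    "statecode.example": ["taxlaw.example"],
    "spam.example": ["spam.example", "taxlaw.example"],
}

def inbound_counts(links):
    """Count how many distinct other pages link to each known page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):          # ignore repeated links
            if target != source and target in links:
                counts[target] += 1          # self-links earn no credit
    return counts

counts = inbound_counts(LINKS)
best_page, best_count = counts.most_common(1)[0]
```

Here taxlaw.example is referenced by three other pages and ranks highest, while spam.example earns nothing by linking to itself.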

Sponsored Listings
It didn’t take long for the major search engines to discover the gold mine upon which they were sitting … and vendors soon came courting them with the hope of buying placement in search engine listings. At first, even the now-trustworthy search portals made it difficult for users to differentiate between legitimate search results and paid placement listings. Users voiced their dismay, and soon the more reputable search engines started to display these sponsored results more distinctly.

The Secret to Quick & Effective Online Searching
It helps to know how search engines list and categorize information when performing searches, but the best way to limit your search results to a manageable list of items that are truly related to your search is to know how to search. Regardless of the search engine you use, simply typing keywords into the search field will usually be fruitless (or perhaps overly bountiful), resulting in far too many hits of ambiguous quality. In my next column, I will outline some useful search techniques that will help you filter these mountainous search results into the data that you’re seeking, whether it’s information on corporate tax credits in North Carolina or which football team led the NCAA in total defense in 2003. (By the way, the answer is the Oklahoma Sooners, allowing only 255.6 yards of offense per game.)