Search Tool Data Analysis

by Hernando Flowers (Hcflow in BIT330, Fall 2008)

Questions and queries

Web search engines

I wanted to find out the Army recruitment numbers for 2007 and 2008, that is, how many people entered the United States Army in those years.

Query: Army Recruitment Numbers. This query was used for all of the web search engines.

Blog search engines

I wanted to find out what people were saying about the benefits that military personnel receive.

Query: Military Benefits. This query was used for all of the blog search engines.

Data that I collected

Search engine overlap data

Web search (%)   Live   Google   Yahoo Web
Live               30       10          10
Google                      45          15
Yahoo Web                               45
All engines: 10
Blog search (%)   Technorati   Google Blog   Bloglines
Technorati                10             0          25
Google Blog                             25           5
Bloglines                                           25
All engines: 0
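The overlap tables above give, for each pair of engines, a percentage of shared results, plus an "All" entry for results shared by every engine. The page does not spell out the formula, so the Python sketch below assumes that overlap means the number of result URLs the engines have in common within their top n, divided by n. The function names, the cutoff n, and the sample URL lists are hypothetical, and the sketch covers only the pairwise and "All" entries, not the values on the diagonal.

```python
from itertools import combinations


def overlap_percent(*result_lists, n=20):
    # Assumption: overlap = URLs shared by every list within its top n,
    # divided by n, expressed as a percentage.
    tops = [set(results[:n]) for results in result_lists]
    shared = set.intersection(*tops)
    return 100 * len(shared) / n


def overlap_table(results_by_engine, n=20):
    # One entry per pair of engines, plus an "All" entry for results
    # shared by every engine in the dictionary.
    table = {}
    for a, b in combinations(results_by_engine, 2):
        table[(a, b)] = overlap_percent(results_by_engine[a], results_by_engine[b], n=n)
    table["All"] = overlap_percent(*results_by_engine.values(), n=n)
    return table


# Hypothetical result lists, shortened to n=4 so the example stays small.
results = {
    "Live": ["example.com/a", "example.com/b", "example.com/c", "example.com/d"],
    "Google": ["example.com/a", "example.com/e", "example.com/f", "example.com/g"],
    "Yahoo Web": ["example.com/a", "example.com/b", "example.com/h", "example.com/i"],
}
print(overlap_table(results, n=4))
```

Here Live and Yahoo Web share two of their four results (50%), while every other pair, and all three engines together, share only one (25%).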

Search engine ranking overlap data

This table provides a measure of how much of Google's responses are reproduced by Yahoo.
GY           Yahoo 5   Yahoo 10   Yahoo 20
Google 5           1          1          1
Google 10          0          1          1
Google 20          0          1          1
This table provides a measure of how much of Yahoo's responses are reproduced by Google.
YG           Google 5   Google 10   Google 20
Yahoo 5             1           1           1
Yahoo 10            0           1           2
Yahoo 20            0           0           0
This table provides a measure of how much of Bloglines' responses are reproduced by Google Blog Search.
BG             Google Blog 5   Google Blog 10   Google Blog 20
Bloglines 5                1                1                1
Bloglines 10               0                0                0
Bloglines 20               0                0                0
This table provides a measure of how much of Google Blog Search's responses are reproduced by Bloglines.
GB               Bloglines 5   Bloglines 10   Bloglines 20
Google Blog 5              1              1              1
Google Blog 10             0              0              0
Google Blog 20             0              0              0

Results

Web search

The first set of data measured precision: how closely the sites returned for each query matched what the user was searching for. Because relevance judgments vary from person to person, the precision figures cover a wide range. The same site sometimes appears in the results of more than one search engine, which is what the overlap figures capture. Overlap between the web search engines ranged from 0% to 45%.
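Precision, as described above, is the share of returned sites that the searcher considers relevant. The minimal sketch below assumes each searcher simply marks which result URLs were relevant; the function name and the sample URLs and judgments are hypothetical. Scoring the same results against two different sets of judgments shows how the figure can change from person to person.

```python
def precision(results, relevant, n=None):
    # results: result URLs in ranked order.
    # relevant: the URLs this particular searcher judged relevant (a personal judgment).
    # n: optional cutoff, so only the first n results are scored.
    retrieved = results if n is None else results[:n]
    if not retrieved:
        return 0.0
    hits = sum(1 for url in retrieved if url in relevant)
    return hits / len(retrieved)


# Hypothetical results for one query and two searchers' judgments of them.
results = ["example.com/a", "example.com/b", "example.com/c", "example.com/d", "example.com/e"]
judge_1 = {"example.com/a", "example.com/c"}
judge_2 = {"example.com/a", "example.com/b", "example.com/c", "example.com/d"}
print(precision(results, judge_1))       # 0.4
print(precision(results, judge_2))       # 0.8
print(precision(results, judge_1, n=2))  # 0.5
```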

The data for the web and blog search precision can be found here.

The class took the overlap analysis further by measuring where in the rankings the overlap occurred, that is, whether the shared results appeared within the first 5, 10, or 20 entries. For each ranking overlap, the values typically ranged from 1 to 5. Because of this variance, it is hard to identify a correlation.
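The page does not state exactly how each cell of the ranking overlap tables was computed. One reading that appears consistent with every cell of the Google/Yahoo tables is that rows are rank bands of the first engine (ranks 1-5, 6-10, 11-20) and columns are cumulative cutoffs of the second engine (its first 5, 10, 20 results). The sketch below assumes that reading. The function name and the two result lists are hypothetical, built only so that the output matches the GY and YG tables.

```python
# Assumption: rows are rank bands of engine A (ranks 1-5, 6-10, 11-20) and
# columns are cumulative cutoffs of engine B (its first 5, 10, 20 results).
BANDS = ((0, 5), (5, 10), (10, 20))
CUTOFFS = (5, 10, 20)


def ranking_overlap(a_results, b_results, bands=BANDS, cutoffs=CUTOFFS):
    table = []
    for start, end in bands:
        band = set(a_results[start:end])  # engine A's results in this rank band
        table.append([len(band & set(b_results[:j])) for j in cutoffs])
    return table


# Hypothetical top-20 lists: only s1, s2 and s3 appear in both.
google = (["s1"] + [f"g{i}" for i in range(4)]
          + ["s2"] + [f"g{i}" for i in range(4, 8)]
          + ["s3"] + [f"g{i}" for i in range(8, 17)])
yahoo = (["s1"] + [f"y{i}" for i in range(4)]
         + ["s2", "s3"] + [f"y{i}" for i in range(4, 7)]
         + [f"y{i}" for i in range(7, 17)])

print(ranking_overlap(google, yahoo))  # [[1, 1, 1], [0, 1, 1], [0, 1, 1]]
print(ranking_overlap(yahoo, google))  # [[1, 1, 1], [0, 1, 2], [0, 0, 0]]
```

Under a fully cumulative reading, each column could only grow down the rows, which the GY table (1 then 0 in the first column) does not do. That is why the sketch treats rows as bands.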

The data for the web and blog search overlap can be found here.

Blog search

We collected the same precision and overlap data for the blog search engines as for the web search engines. As noted above, precision depends on the individual, so the percentages vary widely. As with the web search engines, some sites returned by one blog search engine also appear in another, and we calculated overlap from those shared results. Overlap between the blog search engines ranged from 0% to 25%.

We also carried out the ranking overlap analysis for blog search. The reported range was between 1 and 4, but most values were between 0 and 2 across the board.

Discussion

Web search

The precision of a search engine can vary depending on the person's query, the engine used, and the person's idea of what they are looking for. Also, because these engines index many of the same sites, a given site may show up in more than one engine's results. To increase precision, I recommend making the query more specific with particular terms, phrases, or operators, e.g. Military Recruitment inurl:.mil. By following my own recommendation, I found that I could locate information more easily and quickly, which means less time wasted paging through results looking for the page that best suits my needs. I wonder how other people with the same question and/or the same query would rate the precision.
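The recommendation to narrow a query with specific terms, phrases, or operators can be sketched as a small query builder. This is an illustration only: the helper names are hypothetical, the base URL assumes Google's web search endpoint with the query in a "q" parameter, and other engines may support different operators.

```python
from urllib.parse import urlencode


def build_query(terms=(), phrases=(), url_filter=None):
    # Plain terms are joined with spaces; phrases are wrapped in double quotes
    # so the engine matches them exactly; url_filter adds an inurl: operator.
    parts = list(terms) + [f'"{p}"' for p in phrases]
    if url_filter:
        parts.append(f"inurl:{url_filter}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    # Assumption: the engine reads the query from a "q" parameter.
    return f"{base}?{urlencode({'q': query})}"


query = build_query(["Military", "Recruitment"], url_filter=".mil")
print(query)              # Military Recruitment inurl:.mil
print(search_url(query))  # https://www.google.com/search?q=Military+Recruitment+inurl%3A.mil
```

urlencode escapes the colon as %3A and turns spaces into "+", so the operator survives being placed in a URL.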

Blog search

The precision and overlap of blog search depend on how specific the person's query is relative to what they are looking for.
As with a web search engine, a query needs to be as specific as possible to get the most out of the engine.
As with the web search engine data, I learned that everything depends on the individual: there is no real correlation except that the range is small. Beyond that, the data are variable. I would like to know whether two people with the same query and/or question would get the same data in the tables.
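One way to explore whether two people would produce the same data is to have both judge the same result list and compare. The minimal sketch below assumes each person gives a yes/no relevance judgment per result. It computes each person's precision and their simple percent agreement (the fraction of results they judged the same way). The function name and the sample judgments are hypothetical.

```python
def compare_raters(judgments_a, judgments_b):
    # Each list holds one True/False relevance judgment per result,
    # in the same order, for the same result list.
    if len(judgments_a) != len(judgments_b) or not judgments_a:
        raise ValueError("both people must judge the same non-empty result list")
    n = len(judgments_a)
    precision_a = sum(judgments_a) / n
    precision_b = sum(judgments_b) / n
    # Simple percent agreement: share of results both people judged the same way.
    agreement = sum(a == b for a, b in zip(judgments_a, judgments_b)) / n
    return precision_a, precision_b, agreement


# Hypothetical judgments of the same four results by two people.
person_1 = [True, True, False, False]
person_2 = [True, False, False, True]
print(compare_raters(person_1, person_2))  # (0.5, 0.5, 0.5)
```

In this example both people report the same precision while disagreeing on half of the individual results, so matching precision figures alone would not show that they judged the data the same way.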

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License