People Search Project - 2007


About This Project

The People Search Project 2007 continues an earlier project created by Ketan Mane in 2004. The goal of the original project was to find likely addresses for the Ph.D. students who graduated from the School of Library and Information Science (SLIS), Indiana University, Bloomington. This new version keeps the same goal but changes the project specifications in two ways. First, we assumed that completely automated searching does not produce the best results (at least with current methodologies), so we needed to add a mechanism that lets a person interact with the search results and evaluate them further. Second, we needed to develop additional, more advanced search algorithms, in the hope of obtaining a greater number of results with a higher degree of relevance.


Methodology

The methodology for this project was multi-phased: each stage had, for the most part, its own specific methodology. Each stage is discussed below.

  1. Retrieve data from the Web.
  2. Process the raw data into our database.
  3. Search the records in our database for the most relevant documents using a TF-IDF algorithm.
  4. Extract phone numbers from the most relevant web pages retrieved for each individual, then search the White Pages data retrieved for the same individual for matching phone numbers. A match indicates that the web page and the address probably refer to the same individual.
  5. Create an interface for viewing, manipulating, and importing records into our database.
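
Steps 3 and 4 can be illustrated with a minimal Python sketch. This is not the project's source code: the function names (tfidf_rank, extract_phones, matching_phones), the particular TF-IDF weighting (term frequency times log(N / document frequency)), the North American phone number pattern, and the sample data are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed pattern: 10-digit North American numbers such as
# (812) 555-0100, 812-555-0100, or 812.555.0100.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})\b")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(query, documents):
    """Return (document index, score) pairs, highest score first.

    A document's score is the sum, over the distinct query terms it
    contains, of tf * idf, where tf is the term's relative frequency in
    the document and idf is log(N / number of documents with the term).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scores.append((index, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def extract_phones(text):
    """Return the set of phone numbers in text, normalized to 10 digits."""
    return {"".join(match.groups()) for match in PHONE_RE.finditer(text)}


def matching_phones(web_page_text, white_pages_text):
    """Return phone numbers that appear in both texts."""
    return extract_phones(web_page_text) & extract_phones(white_pages_text)


if __name__ == "__main__":
    # Hypothetical sample data (fictitious name and 555 numbers).
    pages = [
        "Jane Doe earned her Ph.D. in information science. Call (812) 555-0100.",
        "A recipe for apple pie.",
        "Notes on library cataloguing.",
    ]
    best_index, _ = tfidf_rank("Jane Doe information science", pages)[0]
    white_pages = "DOE JANE 812-555-0100 Bloomington IN"
    print(best_index, matching_phones(pages[best_index], white_pages))
```

In this sketch a shared phone number is treated only as evidence that a web page and a White Pages listing refer to the same person, which is why a human reviewer (step 5) still makes the final decision.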

Input

Output

Databases Used


Suggestions for Further Research

There are a number of directions in which this project might be taken in the future. Some key suggestions might be:


Authors:
Daniel B. Bicknell, MIS Student
Jenny Jackson, MIS Student

Last Updated: May 3rd, 2007