Monday, September 10, 2007

Bhaumik Chokshi's Defense

Bhaumik Chokshi will defend his master's thesis on Tuesday, September 11, in BYENG 455. Details follow:

Comparing Offline and Online Statistics Estimation for Text Retrieval from Overlapped Collections
Student Defense
Date: September 11, 2007
Time: 10:30 AM - 12:00 PM

Contact Person: Bhaumik Chokshi
Contact Email: bhaumik.chokshi@asu.edu
Location: BYENG 455
Defense Type: Master's Thesis Defense

Committee Members

Dr. Subbarao Kambhampati
Dr. Yi Chen
Dr. Hasan Davulcu

In an environment of distributed text collections, the first step in the information retrieval process is to identify which of the available collections are most relevant to a given query and should therefore be accessed to answer it. Collection selection is difficult because sources vary in relevance and overlap with one another. Some previous collection selection methods have considered the relevance of collections but ignored the overlap among them, and thus make the unrealistic assumption that the collections are all effectively disjoint. Overlap estimation can be done in two ways: offline or online. The main objective of this thesis is to compare these two approaches to estimating statistics. One existing approach, COSCO, works offline: it stores statistics for frequent item sets and uses them to estimate statistics for the user query. This thesis presents ROSCO, which uses a sample-based online approach to estimate the overlap among collections for a given query. COSCO and ROSCO are also compared with ReDDE (which does not consider overlap) under a variety of scenarios. The experiments show that ROSCO outperforms existing methods by 8-10% in the presence of overlap among collections.