Answering General Time-Sensitive Queries


Posted by hari on Saturday, 31 March 2012

Abstract:

Time is an important dimension of relevance for a large number of searches, such as over blogs and news archives. So far, research on searching over such collections has largely focused on locating topically similar documents for a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this paper, we observe that, for an important class of queries that we call time-sensitive queries, the publication time of the documents in a news archive is important and should be considered in conjunction with the topic similarity to derive the final document ranking. Earlier work has focused on improving retrieval for “recency” queries that target recent documents. We propose a more general framework for handling time-sensitive queries and we automatically identify the important time intervals that are likely to be of interest for a query. Then, we build scoring techniques that seamlessly integrate the temporal aspect into the overall ranking mechanism. We present an extensive experimental evaluation using a variety of news article data sets, including TREC data as well as real web data analyzed using the Amazon Mechanical Turk. We examine several techniques for detecting the important time intervals for a query over a news archive and for incorporating this information in the retrieval process. We show that our techniques are robust and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.

Existing System:

Beyond asking for explicit user input, earlier work by Li and Croft focused on handling recency queries, which are queries that are after recent events or breaking news. Li and Croft's time-sensitive approach processes a recency query by computing traditional topic similarity scores for each document and then "boosting" the scores of the most recent documents, to privilege recent articles over older ones. In contrast to traditional models, which assume a uniform prior probability of relevance p(d) for each document d in a collection, Li and Croft define the prior p(d) to be a function of document d's creation date. The prior probability p(d) decreases exponentially with the age of the document, and hence recent documents are ranked higher than older documents. Li and Croft's strategy is designed for queries that are after recent documents, but it does not handle other types of time-sensitive queries, such as [Madrid bombing], [Google IPO], or even [Sarkozy French elections] (in May 2008), that implicitly target one or more past time periods.
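The recency boost described above can be sketched in a few lines. This is a minimal illustration, assuming a prior of the form rate * exp(-rate * age in days); the names `recency_prior` and `boosted_score` and the default `rate` value are hypothetical and are not taken from Li and Croft's work.

```python
import math
from datetime import date


def recency_prior(doc_date: date, today: date, rate: float = 0.01) -> float:
    """Exponentially decaying prior p(d); `rate` is an assumed decay parameter."""
    age_days = max((today - doc_date).days, 0)  # clamp future dates to age 0
    return rate * math.exp(-rate * age_days)


def boosted_score(log_topic_score: float, doc_date: date, today: date,
                  rate: float = 0.01) -> float:
    """Add the log prior to a log topic-similarity score."""
    return log_topic_score + math.log(recency_prior(doc_date, today, rate))
```

With equal topic scores, a document published yesterday receives a higher boosted score than one published a year ago, which is why this approach suits recency queries but not queries about past events.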

Disadvantages:

· Estimating the p(t|q) values directly from document relevance scores is somewhat problematic, because these scores were designed for a different purpose, namely document ranking.

· The current technique is not conducive to exploring different "shapes" of the p(t|q) probability distribution.

Proposed System:

We propose a more general framework for answering time-sensitive queries that builds on and substantially expands the earlier work on recency queries. If the relevant time period for a time-sensitive query is unspecified, several query processing approaches are possible. One alternative is to automatically suggest, based on the query terms, relevant time ranges for the query and allow users to explicitly select appropriate time intervals. As an alternative that demands less input from the users, and which we follow in this paper, we can automate the previous procedure and prioritize results from periods that we automatically identify as relevant. We can then naturally define the relevance of a document as a combination of topic similarity and time relevance.

Advantages:

· We propose a more general framework for handling time-sensitive queries, and we automatically identify the important time intervals that are likely to be of interest for a query.

· We show that our techniques are robust and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.

· The framework for estimating p(t|q) addresses these issues, so that it is less dependent on the underlying retrieval model.

Architecture:

Architecture Diagram

Software Requirements Specification:

Software Requirements:

Front End : Java, JSP, Servlet

Back End : Oracle 10g

IDE : MyEclipse 8.0

Language : Java (JDK 1.6.0)

Operating System : Windows XP

Hardware Requirements:

System : Pentium IV, 2.4 GHz

Hard Disk : 80 GB

Floppy Drive : 1.44 MB

Monitor : 14-inch colour monitor

Mouse : Optical mouse

RAM : 512 MB

Keyboard : 101-key keyboard

Module Description:

1. Search Over Blogs

2. Time interval feedback

3. Temporal relevance feedback (time-sensitive results)

4. Overall ranking document identification

Search Over Blogs:

Time is an important dimension of relevance for a large number of searches, such as those over blogs and news archives. So far, research on searching over such collections has largely focused on retrieving topically similar documents for a query. Unfortunately, ignoring or not fully exploiting the time dimension can be detrimental for a large family of queries, for which we should consider not only the topical relevance of a document but also its publication time.

Time Interval Feedback:

For a time-sensitive query over a news archive, our approach automatically identifies important time intervals for the query. These intervals are then used to adjust the document relevance scores by boosting the scores of documents published within the important intervals. We have implemented our system on top of Indri, a state-of-the-art search engine that combines language models and inference networks for retrieval, as well as on top of Lemur. Our system provides a web interface for searching the Newsblaster archive, an operational news archive and summarization system, and for experimenting with variations of our approach.
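The interval step can be illustrated with a small sketch. It assumes a deliberately simple rule: a day is "important" when at least `threshold` matching documents were published on it, and consecutive important days are merged into one interval. The function names, the threshold rule, and the multiplicative `factor` are all hypothetical and stand in for the detection techniques of the actual system.

```python
from collections import Counter
from datetime import date, timedelta


def important_intervals(pub_dates, threshold):
    """Return (start, end) runs of consecutive days that each have
    at least `threshold` matching documents."""
    counts = Counter(pub_dates)
    busy_days = sorted(d for d, c in counts.items() if c >= threshold)
    intervals = []
    for day in busy_days:
        if intervals and day - intervals[-1][1] == timedelta(days=1):
            intervals[-1] = (intervals[-1][0], day)  # extend the current run
        else:
            intervals.append((day, day))  # start a new run
    return intervals


def boost(score, pub_date, intervals, factor=1.5):
    """Multiply the score by `factor` (an assumed constant) when the
    document was published inside an important interval."""
    inside = any(start <= pub_date <= end for start, end in intervals)
    return score * factor if inside else score
```

For example, if the publication dates of the documents matching [Madrid bombing] cluster on two adjacent days, those days form one interval and every document published in it has its score multiplied by `factor`, while documents from other days keep their original score.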

Temporal Relevance Feedback:

We discuss several techniques to estimate the temporal relevance of a day to a query at hand. These estimation techniques use the temporal distribution of matching articles for the query to compute the probability that a day in the archive has a relevant document for the query.
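One simple way to turn the temporal distribution of matching articles into per-day probabilities is sketched below. It is only an illustration of the idea: it assumes that p(t|q) is proportional to the number of matching articles published on day t, with an additive `smoothing` constant so that days without matches keep a small probability. The function name and the smoothing scheme are assumptions, not the estimators evaluated in the paper.

```python
from collections import Counter
from datetime import date, timedelta


def day_distribution(pub_dates, start, end, smoothing=1.0):
    """Estimate p(t|q) for every day t in [start, end] from the
    publication dates of the articles that match the query."""
    counts = Counter(pub_dates)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    # Normalizer: matching articles inside the range plus the smoothing mass.
    total = sum(counts[d] for d in days) + smoothing * n_days
    return {d: (counts[d] + smoothing) / total for d in days}
```

The returned values sum to 1 over the chosen date range, so they can be used directly as the temporal component of a ranking score.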

Overall ranking document identification:

We integrate temporal relevance with state-of-the-art retrieval models, including a query likelihood model, a relevance model, a probabilistic relevance model, and a query expansion with pseudo relevance feedback model, to naturally process time-sensitive queries. In these models, we combine topical relevance and temporal relevance to determine the overall relevance of a document.
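A minimal sketch of the combination step follows. It assumes that topical and temporal relevance are each expressed as a positive probability and that they are combined by a simple product (a sum in log space); the function names and the product rule are illustrative assumptions, and each retrieval model listed above may combine the two components differently.

```python
import math


def combined_score(topic_prob, time_prob):
    """Log of topic_prob * time_prob; both must be positive probabilities."""
    return math.log(topic_prob) + math.log(time_prob)


def rank(docs):
    """Sort (doc_id, topic_prob, time_prob) tuples by combined score, best first."""
    return sorted(docs, key=lambda d: combined_score(d[1], d[2]), reverse=True)
```

Under this rule, a document with the highest topic similarity can still be ranked below a slightly less similar document that was published in a period that matters for the query.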

Algorithm

Algorithm 1: General time-based approach for estimating the value p(t|q) of each time t and a query q.
