Answering General Time-Sensitive Queries


Abstract:
Time is an important dimension of relevance for a large number of searches, such as searches over blogs and news archives. So far, research on searching over such collections has largely focused on locating topically similar documents for a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this paper, we observe that, for an important class of queries that we call time-sensitive queries, the publication time of the documents in a news archive is important and should be considered in conjunction with topic similarity to derive the final document ranking. Earlier work has focused on improving retrieval for "recency" queries that target recent documents. We propose a more general framework for handling time-sensitive queries, and we automatically identify the important time intervals that are likely to be of interest for a query. We then build scoring techniques that seamlessly integrate the temporal aspect into the overall ranking mechanism. We present an extensive experimental evaluation using a variety of news article data sets, including TREC data as well as real web data analyzed using Amazon Mechanical Turk. We examine several techniques for detecting the important time intervals for a query over a news archive and for incorporating this information into the retrieval process. We show that our techniques are robust and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.

Existing System:
Beyond asking for explicit user input, earlier work by Li and Croft focused on handling recency queries, which are queries that seek recent events or breaking news. Li and Croft's time-sensitive approach processes a recency query by computing traditional topic similarity scores for each document and then "boosting" the scores of the most recent documents, to favor recent articles over older ones. In contrast to traditional models, which assume a uniform prior probability of relevance p(d) for each document d in a collection, Li and Croft define the prior p(d) as a function of document d's creation date. The prior p(d) decreases exponentially with time, so recent documents are ranked higher than older ones. Li and Croft's strategy is designed for queries that seek recent documents, but it does not handle other types of time-sensitive queries, such as [Madrid bombing], [Google IPO], or even [Sarkozy French elections] (in May 2008), that implicitly target one or more past time periods.
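To make the recency prior concrete, here is a minimal sketch in Java. It assumes the exponential form p(d) = λ·e^(−λ·(T_C − T_d)), where T_C is the newest date in the collection and T_d is d's creation date, and that topical scores are already available as log p(q|d) from a query likelihood model. The class and method names and the rate λ = 0.05 are hypothetical illustration choices, and the code uses Java 8+ features (lambdas, List.of) rather than the JDK 1.6 listed under the requirements.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of a recency prior in the style described above (hypothetical names):
// rank by log p(q|d) + log p(d), where p(d) decays exponentially with age.
final class RecencyPrior {

    /** A document with its topical score log p(q|d) and its age T_C - T_d in days. */
    static final class Doc {
        final String id;
        final double logQueryLikelihood; // assumed to come from a query likelihood model
        final int ageInDays;             // days between d's creation date and the newest date in the collection

        Doc(String id, double logQueryLikelihood, int ageInDays) {
            this.id = id;
            this.logQueryLikelihood = logQueryLikelihood;
            this.ageInDays = ageInDays;
        }
    }

    /** log p(d) for the exponential prior p(d) = lambda * exp(-lambda * age). */
    static double logPrior(int ageInDays, double lambda) {
        return Math.log(lambda) - lambda * ageInDays;
    }

    /** Returns a copy of docs sorted by log p(q|d) + log p(d), best first. */
    static List<Doc> rank(List<Doc> docs, double lambda) {
        List<Doc> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingDouble(
                (Doc d) -> d.logQueryLikelihood + logPrior(d.ageInDays, lambda)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("older-but-more-topical", -10.0, 30),
                new Doc("recent", -10.5, 1));
        // With the hypothetical rate lambda = 0.05, the recent document is printed first
        // even though its topical score is lower.
        for (Doc d : rank(docs, 0.05)) {
            System.out.println(d.id);
        }
    }
}
```

Because the prior only looks at age, a query about a past event such as [Madrid bombing] would be pushed toward the newest documents regardless of when the event happened, which is exactly the limitation noted above.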

Disadvantages:
·         Estimating the p(t|q) values directly from the documents' relevance scores is somewhat problematic, because these scores were designed for a different purpose, namely, document ranking.
·         The current technique is not conducive to exploring different "shapes" of the p(t|q) probability distribution.
              

Proposed System:
We propose a more general framework for answering time-sensitive queries that builds on and substantially expands the earlier work on recency queries. If the relevant time period for a time-sensitive query is unspecified, several query processing approaches are possible. One alternative is to automatically suggest relevant time ranges for the query, based on the query terms, and allow users to explicitly select appropriate time intervals. As an alternative that demands less input from users, and the one we follow in this paper, we can automate this procedure and prioritize results from periods that we automatically identify as relevant. We can then naturally define the relevance of a document as a combination of topic similarity and time relevance.
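One way to write this combination down, sketched under the assumption (made for this note, not quoted from the paper) that a document's content c_d and its publication time t_d are independent given the query q:

```latex
p(d \mid q) = p(c_d, t_d \mid q)
  \approx p(c_d \mid q)\, p(t_d \mid q)
  \propto p(q \mid c_d)\, p(c_d)\, p(t_d \mid q)
```

With a uniform content prior p(c_d), documents are ranked by p(q|c_d)·p(t_d|q), or equivalently by log p(q|c_d) + log p(t_d|q): a topical term from any retrieval model plus a temporal term that depends only on the query and the document's date. Unlike the recency prior, p(t_d|q) can peak at any past period.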

Advantages:
·         We propose a more general framework for handling time-sensitive queries and automatically identify the important time intervals that are likely to be of interest for a query.
·         We show that our techniques are robust and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.
·         We design a framework for estimating p(t|q) that addresses these issues and is less dependent on the underlying retrieval model.

Architecture:

Architecture Diagram



Software Requirements Specification:

Software Requirements:
Front End            :   Java, JSP, Servlet
Back End             :   Oracle 10g
IDE                  :   MyEclipse 8.0
Language             :   Java (JDK 1.6.0)
Operating System     :   Windows XP

Hardware Requirements:
System           :   Pentium IV 2.4 GHz
Hard Disk        :   80 GB
Floppy Drive     :   1.44 MB
Monitor          :   14-inch colour monitor
Mouse            :   Optical mouse
RAM              :   512 MB
Keyboard         :   101-key keyboard


Module Description:
1.     Search over blogs
2.     Time interval feedback
3.     Temporal relevance feedback (time-sensitive results)
4.     Overall ranking document identification
Search over blogs:
Time is an important dimension of relevance for a large number of searches, such as those over blogs and news archives. So far, research on searching over such collections has largely focused on retrieving topically similar documents for a query. Unfortunately, ignoring or not fully exploiting the time dimension can be detrimental for a large family of queries for which we should consider not only the topical relevance of a document but also its publication time.
Time Interval Feedback:
Given a time-sensitive query over a news archive, our approach automatically identifies important time intervals for the query. These intervals are then used to adjust the document relevance scores by boosting the scores of documents published within the important intervals. We have implemented our system on top of Indri, a state-of-the-art search engine that combines language models and inference networks for retrieval, as well as on top of Lemur. Our system provides a web interface for searching the Newsblaster archive, an operational news archive and summarization system, and for experimenting with variations of our approach.
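The interval-and-boost idea can be illustrated with a small sketch. It assumes the archive is binned by day, that a per-day temporal relevance estimate for the query is already available, and that topical scores are non-negative (a multiplier would misbehave on negative log scores). The threshold, the boost factor, and the helper names findIntervals and boost are all hypothetical; the approach described here integrates temporal relevance into the ranking model itself rather than applying one fixed multiplier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turn per-day relevance values into important intervals,
// then boost documents published inside them.
final class ImportantIntervals {

    /** A closed interval of day indexes [start, end]. */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + "]";
        }
    }

    /** Marks day i as important when p[i] > threshold and merges consecutive important days. */
    static List<Interval> findIntervals(double[] p, double threshold) {
        List<Interval> intervals = new ArrayList<>();
        int start = -1; // start of the current run of important days, or -1 if not in a run
        for (int i = 0; i < p.length; i++) {
            if (p[i] > threshold) {
                if (start < 0) start = i;
            } else if (start >= 0) {
                intervals.add(new Interval(start, i - 1));
                start = -1;
            }
        }
        if (start >= 0) intervals.add(new Interval(start, p.length - 1)); // run reaching the last day
        return intervals;
    }

    /** Multiplies a non-negative score by factor if the publication day lies in any interval. */
    static double boost(double score, int day, List<Interval> intervals, double factor) {
        for (Interval in : intervals) {
            if (day >= in.start && day <= in.end) return score * factor;
        }
        return score;
    }

    public static void main(String[] args) {
        double[] p = {0.02, 0.03, 0.30, 0.25, 0.05, 0.01, 0.20, 0.14};
        List<Interval> intervals = findIntervals(p, 0.1);
        System.out.println(intervals);                     // [[2, 3], [6, 7]]
        System.out.println(boost(0.5, 3, intervals, 1.5)); // 0.75
    }
}
```

Merging consecutive days keeps a story that unfolds over several days as one interval instead of several isolated days.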
Temporal Relevance Feedback:
We discuss several techniques to estimate the temporal relevance of a day to the query at hand. These estimation techniques use the temporal distribution of matching articles for the query to compute the probability that a day in the archive has a relevant document for the query.
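A minimal sketch of one such estimator, assuming day-level bins: count the articles matching the query on each day, smooth the counts with a centered moving average, and normalize so the values sum to 1, giving an estimate of p(t|q). The moving-average smoothing is a choice made for this illustration and is not necessarily one of the techniques examined in the paper; the method name estimate and the halfWidth parameter are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: estimate p(t|q) from the per-day counts of matching articles.
final class TemporalRelevance {

    /**
     * Smooths per-day match counts with a centered moving average of the given
     * half-width (the window is truncated at the archive boundaries), then
     * normalizes the result into a probability distribution over days.
     */
    static double[] estimate(int[] matchesPerDay, int halfWidth) {
        int n = matchesPerDay.length;
        double[] smoothed = new double[n];
        double total = 0.0;
        for (int t = 0; t < n; t++) {
            int lo = Math.max(0, t - halfWidth);
            int hi = Math.min(n - 1, t + halfWidth);
            double sum = 0.0;
            for (int s = lo; s <= hi; s++) sum += matchesPerDay[s];
            smoothed[t] = sum / (hi - lo + 1); // average over the window actually covered
            total += smoothed[t];
        }
        if (total == 0.0) {
            // No matching articles at all: fall back to a uniform distribution over days.
            Arrays.fill(smoothed, 1.0 / n);
            return smoothed;
        }
        for (int t = 0; t < n; t++) smoothed[t] /= total;
        return smoothed;
    }

    public static void main(String[] args) {
        // A single burst of 6 matching articles on day 2 is spread over days 1 to 3.
        int[] matches = {0, 0, 6, 0, 0};
        System.out.println(Arrays.toString(estimate(matches, 1)));
    }
}
```

Setting halfWidth to 0 returns the raw normalized histogram; wider windows trade sharp peaks for robustness when matches for an event are spread over neighboring days.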

Overall ranking document identification:
We integrate temporal relevance with state-of-the-art retrieval models, including a query likelihood model, a relevance model, a probabilistic relevance model, and a query expansion model with pseudo-relevance feedback, to naturally process time-sensitive queries. In these models, we combine topical relevance and temporal relevance to determine the overall relevance of a document.
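As a sketch of the combination step, assume the topical model outputs a log-probability log p(q|c_d) (as a query likelihood model does) and that an estimate of p(t|q) per day is available. The overall score below is log p(q|c_d) + log p'(t_d|q), where p' mixes p(t|q) with a uniform distribution using a hypothetical weight alpha so that days with zero estimated temporal relevance do not score negative infinity. This mixing step and all names are assumptions of the sketch; how the temporal term is folded into each of the models listed above may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rank by topical log-likelihood plus log temporal relevance.
final class CombinedRanking {

    static final class Doc {
        final String id;
        final double logTopical; // log p(q|c_d), e.g. from a query likelihood model
        final int day;           // index of the publication day t_d in the archive

        Doc(String id, double logTopical, int day) {
            this.id = id;
            this.logTopical = logTopical;
            this.day = day;
        }
    }

    /** log p(q|c_d) + log((1 - alpha) * p(t_d|q) + alpha / N), with N the number of days. */
    static double score(Doc d, double[] pDayGivenQuery, double alpha) {
        double uniform = 1.0 / pDayGivenQuery.length;
        double pt = (1.0 - alpha) * pDayGivenQuery[d.day] + alpha * uniform;
        return d.logTopical + Math.log(pt);
    }

    /** Returns a copy of docs sorted by combined score, best first. */
    static List<Doc> rank(List<Doc> docs, double[] pDayGivenQuery, double alpha) {
        List<Doc> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingDouble((Doc d) -> score(d, pDayGivenQuery, alpha)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        double[] pDayGivenQuery = {0.05, 0.05, 0.80, 0.05, 0.05}; // hypothetical p(t|q) peaking on day 2
        List<Doc> docs = List.of(
                new Doc("off-peak", -9.0, 0),
                new Doc("in-peak", -9.5, 2));
        // "in-peak" is printed first: its temporal term outweighs its lower topical score.
        for (Doc d : rank(docs, pDayGivenQuery, 0.1)) {
            System.out.println(d.id);
        }
    }
}
```

Working in log space turns the product of topical and temporal probabilities into a sum, which avoids underflow when topical likelihoods are very small.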

Algorithm
Algorithm 1: General time-based approach for estimating the value p(q|t) of each time t and a query q.
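The steps of Algorithm 1 are not reproduced here. As a hypothetical illustration of a per-day estimate in this spirit, the sketch takes p(q|t) to be the fraction of the articles published on day t that match the query, then converts it into p(t|q) with Bayes' rule under a uniform prior over days. Both the estimator and the uniform prior are assumptions of this sketch, and the names are illustrative.

```java
import java.util.Arrays;

// Hypothetical sketch: per-day p(q|t) as a match rate, then p(t|q) via Bayes' rule.
final class TimeBasedEstimate {

    /** p(q|t): matching articles on day t divided by all articles on day t (0 for empty days). */
    static double[] queryGivenDay(int[] matchesPerDay, int[] articlesPerDay) {
        if (matchesPerDay.length != articlesPerDay.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double[] p = new double[matchesPerDay.length];
        for (int t = 0; t < p.length; t++) {
            p[t] = articlesPerDay[t] == 0 ? 0.0 : (double) matchesPerDay[t] / articlesPerDay[t];
        }
        return p;
    }

    /** p(t|q) proportional to p(q|t), assuming a uniform prior p(t) over days. */
    static double[] dayGivenQueryUniformPrior(double[] pQueryGivenDay) {
        double total = 0.0;
        for (double v : pQueryGivenDay) total += v;
        double[] out = new double[pQueryGivenDay.length];
        for (int t = 0; t < out.length; t++) {
            // With no evidence at all, fall back to the uniform prior itself.
            out[t] = total == 0.0 ? 1.0 / out.length : pQueryGivenDay[t] / total;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] matches = {2, 10, 3};
        int[] articles = {100, 50, 300};
        double[] pqt = queryGivenDay(matches, articles);
        System.out.println(Arrays.toString(pqt)); // [0.02, 0.2, 0.01]
        System.out.println(Arrays.toString(dayGivenQueryUniformPrior(pqt)));
    }
}
```

If p(t) were instead proportional to each day's article volume n_t out of N articles in total, Bayes' rule would give p(t|q) ∝ (m_t/n_t)(n_t/N) = m_t/N, the plain histogram of matching articles m_t. Dividing by daily volume is what lets a quiet day on which a large share of articles match the query stand out.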


                                                                                                                                                                            