Answering General Time-Sensitive Queries
Abstract:
Time is an important dimension of relevance for a large number of searches, such as
over blogs and news archives. So far, research on searching over such
collections has largely focused on locating topically similar documents for a
query. Unfortunately, topic similarity alone is not always sufficient for
document ranking. In this paper, we observe that, for an important class of
queries that we call time-sensitive queries, the publication time of the
documents in a news archive is
important and should be considered in conjunction with the topic similarity to
derive the final document ranking. Earlier work has focused on improving
retrieval for “recency” queries that target recent documents. We propose a more
general framework for handling time-sensitive queries and we automatically
identify the important time intervals that are likely to be of interest for a
query. Then, we build scoring techniques that seamlessly integrate the temporal
aspect into the overall ranking mechanism. We present an extensive experimental
evaluation using a variety of news article data sets, including TREC data as
well as real web data analyzed using Amazon Mechanical Turk. We examine
several techniques for detecting the important time intervals for a query over
a news archive and for incorporating this information in the retrieval process.
We show that our techniques are robust and significantly improve result quality
for time-sensitive queries compared to state-of-the-art retrieval techniques.
Existing System:
Beyond asking for explicit user input, earlier work by Li and Croft focused on
handling recency queries, which are queries that are after recent events or
breaking news. Li and Croft’s time-sensitive approach processes a recency
query by computing traditional topic similarity scores for each document and
then “boosting” the scores of the most recent documents, to privilege recent
articles over older ones. In contrast to traditional models, which assume a
uniform prior probability of relevance p(d) for each document d in a
collection, Li and Croft define the prior p(d) to be a function of document
d’s creation date. The prior probability p(d) decreases exponentially with
the age of the document, and hence recent documents are ranked higher than
older documents. Li and Croft’s strategy is designed for queries that are
after recent documents, but it does not handle other types of time-sensitive
queries, such as [Madrid bombing], [Google IPO], or even [Sarkozy French
elections] (in May 2008), that implicitly target one or more past time
periods.
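The recency boost described above can be sketched as follows. This is a minimal illustration, assuming the prior takes the form of an exponential density over the document's age in days; the function names and the decay rate value are hypothetical and are not taken from Li and Croft's implementation.

```python
import math
from datetime import date


def recency_prior(doc_date, query_date, rate=0.01):
    """Prior p(d) that decays exponentially with document age (in days).

    `rate` is an illustrative decay parameter, not a published value.
    """
    age_days = max((query_date - doc_date).days, 0)
    return rate * math.exp(-rate * age_days)


def recency_score(topic_log_score, doc_date, query_date, rate=0.01):
    """Add the log prior to a topic-similarity log score."""
    return topic_log_score + math.log(recency_prior(doc_date, query_date, rate))
```

With the same topic score, a document published closer to the query date receives the higher combined score, which is why this strategy cannot favor a past period such as the one targeted by [Madrid bombing].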
Disadvantages:
· The direct dependency on the relevance scores for estimating the p(t|q)
values (the relevance of time t to query q) is somewhat problematic, because
these scores were designed for a different purpose, namely, document ranking.
· The current technique is not conducive to exploring different “shapes” of
the p(t|q) probability distribution.
Proposed System:
We propose a more
general framework for answering time-sensitive queries that builds on and
substantially expands the earlier work on recency queries. If the relevant time
period for a time-sensitive query is unspecified, several query processing
approaches are possible. One alternative is to automatically suggest, based on
the query terms, relevant time ranges for the query and allow users to
explicitly select appropriate time intervals. As an alternative that demands
less input from the users, and which we follow in this paper, we can automate
the previous procedure and prioritize results from periods that we
automatically identify as relevant. We can then naturally define the relevance
of a document as a combination of topic similarity and time relevance.
Advantages:
· We propose a more general framework for handling time-sensitive queries,
and we automatically identify the important time intervals that are likely to
be of interest for a query.
· We show that our techniques are robust and significantly improve result
quality for time-sensitive queries compared to state-of-the-art retrieval
techniques.
· The framework for estimating p(t|q) is designed to address the issues
listed under Disadvantages, so that it is less dependent on the underlying
retrieval model.
Architecture:
[Architecture diagram]
Software Requirements Specification:
Software Requirements:
Front End : Java, JSP, Servlet
Back End : Oracle 10g
IDE : MyEclipse 8.0
Language : Java (JDK 1.6.0)
Operating System : Windows XP
Hardware Requirements:
System : Pentium IV 2.4 GHz
Hard Disk : 80 GB
Floppy Drive : 1.44 MB
Monitor : 14" Colour Monitor
Mouse : Optical Mouse
RAM : 512 MB
Keyboard : 101-key Keyboard
Module Description:
1. Search over blogs
2. Time interval feedback
3. Temporal relevance feedback (time-sensitive results)
4. Overall ranking document identification
Search over blogs:
Time is an important dimension of relevance for a large number of
searches, such as those over blogs and news archives. So far, research on
searching over such collections has largely focused on retrieving topically
similar documents for a query. Unfortunately, ignoring or not fully
exploiting the time dimension can be detrimental for a large family of
queries for which we should consider not only the topical relevance of the
documents but also their publication time.
Time Interval Feedback:
Given a time-sensitive query over a news archive, our approach automatically
identifies important time intervals for the query. These intervals are then
used to adjust the document relevance scores by boosting the scores of
documents published within the important intervals. We have implemented our
system on top of Indri, a state-of-the-art search engine that combines
language models and inference networks for retrieval, as well as on top of
Lemur. Our system provides a web interface for searching the Newsblaster
archive, an operational news archive and summarization system, and for
experimenting with variations of our approach.
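To make the interval-based boosting concrete, here is a small sketch. It assumes that each day already carries a relevance weight for the query and that "important" simply means a weight at or above a threshold; the threshold rule, the boost factor, and the function names are hypothetical choices for illustration, not the detection method used by the system.

```python
from datetime import date, timedelta


def important_intervals(day_weights, threshold):
    """Merge consecutive days whose weight is >= threshold into (start, end) pairs."""
    days = sorted(d for d, w in day_weights.items() if w >= threshold)
    intervals = []
    for d in days:
        if intervals and d - intervals[-1][1] == timedelta(days=1):
            # Day directly follows the last interval: extend it.
            intervals[-1] = (intervals[-1][0], d)
        else:
            intervals.append((d, d))
    return intervals


def boost(score, pub_date, intervals, factor=2.0):
    """Multiply the score by `factor` if the document falls in an important interval."""
    in_interval = any(start <= pub_date <= end for start, end in intervals)
    return score * factor if in_interval else score
```

A hard threshold gives an all-or-nothing boost; weighting each document by the relevance of its publication day is a smoother alternative.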
Temporal Relevance Feedback:
We discuss several techniques to estimate the
temporal relevance of a day to a query at hand. These estimation techniques use
the temporal distribution of matching articles for the query to compute the
probability that a day in the archive has a relevant document for the query.
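One simple way to turn the temporal distribution of matching articles into a per-day probability is a normalized histogram of publication dates. The sketch below assumes day-level granularity and optional additive smoothing; the function name and the smoothing parameter are illustrative assumptions, and this is only one of several possible estimators.

```python
from collections import Counter
from datetime import date, timedelta


def estimate_day_relevance(matching_dates, archive_start, archive_end, smoothing=0.0):
    """Estimate p(t|q) for every day t in the archive.

    matching_dates: publication dates of the articles matching the query.
    smoothing: pseudo-count added to every day (0.0 gives the raw histogram).
    """
    counts = Counter(d for d in matching_dates if archive_start <= d <= archive_end)
    n_days = (archive_end - archive_start).days + 1
    total = sum(counts.values()) + smoothing * n_days
    if total == 0:
        raise ValueError("no matching articles and no smoothing")
    return {
        archive_start + timedelta(days=i):
            (counts[archive_start + timedelta(days=i)] + smoothing) / total
        for i in range(n_days)
    }
```

Days with many matching articles receive a high probability, so a query such as [Madrid bombing] would be expected to peak around the dates when the event was covered.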
Overall ranking document identification:
We integrate temporal relevance with state-of-the-art retrieval models,
including a query likelihood model, a relevance model, a probabilistic
relevance model, and a query expansion with pseudo relevance feedback model,
to naturally process time-sensitive queries. In these models, we combine
topical relevance and temporal relevance to determine the overall relevance
of a document.
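The combination of topical and temporal relevance might look like the sketch below. It assumes the two scores are probabilities that are multiplied (added in log space) and that the temporal score of a document is the relevance of its publication day; the function names, the input layout, and the floor value are hypothetical, and the exact combination differs for each retrieval model.

```python
import math


def combined_score(topic_score, temporal_score, floor=1e-9):
    """Log of (topical relevance * temporal relevance), floored to avoid log(0)."""
    return math.log(max(topic_score, floor)) + math.log(max(temporal_score, floor))


def rank(docs, day_relevance):
    """Rank documents by combined relevance.

    docs: list of (doc_id, topic_score, pub_day) tuples.
    day_relevance: mapping from day to its temporal relevance for the query.
    """
    scored = [
        (combined_score(topic, day_relevance.get(day, 0.0)), doc_id)
        for doc_id, topic, day in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

For example, a document with a slightly lower topic score but published on a highly relevant day can outrank a document with a higher topic score from an unrelated period.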
Algorithm:
Algorithm 1: General time-based approach for estimating the value p(q|t) for
each time t and a query q.