IRM: Integrated File Replication and Consistency Maintenance in P2P Systems
Abstract:
In peer-to-peer file sharing systems, file replication and consistency maintenance
are widely used techniques for high system performance. Despite significant
interdependencies between them, these two issues are typically addressed
separately. Most file replication methods rigidly specify replica nodes,
leading to low replica utilization, unnecessary replicas and hence extra consistency
maintenance overhead. Most consistency maintenance methods propagate update
messages based on message spreading or a structure without considering file
replication dynamism, leading to inefficient file updates and hence a high
probability of outdated file responses. This paper presents an Integrated file
Replication and consistency Maintenance mechanism (IRM) that integrates the two
techniques in a systematic and harmonized manner. It achieves high efficiency
in file replication and consistency maintenance at a significantly low cost.
Instead of passively accepting replicas and updates, each node determines file
replication and update polling by dynamically adapting to time-varying file
query and update rates, which avoids unnecessary file replications and updates.
Simulation results demonstrate the effectiveness of IRM in comparison with
other approaches. It dramatically reduces overhead and yields significant
improvements in the efficiency of both file replication and consistency
maintenance.
Existing System:
Over the past years, the immense popularity of the Internet has produced a significant
stimulus to peer-to-peer (P2P) file sharing systems. A recent large-scale
characterization of HTTP traffic has shown that more than 75 percent of Internet
traffic is generated by P2P applications. The percentage of P2P traffic has
increased significantly as video and audio files have become almost pervasive.
The study also shows that access to these files is highly repetitive and
skewed towards the most popular ones. Such objects can exhaust the capacity of a node,
leading to delayed responses. File replication is an effective method to deal
with overload caused by flash crowds or hot files. It
distributes load over replica nodes and improves file query efficiency. File
consistency maintenance, which keeps a file and its replicas consistent,
is indispensable to file replication. Requiring that the replica nodes
be reliably informed of all updates could be prohibitively costly in a large
system. Thus, file replication should proactively reduce unnecessary replicas
to minimize the overhead of consistency maintenance, which in turn should
guarantee the fidelity of consistency among file replicas under file
replication dynamism. File replication dynamism refers to frequent
replica node generation, deletion, and failure. Fig. 1 demonstrates
the interrelationship between file replication and consistency maintenance.
Disadvantages:-
1) Traditional file replication and consistency maintenance methods are either
not sufficiently effective or incur prohibitively high overhead.
2) IRM, which relies on polling file owners, still cannot guarantee that all
file requesters receive up-to-date files, although it performs better than
other consistency maintenance algorithms.
Proposed System:
This paper presents an Integrated file Replication and consistency Maintenance
mechanism (IRM) that achieves high efficiency in file replication and
consistency maintenance at a significantly lower cost. IRM integrates file replication
and consistency maintenance in a harmonized and coordinated manner. Basically,
each node actively decides whether to create or delete a replica and when to poll for updates
based on file query and update rates, in a totally decentralized and autonomous
manner. It replicates highly queried files and polls at a high frequency for
frequently updated and queried files. IRM avoids unnecessary file replications
and updates by dynamically adapting to time-varying file query and update
rates. It improves replica utilization, file query efficiency, and consistency
fidelity. A significant feature of IRM is that it achieves an optimized
trade-off between overhead on the one hand and query efficiency and consistency
guarantees on the other. IRM is well suited to P2P systems for a number of reasons. First, IRM
does not require a file owner to keep track of replica nodes. Therefore, it is
resilient to node joins and leaves, and thus suitable for highly dynamic P2P
systems. Second, since each node determines its need for a file replication or
replica update autonomously, the decisions can be made based on its actual
query rate, eliminating unnecessary replications and validations. This
coincides in spirit with the node autonomy of P2P systems. Third, IRM
enhances the guarantee of file consistency. It offers the flexibility to use
different replica update rates to cater to different consistency requirements
determined by the nature of files and user needs. A faster update rate leads to
a stronger consistency guarantee, and vice versa. Fourth, IRM ensures a high
probability of up-to-date file responses.
Advantages:-
1) IRM integrates file replication and consistency maintenance in a systematic
and harmonized manner.
2) It dramatically reduces overhead and yields significant improvements in the
efficiency of both file replication and consistency maintenance.
Software Requirements:
The major software requirements of the project are as follows.
Language : Java (JDK 1.6)
Operating System : Microsoft Windows XP Service Pack 2
IDE : Eclipse IDE 8.0
Front End : JSP & Servlets
Services : SOA architecture
Hardware Requirements:
Processor : Intel Pentium 4
RAM : 256 MB
Hard Disk : 40 GB
Module Description:
1. Adaptive file replication
   i. Replica node determination
   ii. Replica creation
2. File consistency maintenance
   i. Polling frequency reduction
   ii. Poll reduction
3. Impact of file replication on consistency maintenance
Adaptive File Replication:
The replication algorithm achieves an optimized trade-off between query efficiency
and overhead in file replication. In addition, it dynamically tunes to
time-varying file popularity and node interest, and adaptively determines
replica nodes based on query traffic. In the following, we introduce IRM’s file
replication component by addressing two main problems in file replication:
1) Where to replicate files so that the file query can be significantly expedited
and the replicas can be fully utilized?
2) How to remove underutilized file replicas so that the overhead for consistency
maintenance is minimized?
Replica Node Determination:
Frequent requesters of a file and traffic junction nodes (i.e., hot routing spots) in
query paths are the ideal file replica nodes for high utilization of file
replicas. Based on this, IRM replicates a file in nodes that request the file
frequently or in routing nodes that carry a large share of the query
traffic for the file. The former arrangement enables frequent requesters of a
file to get the file without query routing, and the latter increases the
probability that queries from different directions encounter the replica nodes,
thus making full use of file replicas. In addition, replicating a file in the
middle rather than at the ends of a query path speeds up file query.
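The two kinds of replica candidates described above can be illustrated with a small bookkeeping sketch. It is only an illustration under stated assumptions: the class name QueryRateTracker, its methods, the fixed observation period, and the two thresholds are hypothetical and are not taken from the IRM paper.

```python
from collections import defaultdict


class QueryRateTracker:
    """Per-node counters for one observation period (hypothetical design).

    A node counts, per file, the queries it initiated itself and the
    queries it forwarded on behalf of other nodes.
    """

    def __init__(self, period_seconds):
        self.period_seconds = period_seconds
        self.initiated = defaultdict(int)
        self.forwarded = defaultdict(int)

    def record_initiated(self, file_id):
        self.initiated[file_id] += 1

    def record_forwarded(self, file_id):
        self.forwarded[file_id] += 1

    def initiating_rate(self, file_id):
        # Queries per second that this node issued for the file.
        return self.initiated[file_id] / self.period_seconds

    def passing_rate(self, file_id):
        # Queries per second that this node routed for the file.
        return self.forwarded[file_id] / self.period_seconds

    def is_replica_candidate(self, file_id, initiating_threshold, passing_threshold):
        # A node qualifies as a frequent requester or as a hot routing spot.
        return (self.initiating_rate(file_id) > initiating_threshold
                or self.passing_rate(file_id) > passing_threshold)
```

Each node runs this bookkeeping locally, so no central party needs to know who is querying what.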
Replica Creation:
The threshold T_l is the product of a constant factor and the normal query passing rate in the system.
In IRM, when a routing node receives a query for file f, it checks l_f, its query passing rate for f. If
l_f > T_l and the node has available capacity for a file replica, it
adds a file replication request, together with its IP address, to the original file request.
After the file destination receives the query, if it is overloaded, it
checks whether the file query carries additional file replication requests. If so, it
sends the file to the replication requesters in addition to the query
initiator. Otherwise, it replicates file f to the neighbors that forward
queries for file f most frequently.
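The request flow above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the Query class and the functions on_forward and on_arrival are hypothetical names, and the network is reduced to plain function calls.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    file_id: str
    initiator: str
    # IP addresses of routing nodes that asked for a replica en route.
    replication_requesters: list = field(default_factory=list)


def on_forward(query, node_address, passing_rate, threshold, has_capacity):
    """Routing-node step: piggyback a replication request on the query
    when l_f > T_l and the node has room for a replica."""
    if passing_rate > threshold and has_capacity:
        query.replication_requesters.append(node_address)
    return query


def on_arrival(query, overloaded, busiest_forwarders):
    """File-destination step: return the addresses that receive the file."""
    recipients = [query.initiator]
    if overloaded:
        if query.replication_requesters:
            # Serve the nodes that explicitly asked for a replica.
            recipients.extend(query.replication_requesters)
        else:
            # Fall back to the neighbors that forward this file's queries most.
            recipients.extend(busiest_forwarders)
    return recipients
```

For example, a query from 10.0.0.1 that passes a hot routing node 10.0.0.2 and reaches an overloaded owner would be answered with copies to both addresses.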
File Consistency Maintenance:
File replication dynamism poses a challenge for timely updates in structure-based
consistency maintenance methods. On the other hand, consistency maintenance
relying on message spreading generates high overhead due to a large number of
redundant messages. Rather than relying on a structure or message spreading,
IRM employs adaptive polling for file consistency maintenance to cater to file
replication dynamism. A poll approach puts the burden of consistency
maintenance on individual nodes. Unlike push, a poll approach can achieve good
consistency for distant nodes and is less sensitive to P2P dynamism, network
size, and the connectivity of a node.
Poll Reduction:
In addition to the file change rate, the file query rate is also a main factor to consider in
consistency maintenance. Even when a file changes frequently, if a replica node
receives no or very few queries for the file during a
time period, polling the file’s owner for validation
during that period is wasted overhead. However, most current consistency maintenance methods
neglect the important role that the file query rate plays in reducing overhead.
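One way to picture this idea is a replica that validates its copy only when a query actually arrives and the copy is older than the polling interval. The sketch below is an assumption made for illustration: the class LazyPollingReplica, the fetch_version callback, and the explicit now argument are hypothetical and do not reproduce the exact rule used by IRM.

```python
class LazyPollingReplica:
    """Replica that polls the owner on demand instead of on a fixed timer."""

    def __init__(self, poll_interval, fetch_version, now=0.0):
        self.poll_interval = poll_interval
        self.fetch_version = fetch_version  # callable standing in for a poll to the owner
        self.version = fetch_version()      # initial copy obtained at replication time
        self.last_validated = now
        self.polls = 0                      # validation polls after the initial copy

    def serve(self, now):
        """Answer a query at time `now`, polling first if the copy is too old."""
        if now - self.last_validated >= self.poll_interval:
            self.version = self.fetch_version()
            self.last_validated = now
            self.polls += 1
        return self.version
```

With a 10-second interval and no queries for 100 seconds, a fixed timer would poll about ten times, while this replica polls once, when the next query arrives.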
Polling Frequency Reduction:
A file replica node can ensure that a replica is never outdated by
more than Δt seconds by polling the owner every Δt seconds. Since the rate of
file change varies over time as hot files become cold and vice versa, a replica
node should be able to adapt its polling frequency in response to these
variations. In IRM, a replica node tailors its polling frequency
so that it polls at approximately the same frequency as the file changes.
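The text does not spell out how the polling frequency is adapted. A common choice for this kind of problem is a linear-increase, multiplicative-decrease rule on the polling interval, sketched below as an assumption: the function name next_poll_interval and the parameters step, factor, min_interval, and max_interval are hypothetical.

```python
def next_poll_interval(current, changed, step=1.0, factor=2.0,
                       min_interval=1.0, max_interval=3600.0):
    """Return the next polling interval in seconds.

    If the last poll found the file changed, poll sooner (divide the
    interval by `factor`); if it was unchanged, back off by `step` seconds.
    The result is clamped to [min_interval, max_interval].
    """
    if changed:
        candidate = current / factor
    else:
        candidate = current + step
    return max(min_interval, min(max_interval, candidate))
```

Over repeated polls the interval drifts toward the file's actual change period: it shrinks quickly while the file is hot and grows slowly once the file goes cold.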
Impact of File Replication on Consistency Maintenance:
IRM minimizes the number of replicas while maintaining high efficiency and
effectiveness of file replication. First, instead of requiring the file server to
keep track of the query rate of nodes in a centralized manner, IRM enables each
node to autonomously keep track of its own load status. Thus, the file server
is not easily overloaded, which leads to fewer replicas. Second, IRM replicates
files in nodes with a high query passing rate or query initiating rate. This
ensures that a request has a high probability of encountering a replica node and
that every replica is highly utilized.