A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme
Abstract:
E-mail communication is indispensable nowadays, but the e-mail spam problem
continues to grow drastically. In recent years, the notion of collaborative spam
filtering with a near-duplicate similarity matching scheme has been widely
discussed. The primary idea of the similarity matching scheme for spam detection
is to maintain a known spam database, formed by user feedback, to block
subsequent near-duplicate spams. To achieve efficient similarity matching and
reduce storage utilization, prior works mainly represent each e-mail by a
succinct abstraction derived from the e-mail content text. However, these
abstractions cannot fully capture the evolving nature of spams and are thus not
effective enough in near-duplicate detection. In this paper, we propose a novel
e-mail abstraction scheme that considers e-mail layout structure to represent
e-mails. We present a procedure to generate the e-mail abstraction from the HTML
content of an e-mail; this newly devised abstraction captures the
near-duplicate phenomenon of spams more effectively. Moreover, we design a
complete spam detection system, Cosdes (Collaborative Spam Detection System),
which has an efficient near-duplicate matching scheme and a progressive update
scheme. The progressive update scheme enables Cosdes to keep the most
up-to-date information for near-duplicate detection. We evaluate Cosdes on a
live data set collected from a real e-mail server and show that our system
outperforms prior approaches in detection results and is applicable to the real
world.
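The collaborative idea described in the abstract (a known spam database built from user feedback, near-duplicate matching against it, and progressive updates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Cosdes matching or update scheme: the class name `KnownSpamDB`, the use of `difflib.SequenceMatcher.ratio()` as the similarity measure, the 0.8 threshold, and the age-based expiry rule are all hypothetical choices.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class KnownSpamDB:
    """Hypothetical known-spam database fed by user feedback.

    Abstractions are tuples of tokens (here, HTML layout tokens). A new
    e-mail is flagged when its difflib similarity ratio to any stored
    abstraction reaches `threshold`. Entries that are neither re-reported
    nor matched within `max_age` time units are dropped, a simple stand-in
    for a progressive update scheme (assumption, not the Cosdes design).
    """

    threshold: float = 0.8
    max_age: int = 30
    entries: dict[tuple[str, ...], int] = field(default_factory=dict)

    def report_spam(self, abstraction: tuple[str, ...], now: int) -> None:
        # User feedback: store the abstraction or refresh its timestamp.
        self.entries[abstraction] = now

    def expire(self, now: int) -> None:
        # Progressive update: forget spam patterns that have gone stale.
        self.entries = {
            a: t for a, t in self.entries.items() if now - t <= self.max_age
        }

    def is_spam(self, abstraction: tuple[str, ...], now: int) -> bool:
        self.expire(now)
        for known in self.entries:
            ratio = difflib.SequenceMatcher(None, known, abstraction).ratio()
            if ratio >= self.threshold:
                self.entries[known] = now  # a hit keeps the pattern fresh
                return True
        return False


# A reported spam and a variant with one extra <img> tag in its layout.
reported = ("<html>", "<body>", "<p>", "TEXT", "</p>",
            "<a>", "TEXT", "</a>", "</body>", "</html>")
variant = ("<html>", "<body>", "<p>", "TEXT", "</p>", "<img>",
           "<a>", "TEXT", "</a>", "</body>", "</html>")

db = KnownSpamDB()
db.report_spam(reported, now=0)
print(db.is_spam(variant, now=5))    # True: ratio 20/21 is above 0.8
print(db.is_spam(variant, now=100))  # False: the pattern expired after max_age
```

Refreshing an entry on every hit means an active spam campaign stays in the database while abandoned patterns age out, which keeps both storage and matching cost bounded in this sketch.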
Existing System:
- Prior works mainly represent each e-mail by a succinct abstraction derived
  from the e-mail content text.
- These abstractions cannot fully capture the evolving nature of spams and are
  thus not effective enough in near-duplicate detection.
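For contrast, a content-text abstraction of the kind attributed to prior works can be approximated by hashing word shingles and comparing the resulting sets. The sketch below is a hypothetical stand-in (the function names, the shingle size `k`, and the choice of SHA-1 digests with Jaccard similarity are assumptions), meant only to show why rewording a spam defeats a purely textual abstraction.

```python
import hashlib
import re


def text_abstraction(body: str, k: int = 3) -> set[str]:
    """Hypothetical content-text abstraction: SHA-1 digests of word k-shingles."""
    words = re.findall(r"[a-z0-9]+", body.lower())
    # Messages shorter than k words become a single shingle.
    count = max(len(words) - k + 1, 1)
    shingles = {" ".join(words[i:i + k]) for i in range(count)}
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two abstractions (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


spam_v1 = "Cheap watches for sale today, click here"
spam_v2 = "Discount watches on sale now, click here"

print(jaccard(text_abstraction(spam_v1), text_abstraction(spam_v1)))  # 1.0
print(jaccard(text_abstraction(spam_v1), text_abstraction(spam_v2)))  # 0.0
```

Although both messages plausibly belong to the same campaign, the rewording leaves no 3-word shingle in common, so a text-only abstraction sees them as unrelated.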
Proposed System:
- We propose a novel, more sophisticated and robust e-mail abstraction scheme
  that considers e-mail layout structure to represent e-mails.
- A specific procedure, SAG, is proposed to generate the e-mail abstraction from
  the HTML content of an e-mail.
- This newly devised abstraction captures the near-duplicate phenomenon of spams
  more effectively.
- The progressive update scheme enables Cosdes to keep the most up-to-date
  information for near-duplicate detection.
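The idea of abstracting an e-mail by its HTML layout rather than its wording can be illustrated with Python's standard `html.parser` module. This is not the SAG procedure itself, whose details are not given here; it is a simplified, hypothetical stand-in (the `LayoutAbstraction` class, the `TEXT` placeholder token, and the decision to ignore tag attributes are assumptions) that records only the tag sequence and collapses visible text.

```python
from html.parser import HTMLParser


class LayoutAbstraction(HTMLParser):
    """Hypothetical layout abstraction: the sequence of HTML tags, with each
    run of visible text replaced by a single TEXT placeholder."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # Attributes (links, image URLs) are ignored: they change per campaign.
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag: str) -> None:
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data: str) -> None:
        # Collapse non-blank text into one placeholder so rewording the
        # message does not change its abstraction.
        if data.strip() and (not self.tokens or self.tokens[-1] != "TEXT"):
            self.tokens.append("TEXT")


def abstract_layout(html: str) -> str:
    parser = LayoutAbstraction()
    parser.feed(html)
    parser.close()
    return " ".join(parser.tokens)


spam_v1 = "<html><body><p>Cheap watches for sale</p><a href='x'>Click here</a></body></html>"
spam_v2 = "<html><body><p>Discount watches on sale</p><a href='y'>Buy now</a></body></html>"

print(abstract_layout(spam_v1))
# <html> <body> <p> TEXT </p> <a> TEXT </a> </body> </html>
print(abstract_layout(spam_v1) == abstract_layout(spam_v2))  # True
```

Because the placeholder hides the wording, the two rewordings map to one abstraction. A practical scheme would still need to tolerate small layout changes, for example by comparing abstractions with a similarity threshold instead of exact equality.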
KEYWORDS:
Generic Technology Keywords: Database, User Interface, Programming
Specific Technology Keywords: C#.Net, ASP.Net, MS SQL Server 2008
Project Keywords: Presentation, Business Object, Data Access Layer, Database
SDLC Keywords: Analysis, Design, Code, Testing, Implementation, Maintenance
SYSTEM CONFIGURATION
HARDWARE CONFIGURATION
S.NO | HARDWARE               | CONFIGURATION
1    | Operating System       | Windows 2000 & XP
2    | RAM                    | 1 GB
3    | Processor (with Speed) | Intel Pentium IV (3.0 GHz) and above
4    | Hard Disk Size         | 40 GB and above
5    | Monitor                | 15-inch CRT
SOFTWARE CONFIGURATION
S.NO | SOFTWARE  | CONFIGURATION
1    | Platform  | Microsoft Visual Studio
2    | Framework | .NET Framework 4.0
3    | Language  | C#.Net
4    | Front End | ASP.NET, HTML
5    | Back End  | SQL Server 2008
 