Intrusion Detection System Based on Data Mining Techniques

Network security is one of the most important non-functional requirements of a system. Over the years, many software solutions have been developed to enhance network security. In this paper we provide an overview of the different types of Intrusion Detection Systems (IDS) and their advantages and disadvantages. The need for an IDS in a system environment and the generic building blocks of an IDS are also discussed. The examples are as follows: (1) a misuse intrusion detection system that uses a state transition analysis approach, (2) an anomaly-based system that uses payload modeling and (3) a hybrid model that combines the best practices of misuse and anomaly-based intrusion detection systems.


INTRODUCTION
An intrusion detection system (IDS) is a software tool that monitors network and/or system activities for malicious activities or policy violations and produces reports to a management station. Some systems may attempt to stop an intrusion attempt, but this is neither required nor expected of a monitoring system. Intrusion detection is primarily focused on identifying possible incidents, logging information about them, and reporting attempts. In addition, organizations use IDSs for other purposes, such as identifying problems with security policies, documenting existing threats, and deterring individuals from violating security policies. IDSs have become a necessary addition to the security infrastructure of nearly every organization.

Classification of IDS
Intrusion detection systems can be broadly classified on two parameters. The first is the analysis method used to identify intrusions, which gives Misuse IDS and Anomaly IDS. The second is the source of data, which gives Host based IDS and Network based IDS.

Misuse IDS
Misuse based IDS is a very prominent approach and is widely used in industry. Most organizations that develop anti-virus solutions base their design methodology on Misuse IDS. The system is constructed from the signatures of all known attacks. Rules and signatures define abnormal and unsafe behavior. It analyzes the traffic flowing over a network and matches it against known signatures. Once a known signature is encountered, the IDS triggers an alarm. As technology advances, the number of signatures also increases. This demands constant updates of attack signatures from vendors, and paying vendors more for their support.
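The signature-matching loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular product: the `Signature` class, the two byte patterns and the `inspect` helper are hypothetical names chosen for this example, and real rule sets are far larger and match on much more than raw substrings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """A hypothetical attack signature: a name plus a byte pattern."""
    name: str
    pattern: bytes


# Illustrative signatures only (assumed for this sketch).
SIGNATURES = [
    Signature("path-traversal", b"../../"),
    Signature("sql-injection", b"' OR '1'='1"),
]


def match_signatures(payload, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [s.name for s in signatures if s.pattern in payload]


def inspect(payload):
    """Report an alarm when any known signature matches, otherwise 'ok'."""
    hits = match_signatures(payload)
    return "ALARM: " + ", ".join(hits) if hits else "ok"
```

A payload that contains none of the stored patterns passes silently, which is why a misuse IDS cannot flag an attack whose signature has not yet been added.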

Anomaly IDS
An anomaly IDS is built by studying the behavior of the system over a period of time in order to construct activity profiles that represent normal use of the system. The anomaly IDS computes the similarity of the traffic in the system to these profiles in order to detect intrusions. The biggest advantage of this model is that new attacks can be identified by the system, because they appear as deviations from normal behavior.
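As a rough sketch of profile-based detection, the Python below learns a per-feature mean and standard deviation from normal traffic and flags a sample whose largest z-score exceeds a threshold. The z-score is only one simple choice of similarity measure, assumed here for illustration; the feature vectors, the function names and the threshold of 3.0 are likewise assumptions, not values taken from the study.

```python
import statistics


def build_profile(normal_samples):
    """Learn (mean, standard deviation) per feature from normal traffic.

    normal_samples is a list of equal-length numeric feature vectors.
    """
    columns = list(zip(*normal_samples))
    return [(statistics.mean(c), statistics.pstdev(c)) for c in columns]


def anomaly_score(profile, sample):
    """Largest absolute z-score of the sample across all features."""
    score = 0.0
    for (mean, std), value in zip(profile, sample):
        if std > 0:
            z = abs(value - mean) / std
        else:
            # Constant feature: any change at all is maximally unusual.
            z = 0.0 if value == mean else float("inf")
        score = max(score, z)
    return score


def is_intrusion(profile, sample, threshold=3.0):
    """Flag the sample when it deviates too far from the normal profile."""
    return anomaly_score(profile, sample) > threshold
```

The threshold trades detection rate against false alarms: lowering it catches more deviations but also flags more legitimate traffic.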

Host Based IDS
When the source of data for an IDS is a single host (system), it is classified as a Host based IDS. Such systems are generally used to monitor user activity and are useful for tracking intrusions that occur when an authorized user tries to access confidential information.

Network Based IDS
The data for this type of IDS is obtained by listening to all nodes in a network. Attacks from illegitimate users can be identified using a Network based IDS. Commercial IDSs are typically a combination of the two types mentioned above.

Application
Applications of intrusion detection by data mining are as follows:
• The goal of intrusion detection is to detect security violations in information systems. Intrusion detection is a passive approach to security, as it monitors information systems and raises alarms when security violations are detected.
• The risk assessment and fraud area also uses data mining to identify inappropriate or unusual behavior.
• Customer retention, in the form of identifying patterns of defection and predicting likely defections, is possible through data mining.

Background and Related Works
R. Heady et al. (1990) studied computer attacks to gain an understanding of them in order to identify intrusions and security threats.

METHODS AND MATERIALS
Simulation Tools

Hardware Requirements
The minimum hardware required to run the complete application is described here. Hardware plays a vital role in the application: it provides the interfaces the application needs, and increasing the configuration of the end system on which the application is loaded lets the application execute faster and respond to the user more quickly.

Software Requirements
The software applies standard software engineering concepts. Software is an information transformer: it produces, manages, acquires, modifies, displays or transmits information that can be as simple as a single bit or as complex as a multimedia presentation.

Java
The test was carried out using a Java program. Java is a set of computer software and specifications developed by Sun Microsystems, which was later acquired by Oracle Corporation, that provides a system for developing application software and deploying it in a cross-platform computing environment. Java is used on a wide variety of computing platforms, from embedded devices and mobile phones to enterprise servers and supercomputers. While they are less common than standalone Java applications, Java applets run in secure, sandboxed environments to provide many features of native applications.

NetBeans IDE
NetBeans IDE lets you quickly and easily develop Java desktop, mobile, and web applications, as well as HTML5 applications with HTML, JavaScript, and CSS. The IDE also provides a set of tools for PHP and C/C++ developers. It is free and open source and has a large community.

Wireshark (A network analyzer tool)
Wireshark is a network packet analyzer. A network packet analyzer captures network packets and tries to display the packet data in as much detail as possible. You could think of a network packet analyzer as a measuring device used to examine what is going on inside a network cable, just as a voltmeter is used by an electrician to examine what is going on inside an electric cable (but at a higher level, of course). In the past, such tools were either very expensive, proprietary, or both. However, with the advent of Wireshark, all that has changed.

Input Data
The first step is data input, which is done by capturing packets with the Wireshark tool: clicking the capture button on a live network starts the capture. Wireshark is perhaps one of the best open source packet analyzers available today, though it acts as a sniffer. The standard packet capture format is .pcap, which is obtained by saving the tcpdump files. The way the packets are stored is illustrated in the figures below so that it can be easily understood.
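To make the storage layout concrete, here is a minimal Python sketch that reads a capture in the classic .pcap layout: a 24-byte global header followed by one 16-byte record header per packet. It assumes the file was saved in classic pcap format with microsecond timestamps (not the newer pcapng format that recent Wireshark versions write by default); the function name `read_pcap` is chosen for this example only.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps


def read_pcap(data):
    """Parse classic pcap bytes.

    Returns (link_type, packets), where each packet is a tuple of
    (timestamp_seconds, original_length, captured_bytes).
    """
    if len(data) < 24:
        raise ValueError("too short to contain a pcap global header")
    # The magic number reveals the byte order the file was written in.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # Global header: magic, version major, version minor, time zone offset,
    # timestamp accuracy, snapshot length, link-layer type.
    fields = struct.unpack(endian + "IHHiIII", data[:24])
    link_type = fields[6]
    packets = []
    offset = 24
    while offset + 16 <= len(data):
        # Record header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        raw = data[offset:offset + incl_len]
        if len(raw) < incl_len:
            break  # truncated final record
        offset += incl_len
        packets.append((ts_sec + ts_usec / 1e6, orig_len, raw))
    return link_type, packets
```

A typical call would be `read_pcap(open("capture.pcap", "rb").read())`, where `capture.pcap` is a placeholder file name. A link type of 1 denotes Ethernet frames.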

Advantages of Wireshark
The Wireshark capture engine provides the following features:
• Capture from different kinds of network hardware such as Ethernet or 802.11.
The captured packets are then saved in .pcap format and converted to .csv (comma-separated values) for further classification. In comma-separated form the packets can be easily viewed in a word-processor document or Notepad file for feature selection.
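The conversion step can be sketched with Python's standard `csv` module. The column names in `FIELDS` and the dictionary layout of each packet summary are assumptions made for this example; the study does not list the exact columns it exported.

```python
import csv
import io

# Hypothetical column set for a packet summary (assumed for this sketch).
FIELDS = ["time", "src", "dst", "protocol", "length"]


def packets_to_csv(packets, out):
    """Write packet summaries (dicts keyed by FIELDS) as CSV rows to a text stream."""
    # Keys that are not listed in FIELDS are silently dropped.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for packet in packets:
        writer.writerow(packet)


def select_columns(csv_text, keep):
    """Simplest form of feature selection: keep only the named columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in keep} for row in reader]
```

Values read back from CSV are strings, so numeric columns such as `length` need to be converted before they are used as classifier features.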

RESULTS
The results of the study entitled "Intrusion Detection through data mining technique" are discussed in this chapter. The findings are illustrated with tables, graphs and pictures where these were considered essential to clarify the results. Initially the attribute-relation file was classified in two ways: first with the J48 pruned tree and then with the naïve Bayes classifier.
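For readers unfamiliar with the second classifier, the following Python is a minimal sketch of a categorical naïve Bayes classifier with add-one (Laplace) smoothing. It is not the implementation used in the study; the class name, the smoothing choice and the toy feature values in the usage note are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(label, feature_index)][value] -> number of occurrences
        self.value_counts = defaultdict(Counter)
        # Distinct values seen for each feature, used by the smoothing term.
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(label, i)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over features of log P(value | class);
            # the plain sum is where feature independence is assumed.
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.value_counts[(label, i)][value]
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For example, after `NaiveBayes().fit(rows, labels)` on rows such as `("tcp", "high")` labelled `"attack"` or `"normal"`, calling `predict` on a new row returns the label with the highest posterior score.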
The graph shows the probability of intrusion occurring in each class. The lower the class value, the lower the intrusion probability; the higher the value, the higher the probability of intrusion. The mean intrusion obtained in this way differs from that of earlier research that used different kernel functions.
In this analysis, naïve Bayes obtained better results than the RBF kernel and polynomial kernel functions previously used for intrusion detection. Thus the machine learning results obtained from naïve Bayes with cross-validation give better accuracy. The results show that pre-processing the data and selecting relevant features using information gain, to reduce the number of features in the dataset, are very important for increasing classification accuracy.
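Information gain measures how much knowing a feature reduces the entropy of the class label. The Python below is a small sketch of that calculation for categorical features; the function names are chosen for this example, and the study's own feature-selection tooling may differ in detail.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on one categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return feature indices sorted by information gain, highest first."""
    gains = [information_gain([row[i] for row in rows], labels)
             for i in range(len(rows[0]))]
    return sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
```

Keeping only the top-ranked features shrinks the dataset before classification, which is the reduction step the results above credit for the improved accuracy.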

CONCLUSION
The proposed method was successfully tested on the data log files and the database. It produces more accurate and relevant sets of patterns, and its discovery time is less than that of other approaches. However, a naïve Bayesian network is a restricted network that has only two layers and assumes complete independence between the information nodes, which poses a limitation to this research work. To alleviate this problem and reduce false positives, active platform or event based classification using a Bayesian network may be considered.