Network Intrusion Detection Using Correlation Functional Dependency

Nitin D. Shelokar; S.A. Ladhake

Nitin D. Shelokar¹ and S.A. Ladhake²

¹Sipna College of Engineering, Amravati, (India).

²Department of Computer Science & Technology, Bengal Engineering Science University, Shibpur, Howrah - 711 103 (India).

ABSTRACT:

In this paper we study the concept of real-time intrusion detection. It gives a brief idea of how to keep our data in a secure system, i.e., one free from hackers. We describe the types of intrusion detection systems and the concept of a real-time system that detects any intruder entering the system. An intrusion detection system (IDS) is used to detect several types of malicious behavior that can compromise the security and trust of a computer system. We also briefly discuss real-time intrusion detection and the efficiency of intrusion detection systems. The data processed by an IDS may be a sequence of commands executed by a user, a sequence of system calls launched by an application (for example, a web client), network packets, and so on. Finally, the IDS can trigger countermeasures to eliminate the cause or effect of an attack whenever an intrusion is detected.

KEYWORDS: Intrusion detection system

Cite this article as:

Shelokar N. D., Ladhake S. A. Network Intrusion Detection Using Correlation Functional Dependency. Orient. J. Comp. Sci. and Technol; 3(1). Available from: http://www.computerscijournal.org/?p=2248

Introduction

Anderson, in 1980 [1], defined an intrusion attempt, or threat, as the potential possibility of a deliberate unauthorized attempt to access information, manipulate information, or render a system unreliable or unusable. Since then, several techniques for detecting intrusions have been studied. Despite encryption, attacks still occur and the first line of defense can still be penetrated. Intrusion Detection Systems (IDS) act as the “second line of defense”: they are placed inside a protected network, looking for known and potential threats in network traffic and/or audit data recorded by hosts¹. In general, there are two approaches to detecting an intrusion in computer systems and computer networks: misuse detection and anomaly detection. In misuse detection, an intrusion is detected when the behavior of the system matches any of the intrusion signatures. An anomaly-based IDS, by contrast, detects an intrusion when the behavior of the system deviates from normal behavior beyond a certain tolerance.²

An IDS can be treated as a pattern recognition problem, or classified as a learning system. Selecting the attributes relevant to the problem domain, so as to obtain an appropriate representation space for learning, is therefore an important issue for learning systems. According to Bello et al.,³ feature selection is useful to reduce the dimensionality of the training set; it also speeds up data manipulation and improves the classification rate by reducing the influence of noise. Selecting important features is thus an important issue in intrusion detection.⁴ This paper describes initial work on finding an optimal feature subset using Rough Set Theory.

Material and Methods

For this experiment, we used the KDD Cup 1999 data set. Specifically, we used the subset that was preprocessed by Columbia University and distributed as part of the UCI KDD Archive (http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html). The original data contain 744 MB with 4,940,000 records. In our experiment, we used only 8000 records, of which 70% (5600 records) were used for training and the remaining 30% (2400 records) for testing. The dataset contained normal traffic and all categories of attacks. Our experimental method consists of the following steps:

Intrusion Data Set

In this experiment the KDD Cup 1999 data set [5,6] is used. This is the data set used for the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between “bad” connections, called intrusions or attacks, and “good” normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
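The 70%/30% partition of the 8000 sampled records can be sketched in Python as below. The paper states only the proportions; shuffling the records with a fixed random seed before cutting is our assumption, and `split_records` is a hypothetical helper name.

```python
import random


def split_records(records, train_fraction=0.7, seed=42):
    """Return (train, test) after shuffling a copy of `records`.

    Assumption: a seeded random shuffle; the paper gives only the 70/30 ratio.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 8000 sampled KDD Cup 1999 connection records.
records = list(range(8000))
train, test = split_records(records)
print(len(train), len(test))  # 5600 2400
```

Fixing the seed keeps the split reproducible, so the same training and test records are used in every run.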

Using correlation

Based on the literature, we started our experiment by collecting the dataset from KDD Cup 1999 and computing the correlation between its attributes. Using correlation, we obtained 10 different training datasets and 10 different test datasets, in which each attribute value gives the correlation of that attribute with the other attributes.

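The correlation between two attributes A and B is given by⁷:

$$r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n\,\sigma_A\,\sigma_B}$$

where $a_i$ and $b_i$ are the values of A and B in the i-th record, $\bar{A}$ and $\bar{B}$ are the means of A and B respectively, n is the total number of records, and $\sigma_A$ and $\sigma_B$ are the standard deviations of A and B respectively.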

Applying this formula to every pair of attributes gives the correlation values between attributes shown in Figure 1.
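The correlation step can be sketched in Python as follows, assuming the Pearson product-moment form of the coefficient: the sum of products of deviations from the means, divided by n times the two population standard deviations, with n the number of records. The function names and the convention of returning 0.0 for a constant attribute are our assumptions, not part of the paper.

```python
import math


def pearson(a, b):
    """r_AB = sum((a_i - mean_A)(b_i - mean_B)) / (n * sigma_A * sigma_B),
    using population standard deviations (divide by n)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov_sum = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sigma_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sigma_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    if sigma_a == 0 or sigma_b == 0:
        return 0.0  # assumption: undefined correlation of a constant attribute -> 0.0
    return cov_sum / (n * sigma_a * sigma_b)


def correlation_matrix(columns):
    """Pairwise correlations of equal-length numeric attribute columns."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


# Toy columns (not KDD data): the second rises with the first, the third falls.
m = correlation_matrix([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
print([[round(v, 3) for v in row] for row in m])
# [[1.0, 1.0, -1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
```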

Using functional dependency

Next, we set a threshold value θ = 0.4 and use it to run the experiment on the first training dataset. This yields the following functional dependencies⁸:

a₁ → a₂₂

a₂ → a₈, a₉

a₆ → a₁₆, a₂₅

a₈ → a₂, a₉, a₁₀

a₉ → a₂, a₈, a₁₀

a₁₀ → a₈, a₉

a₁₅ → a₁₉, a₂₀

a₁₆ → a₆, a₂₂, a₂₆

a₁₇ → a₂₆, a₂₇

a₁₈ → a₁₉, a₂₀

a₁₉ → a₁₅, a₁₈, a₂₀

a₂₀ → a₁₅, a₁₈, a₁₉

a₂₁ → a₁₆, a₂₂

a₂₂ → a₁, a₂₁, a₂₃

a₂₃ → a₂₂, a₂₄

a₂₄ → a₂₃

a₂₅ → a₁₆, a₂₆

a₂₆ → a₁₇, a₂₅, a₂₇

a₂₇ → a₁₇, a₂₆
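The paper does not spell out how a correlation value becomes a functional dependency. One plausible reading, which is purely our assumption, is to record aᵢ → aⱼ whenever |rᵢⱼ| ≥ θ. Because correlation is symmetric, this rule always yields symmetric dependencies, whereas a few of the listed ones are not symmetric (for example a₆ → a₂₅ appears without a₂₅ → a₆). The sketch is therefore illustrative and does not reproduce the list; the function name and the toy matrix are hypothetical.

```python
def dependencies_from_correlation(corr, theta=0.4):
    """Hypothetical rule: a_i -> a_j whenever i != j and |r_ij| >= theta.

    `corr` is a square correlation matrix; attribute a_k is reported as k (1-based).
    """
    k = len(corr)
    fds = {}
    for i in range(k):
        rhs = [j + 1 for j in range(k) if j != i and abs(corr[i][j]) >= theta]
        if rhs:
            fds[i + 1] = rhs
    return fds


# Toy 3x3 correlation matrix (not data from the paper).
corr = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, -0.45],
    [0.1, -0.45, 1.0],
]
print(dependencies_from_correlation(corr))  # {1: [2], 2: [1, 3], 3: [2]}
```

Taking the absolute value treats strong negative correlation (here -0.45) as a dependency too; whether the authors did so is not stated.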

Closure property

After obtaining the functional dependencies, we can find the closure of each attribute. These closures are listed below⁸:

a₁+ = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₂ + = { a₂, a₈, a₉, a₁₀ }
a₆ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₈ + = { a₂, a₈, a₉, a₁₀ }
a₉ + = { a₂, a₈, a₉, a₁₀ }
a₁₀ + = { a₂, a₈, a₉, a₁₀ }
a₁₅ + = { a₁₅, a₁₈, a₁₉, a₂₀ }
a₁₆ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₁₇ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₁₈ + = { a₁₅, a₁₈, a₁₉, a₂₀ }
a₁₉ + = { a₁₅, a₁₈, a₁₉, a₂₀ }
a₂₀ + = { a₁₅, a₁₈, a₁₉, a₂₀ }
a₂₁ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₂₂ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₂₃ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₂₄ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₂₅ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₂₆ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
a₂₇ + = { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂ , a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }
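The closures above can be checked with the standard attribute-closure algorithm for functional dependencies: start from one attribute and keep adding the right-hand side of any dependency whose left-hand side is already in the set, until nothing changes. The minimal Python sketch below encodes the 19 dependencies listed earlier, writing attribute aₖ as the integer k (our encoding). For every left-hand attribute it reproduces one of the three closure sets shown.

```python
def attribute_closure(start, fds):
    """Compute start+: every attribute reachable from `start` by repeatedly
    applying single-attribute functional dependencies lhs -> rhs."""
    closure = {start}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds.items():
            if lhs in closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# The 19 dependencies listed above, keyed by attribute index (a1 -> 1).
FDS = {
    1: [22], 2: [8, 9], 6: [16, 25], 8: [2, 9, 10], 9: [2, 8, 10],
    10: [8, 9], 15: [19, 20], 16: [6, 22, 26], 17: [26, 27],
    18: [19, 20], 19: [15, 18, 20], 20: [15, 18, 19], 21: [16, 22],
    22: [1, 21, 23], 23: [22, 24], 24: [23], 25: [16, 26],
    26: [17, 25, 27], 27: [17, 26],
}

print(sorted(attribute_closure(2, FDS)))  # [2, 8, 9, 10]
```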

Reduct set

From the attribute closures above, we obtain 3 different reduct sets [9]:

a) { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂, a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }

b) { a₂, a₈, a₉, a₁₀ }

c) { a₁₅, a₁₈, a₁₉, a₂₀ }
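Our reading of this step, which the text does not state as an algorithm, is that the reduct sets are simply the distinct closures: attributes with identical closures collapse into one set. A minimal Python sketch under that assumption, with the three closure sets above hard-coded (attribute aₖ written as k) and a hypothetical helper `distinct_closures`:

```python
def distinct_closures(closures):
    """Collapse per-attribute closures into distinct sets, ordered by first attribute."""
    distinct = []
    for attr in sorted(closures):
        s = frozenset(closures[attr])
        if s not in distinct:
            distinct.append(s)
    return distinct


BIG = {1, 6, 16, 17, 21, 22, 23, 24, 25, 26, 27}
MID = {2, 8, 9, 10}
SMALL = {15, 18, 19, 20}
# Closures a_k+ as listed above: every member of a set has that set as its closure.
closures = {k: BIG for k in BIG}
closures.update({k: MID for k in MID})
closures.update({k: SMALL for k in SMALL})

reducts = distinct_closures(closures)
print([sorted(r) for r in reducts])
# [[1, 6, 16, 17, 21, 22, 23, 24, 25, 26, 27], [2, 8, 9, 10], [15, 18, 19, 20]]
```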

Cardinality

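Next, the cardinality of each reduct set was checked using the degree of dependency K(REDU, D) between the attributes REDU ⊆ C and the attributes D in the decision table T(C, D), which is given by¹⁰:

$$K(REDU, D) = \frac{|POS_{REDU}(D)|}{|U|}$$

where U is the set of records (objects) in T, and $POS_{REDU}(D)$ is the positive region of D with respect to REDU, i.e., the records that can be classified with certainty into the decision classes of D using only the attributes in REDU.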
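The dependency measure can be illustrated with a minimal Python sketch, assuming the standard rough-set reading of K(REDU, D) as the fraction of records whose equivalence class under REDU has a single decision value. The toy decision table, its attribute names `a` and `b`, and the function name `degree_of_dependency` are hypothetical and are not taken from the experiment.

```python
from collections import defaultdict


def degree_of_dependency(rows, redu, decision):
    """K(REDU, D) = |POS_REDU(D)| / |U| for a decision table given as a list of dicts."""
    # Group the universe U into equivalence classes by their values on REDU.
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in redu)].append(row[decision])
    # A class lies in the positive region if every object in it has the same decision.
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)


# Hypothetical toy decision table; not records from the experiment.
table = [
    {"a": 0, "b": 0, "label": "normal"},
    {"a": 0, "b": 0, "label": "normal"},
    {"a": 0, "b": 1, "label": "smurf"},
    {"a": 0, "b": 1, "label": "normal"},
    {"a": 1, "b": 0, "label": "neptune"},
]
print(degree_of_dependency(table, ["a", "b"], "label"))  # 0.6
print(degree_of_dependency(table, ["a"], "label"))       # 0.2
```

Dropping attribute `b` merges records into coarser classes, so fewer of them are classified with certainty and K falls from 0.6 to 0.2.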

Results and Discussion

After the experimental work, we obtained 10 correlation matrices, one of which was described in the methodology above. From the correlation matrix of the first training dataset (Figure 1), we obtained the following graph:

In the graph (Figure 2), the correlation between attributes is shown in a 3-dimensional view. Here, series1 represents attribute 1 (i.e., a₁), series2 represents attribute 2 (i.e., a₂), series3 represents attribute 3 (i.e., a₃), and so on. From the correlation values between the 27 attributes of the KDD Cup 1999 dataset, we obtained 19 different functional dependencies using the threshold value θ = 0.4. Using the closure property, we then obtained 19 attribute closures, from which 3 different reduct sets of these attributes follow: { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂, a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }, { a₂, a₈, a₉, a₁₀ } and { a₁₅, a₁₈, a₁₉, a₂₀ }. Applying the cardinality formula to these reduct sets then generated three values: 0.6226, 0.4716 and 0.5849. Among these, 0.6226, obtained for the reduct set { a₁, a₆, a₁₆, a₁₇, a₂₁, a₂₂, a₂₃, a₂₄, a₂₅, a₂₆, a₂₇ }, is the highest. This means that this reduct set alone can represent the characteristics of all 27 attributes of the dataset. The KDD Cup data used here contain three different attacks, namely the Neptune, Smurf and Snmpget attacks. The results of our experiment with a Support Vector Machine (SVM)¹¹ on these attacks are given in Table 1.
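The final selection step, keeping the reduct set with the highest cardinality value, amounts to an argmax. The sketch below assumes the three reported values 0.6226, 0.4716 and 0.5849 belong to the reduct sets in the order listed; the text confirms this only for 0.6226. `best_reduct` is a hypothetical helper name.

```python
REDUCTS = [
    frozenset({1, 6, 16, 17, 21, 22, 23, 24, 25, 26, 27}),
    frozenset({2, 8, 9, 10}),
    frozenset({15, 18, 19, 20}),
]
# Assumption: the values are paired with the reduct sets in the order listed.
K_VALUES = [0.6226, 0.4716, 0.5849]


def best_reduct(reducts, k_values):
    """Return the reduct set with the highest dependency value, and that value."""
    best = max(range(len(reducts)), key=lambda i: k_values[i])
    return reducts[best], k_values[best]


reduct, k = best_reduct(REDUCTS, K_VALUES)
print(sorted(reduct), k)  # [1, 6, 16, 17, 21, 22, 23, 24, 25, 26, 27] 0.6226
```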

Figure 1: Correlation matrix of the first training dataset

Figure 2: Correlation between attributes of the first training dataset (3-D view)

In our experiment, the accuracy was 96.71% for normal traffic, 91.32% for the Neptune attack, 94.55% for the Smurf attack and 100% for the Snmpget attack; the mean of these four values is 95.645%. The Snmpget attack was thus detected more accurately than the Neptune and Smurf attacks, while the Neptune attack showed the lowest accuracy of the three.
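Per-class accuracy as in Table 1 can be sketched as the percentage of test records of each class that the classifier labels correctly. This definition is our assumption, since the text does not state how accuracy was computed; the function names are hypothetical, and the `reported` values are the four accuracies quoted above.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Percentage of records of each true class that were predicted correctly
    (our assumed definition of per-class accuracy)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: 100.0 * correct[cls] / totals[cls] for cls in totals}


def mean_accuracy(acc):
    """Unweighted mean over classes."""
    return sum(acc.values()) / len(acc)


# Toy labels (hypothetical), to show the per-class computation.
print(per_class_accuracy(["normal", "normal", "smurf", "smurf"],
                         ["normal", "smurf", "smurf", "smurf"]))
# {'normal': 50.0, 'smurf': 100.0}

# Per-class accuracies (%) reported in the text.
reported = {"Normal": 96.71, "Neptune": 91.32, "Smurf": 94.55, "Snmpget": 100.0}
print(f"{mean_accuracy(reported):.3f}")  # 95.645
```

The unweighted mean of the four reported values, (96.71 + 91.32 + 94.55 + 100) / 4 = 382.58 / 4, is 95.645, which agrees with the mean stated in the text.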

Table 1: Accuracy measurement in % of attacks

Figure 3

Table 2: The comparison on 4 different datasets

References

[1] Debayo O. Adetunmbi, Samuel O. Falaki, Olumide S. Adewale and Boniface K. Alese, “Network Intrusion Detection based on Rough Set and k-Nearest Neighbour”, International Journal of Computing and ICT Research, Vol. 2, No. 1, pp. 60-66.
[2] Andrew H. S. and Mukkamala S., 2003, “Identifying important features for intrusion detection using support vector machines and neural networks”, IEEE Proceedings of the Symposium on Applications and the Internet (SAINT ’03).
[3] Biswanath M., Todd L. H., and Karl N. L., 1994, “Network Intrusion Detection”, IEEE Network, 8(3), pp. 26-41.
[4] Byunghae-cha K. P. and Jaittyun S., 2005, “Neural Networks Techniques for Host Anomaly Intrusion Detection using Fixed Pattern Transformation”, ICCSA 2005, LNCS 3481, pp. 254-263.
[5] KDD Cup 1999 dataset: http://kdd.ics.uci.edu/databases/kddcup99/
[6] Hettich S. and Bay S. D., 1999, The UCI KDD Archive, available at http://kdd.ics.uci.edu.
[7] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, second edition, Elsevier Science Ltd., China Machine Press, pp. 296-303.
[8] Abraham Silberschatz, Henry Korth and S. Sudarshan, “Database System Concepts”, McGraw-Hill.
[9] Xiaohua Hu, 2001, “Using Rough Sets Theory and Database Operations to Construct a Good Ensemble of Classifiers for Data Mining Applications”, pp. 233-240, ISBN: 0-7695-1119-8.
[10] Ihn-Han Bae, Hwa-Ju Lee and Kyung-Sook Lee, “Design and evaluation of a rough set-based anomaly detection scheme considering weighted feature values”, International Journal of Knowledge-Based and Intelligent Engineering Systems, Vol. 11, No. 4, 2007, pp. 201-206.
[11] Chang C. and Lin J., 2003, “LIBSVM: A Library for Support Vector Machines”, http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

This work is licensed under a Creative Commons Attribution 4.0 International License.
