I'm using the KDD1999 dataset to prevent intrusions, but I have some questions about the features: can someone explain the meaning of the flags? Here is the list of flags used in the KDD1999 dataset:

'flag' { 'OTH', 'REJ', 'RSTO', 'RSTOS0', 'RSTR', 'S0', 'S1', 'S2', 'S3', 'SF', 'SH' }

Here is an example of KDD dataset records:

0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
Qantas 94 Heavy
Nadya Nux

1 Answer

First of all, note that the data set is flawed and should not be used (KDNuggets statement). Roughly speaking, there are three reasons: A) it is not at all realistic, in particular not for modern attacks (heck, not even for real attacks back in 1998!). Today, most attacks are SQL injection and password theft via trojans, neither of which is detectable with this kind of data. B) The data set is focused on attacks, so it consists of attacks with some background noise, whereas actual traffic is mostly ordinary data with only some attacks. C) It was simulated on a largely virtual network, and you can detect the "attacks" from the simulated network topology alone.

Judging from the documentation of the usual preprocessed version, the flag is a value derived from the connection state, i.e. it records how the connection attempt was answered or ended: for example, whether it was rejected (REJ), reset (RST), and so on.
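As a rough illustration of reading the flag as a connection state, here is a minimal sketch. It assumes the KDD flag values correspond to the connection-state codes used by the Bro (now Zeek) network monitor, whose codes have the same names; the descriptions are paraphrased from that tool's documentation, not from the KDD documentation itself. The names `CONN_STATE`, `flag_of` and `describe_flag` are made up for this example.

```python
# Assumption: KDD 'flag' values match Bro/Zeek conn_state codes.
# Descriptions are paraphrased from the Zeek conn.log documentation.
CONN_STATE = {
    "S0": "connection attempt seen, no reply",
    "S1": "connection established, not terminated",
    "S2": "established, close attempt by originator seen, no reply from responder",
    "S3": "established, close attempt by responder seen, no reply from originator",
    "SF": "normal establishment and termination",
    "REJ": "connection attempt rejected",
    "RSTO": "connection established, originator aborted (sent a RST)",
    "RSTR": "responder sent a RST",
    "RSTOS0": "originator sent a SYN followed by a RST, no SYN-ACK seen",
    "SH": "originator sent a SYN followed by a FIN, no SYN-ACK seen (half-open)",
    "OTH": "no SYN seen, just midstream traffic",
}


def flag_of(record: str) -> str:
    """Return the flag, i.e. the 4th comma-separated field of a raw KDD record."""
    return record.split(",")[3]


def describe_flag(flag: str) -> str:
    """Look up a human-readable description, or report an unknown code."""
    return CONN_STATE.get(flag, "unknown flag")


# First sample record from the question.
record = (
    "0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,"
    "0.00,0.00,0.00,0.00,normal."
)

flag = flag_of(record)
print(flag, "->", describe_flag(flag))
```

The lookup covers exactly the eleven values listed in the question. Any other code falls through to "unknown flag" rather than raising an error.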

Has QUIT--Anony-Mousse
  • That is nothing but the TCP port number, the single most used attribute in IP firewalls. But many of these virtually no longer exist! Who uses pop2 nowadays? `echo` is ICMP ping. **Use *current* and *real* data instead of that useless data set!** – Has QUIT--Anony-Mousse Jun 24 '16 at 15:32
  • The attribute is called the *port*. Echo response is an ICMP 'port'. Probably the most used value is 80, the default port of HTTP... – Has QUIT--Anony-Mousse Jun 29 '16 at 19:46
  • r2l = remote to local, very unspecific. I have no idea where you got the 4 from. Let me emphasize again: **it's a useless data set, don't use it anymore**. – Has QUIT--Anony-Mousse Jul 11 '16 at 19:51
  • What about the refined version of this dataset: [NSL-KDD](http://www.unb.ca/research/iscx/dataset/iscx-NSL-KDD-dataset.html)? Is it decent, or still kinda useless? – max Aug 07 '16 at 17:53
  • They removed duplicates, but it still has no meaning for real intrusion detection. A, B and C above all still hold, so it is still completely useless. – Has QUIT--Anony-Mousse Aug 07 '16 at 18:43