Summary of Labeling for Datasets
- High-Level Labeling
- A textual description of "what happened"
- This is the "story" that corresponds to the attacker actions.
- In these datasets, the textual overview of the attack and descriptions of
the various phases of the attack constitute the high-level labeling.
- Low-Level Labeling
- Segment out each network session in the network data, and each BSM or NT audit record in the audit data.
- Provide separate dump file(s) of all packets or records in the attack. In the current instantiation,
low-level BSM labeling is provided as textual "praudit -l" output.
- For both network and audit data labeling, it is difficult to define
exactly what counts as evidence of the attack. In these datasets
we label those packets or audit records that are considered to be a direct part of the
attack.
- Network labeling is based on tcpdump filters, and audit labeling is based on breaking the
audit log into "sessions" and labeling each session as either normal or attack.
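The two low-level labeling approaches above can be sketched as follows. This is only an illustration under stated assumptions, not the tooling used to label these datasets: the Packet type, the function names, and the idea of a precomputed set of attack session ids are all hypothetical. The packet predicate mimics what a tcpdump filter of the form "host A and host B and port P" selects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified packet summary (not a real capture format).
    src: str
    dst: str
    proto: str
    sport: int
    dport: int


def matches_attack_filter(pkt, attacker, victim, port):
    """Rough equivalent of the tcpdump filter
    'host <attacker> and host <victim> and port <port>'."""
    hosts = {pkt.src, pkt.dst}
    return attacker in hosts and victim in hosts and port in (pkt.sport, pkt.dport)


def label_packets(packets, attacker, victim, port):
    """Return (label, packet) pairs, where label is 'attack' or 'normal'."""
    return [
        ("attack" if matches_attack_filter(p, attacker, victim, port) else "normal", p)
        for p in packets
    ]


def label_audit_sessions(records, attack_session_ids):
    """Group (session_id, record_text) pairs into sessions and label each
    whole session as 'attack' or 'normal'."""
    sessions = {}
    for session_id, text in records:
        sessions.setdefault(session_id, []).append(text)
    return {
        sid: ("attack" if sid in attack_session_ids else "normal", recs)
        for sid, recs in sessions.items()
    }
```

Note that the audit labeling is per session, so every record in a session marked "attack" carries that label even if an individual record looks benign on its own.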
- Mid-Level Labeling
- Provides a list of XML alerts, produced using Honeywell-API, that detail the major steps in the
attack. The alerts are based on the 1999 detection truth files.
- These alerts are based directly on the low-level labeling.
- For network traffic this means encapsulating a TCP or UDP session or ICMP packet into
an XML/IDMEF alert giving the following information: Date, Time, Protocol, Source, Destination, Ports
- For audit data this means encapsulating each exec record that corresponds
to an attack action or is part of an attack "session". For each record the following information is
given: Program-name, PID, Time, Arguments, and source host information if executed over a network session.
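As a rough illustration of the fields listed above, the sketch below builds one XML element per alert using Python's standard library. The element and attribute names are assumptions chosen for readability; they do not reproduce the IDMEF schema or the Honeywell-API output format, and the alert files distributed with the datasets remain the authoritative reference.

```python
import xml.etree.ElementTree as ET


def network_alert(date, time, protocol, source, destination, ports):
    """Wrap one TCP/UDP session or ICMP packet in an alert element.
    Tag names are hypothetical, not the real IDMEF element names."""
    alert = ET.Element("Alert", {"kind": "network"})
    fields = (
        ("Date", date),
        ("Time", time),
        ("Protocol", protocol),
        ("Source", source),
        ("Destination", destination),
        ("Ports", ports),
    )
    for tag, value in fields:
        ET.SubElement(alert, tag).text = str(value)
    return alert


def audit_alert(program_name, pid, time, arguments, source_host=None):
    """Wrap one exec audit record in an alert element.
    SourceHost is added only when the program ran over a network session."""
    alert = ET.Element("Alert", {"kind": "audit"})
    ET.SubElement(alert, "ProgramName").text = program_name
    ET.SubElement(alert, "PID").text = str(pid)
    ET.SubElement(alert, "Time").text = time
    ET.SubElement(alert, "Arguments").text = " ".join(arguments)
    if source_host is not None:
        ET.SubElement(alert, "SourceHost").text = source_host
    return alert
```

An alert can be serialized with ET.tostring(alert, encoding="unicode"); all values passed in when trying this out would be placeholders, not entries from the truth files.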
- NOTE: Both low- and mid-level labeling are provided to give a good idea of how the attack proceeds
and where to look in the data for evidence of the attack. However, this labeling may not
completely meet the needs of all detection techniques. If there is something that we could do to make
this labeling more useful, please let us know! Thanks -- Josh