Summary of Labeling for Datasets

High-Level Labeling
- A textual description of "what happened"
- This is the "story" that corresponds to the attacker actions.
- In these datasets, the textual overview of the attack and descriptions of the various phases of the attack constitute the high-level labeling.
Low-Level Labeling
- Segment out each network session in network data, and each BSM or NT audit record in audit data.
- Provide separate dump file(s) of all packets or records in the attack. In the current instantiation, low-level BSM labeling is provided as textual "praudit -l" output.
- For both network and audit data labeling, it is difficult to define exactly what counts as evidence of the attack. In these datasets we label those packets or audit records that are considered to be a direct part of the attack.
- Network labeling is based on tcpdump filters. Audit labeling is based on breaking the audit log into "sessions" and labeling each session as either normal or attack.
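The session-based audit labeling described above can be sketched roughly as follows. This is only an illustration, not the procedure used to build these datasets: the record layout (dicts with a "session" key), the function name label_sessions, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict

def label_sessions(records, attack_session_ids):
    """Group audit records by session and label each session.

    `records` is a list of dicts with a hypothetical "session" key;
    `attack_session_ids` is the set of session IDs judged to be a
    direct part of the attack.
    """
    sessions = defaultdict(list)
    for rec in records:
        sessions[rec["session"]].append(rec)
    # Each session gets exactly one label: "attack" or "normal".
    return {
        sid: ("attack" if sid in attack_session_ids else "normal", recs)
        for sid, recs in sessions.items()
    }

# Made-up records for illustration only (not taken from the datasets).
records = [
    {"session": 101, "event": "execve", "path": "/usr/bin/ls"},
    {"session": 102, "event": "execve", "path": "/tmp/example"},
    {"session": 101, "event": "exit", "path": ""},
]
labels = label_sessions(records, attack_session_ids={102})
```

Labeling whole sessions rather than individual records keeps the normal/attack decision coarse, which matches the "sessions" wording above, but it also means benign records inside an attack session carry the attack label.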
Mid-Level Labeling
- Provides a list of XML alerts, produced using the Honeywell API, that detail the major steps in the attack. The alerts are based on the 1999 detection truth files.
- These alerts are based directly on the low-level labeling.
- For network traffic this means encapsulating a TCP or UDP session or ICMP packet into an XML/IDMEF alert giving the following information: Date, Time, Protocol, Source, Destination, Ports
- For audit data this means encapsulating each exec record that corresponds to an attack action or is part of an attack "session". For each record the following information is given: Program-name, PID, Time, Arguments, and source host information if executed over a network session.
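As a rough illustration of wrapping a network session in an XML alert, the sketch below emits one element per field listed above. The element names, the function name network_alert, and the sample values are assumptions for the example; the output is not claimed to conform to the IDMEF schema or to match the alerts shipped with the datasets.

```python
import xml.etree.ElementTree as ET

def network_alert(date, time, protocol, source, destination,
                  src_port=None, dst_port=None):
    """Build a simple XML alert for one session or packet.

    Element names are hypothetical; they mirror the field list in the
    text (Date, Time, Protocol, Source, Destination, Ports).
    """
    alert = ET.Element("Alert")
    ET.SubElement(alert, "Date").text = date
    ET.SubElement(alert, "Time").text = time
    ET.SubElement(alert, "Protocol").text = protocol
    ET.SubElement(alert, "Source").text = source
    ET.SubElement(alert, "Destination").text = destination
    # ICMP packets have no ports, so the Ports element is optional.
    if src_port is not None or dst_port is not None:
        ports = ET.SubElement(alert, "Ports")
        if src_port is not None:
            ET.SubElement(ports, "SourcePort").text = str(src_port)
        if dst_port is not None:
            ET.SubElement(ports, "DestinationPort").text = str(dst_port)
    return ET.tostring(alert, encoding="unicode")

# Placeholder values (documentation address ranges), not dataset contents.
xml_text = network_alert("2000-01-01", "12:00:00", "tcp",
                         "192.0.2.1", "198.51.100.2",
                         src_port=1024, dst_port=23)
icmp_text = network_alert("2000-01-01", "12:00:05", "icmp",
                          "192.0.2.1", "198.51.100.2")
```

An alert for an audit exec record could be built the same way, with Program-name, PID, Time, and Arguments elements in place of the network fields.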
NOTE: Both low- and mid-level labeling are provided to give a good idea of how the attack proceeds and where to look in the data for evidence of the attack. However, this labeling may not completely meet the needs of all detection techniques. If there is something that we could do to make this labeling more useful, please let us know! Thanks -- Josh