README 1week
------------

Six weeks of training data are being created at MIT Lincoln Laboratory
and provided as part of the 1998 DARPA Intrusion Detection Evaluation.
The first two weeks should be considered "alpha" training data,
collected as the simulation becomes fully instantiated and reaches a
steady state. This data should be used for initial training and
development of intrusion detection systems. The final four weeks of
training data will be similar to the test data and should be used for
final training and development of intrusion detection systems. These
four weeks of data will include more types of background traffic and
attack variants than the first two weeks of data. They will also be
generated using the same procedures that will be used with the test
data.

The first week of training data starts with a large amount of
background traffic and an average of two attacks per day. During the
first and second weeks, services, attacks, and additional traffic types
are added until traffic patterns reach a final steady state by the end
of the second week. Following the second week, the final four weeks of
training data will remain similar to each other with regard to attack
types and background traffic. The first two weeks of data are being
provided early because this data is similar to the final four weeks of
data, and it will give contractors experience handling the data and a
head start in training their intrusion detection systems. The six weeks
of data will include more than 100 instances of 20 different attack
types.

BSM and tcpdump data collection normally begins at 8 AM each weekday
and ends at 6 AM on the following day, providing 22 hours of data each
day. These start and stop times are not exact, but vary from day to
day. Data collected on each day are provided in separate directories.
Data starting on Monday through Friday are placed in separate
directories named monday, tuesday, wednesday, thursday, friday.
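
The collection schedule described above can be sketched in a few lines
of Python. This is only an illustration of the nominal times: the
function name and the example date are assumptions made for this
sketch, and the real start and stop times vary from day to day.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple

# Directory names as listed in this README, indexed by weekday (Monday = 0).
WEEKDAY_DIRS = ["monday", "tuesday", "wednesday", "thursday", "friday"]


def nominal_window(day: date) -> Tuple[str, datetime, datetime]:
    """Return (directory name, nominal start, nominal end) for a weekday.

    Hypothetical helper: collection nominally starts at 8 AM on the given
    weekday and ends at 6 AM on the following day.
    """
    if day.weekday() > 4:
        raise ValueError("collection runs start only on weekdays")
    start = datetime.combine(day, time(8, 0))
    end = datetime.combine(day + timedelta(days=1), time(6, 0))
    return WEEKDAY_DIRS[day.weekday()], start, end


if __name__ == "__main__":
    # The date below is an arbitrary Monday chosen for illustration; it is
    # not taken from the evaluation schedule.
    name, start, end = nominal_window(date(1998, 6, 1))
    print(name, start, end, end - start)
```

Any timestamp between the nominal end and the next nominal start falls
in the down time, during which no auditing or sniffing takes place.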
Data was collected continuously on a real network which simulated more
than 1000 virtual hosts and more than 700 virtual users. Auditing and
tcpdump are turned off at 6 AM each morning, when all machines are
taken down and rebooted. During this down time, a full backup of the
Solaris host named pascal is made using ufsdump, and preparations are
made for the next day's simulation run. Auditing and tcpdump are then
restarted at roughly 8 AM each morning. The state of all simulation
machines, including pascal (the Solaris machine being audited), should
not change substantially during down time, and no attacks or attack
setups on these machines are performed during down time.

Pascal dump files created using ufsdump are no longer provided on the
Lincoln web site after week one of training because they consume too
much disk space and because very few participants in the 1998
evaluation are using this data. These dump files will be provided on
CDROM, but only to those participants who explicitly request them.
Send requests for pascal ufsdump CDROMs to
darpa-intrusion-eval-admin@sst.ll.mit.edu.

CDROMs containing tcpdump and BSM data are mailed out roughly a week
after the data is posted to the Lincoln web site.

Please read all the README files in the subdirectories.

Files in this directory
===========================

README        - this file describing the contents of all files
attacks.html  - describes all attacks for this week and past weeks
---------------------------------------------------------------------------
2weekdoc.tar  - a tar file containing descriptive information for
                this week
---------------------------------------------------------------------------
README.formats - describes the format of list files which list all
                 network sessions
hosts.memo     - lists IP addresses, host names, and operating systems
                 of hosts
network.ps     - a PostScript picture of the simulation network
                 including IP addresses of important hosts
bin/           - contains sniffer, dump, and psmonitor shell scripts
    README.bsm    - describes contents of bin directory
    run_sniffer   - shell script used to start sniffing on solomon
    run_psmonitor - shell script to run ps periodically on pascal and
                    store output in psmonitor log file
    run_dump      - shell script used to create system dump of Solaris
                    from pascal
config/        - contains BSM and pascal configuration information
    README.bsm    - describes the BSM configuration files used in this
                    simulation
    README.pascal - describes how to obtain configuration information
                    for the Solaris workstation named pascal which
                    runs BSM auditing
    audit_class   - pascal BSM configuration file
    audit_control - pascal BSM configuration file
    audit_event   - pascal BSM configuration file
    audit_startup - shell script used to make sure BSM auditing on
                    pascal includes important system background
                    processes
    audit_user    - pascal BSM configuration file
    hosts.equiv   - /etc/hosts.equiv file listing trusted hosts for
                    pascal
    services      - /etc/services file for pascal
---------------------------------------------------------------------------
monday.tar
tuesday.tar
wednesday.tar
thursday.tar
friday.tar     - tar files containing BSM and tcpdump data, attack
                 information, and list files for each day
---------------------------------------------------------------------------

Each daily tar file contains the following files:

bsm.list.gz         - The list file for the BSM data, which indicates
                      where all sessions begin and end and labels
                      attacks. The format for this file is described
                      in "README.formats".
pascal.bsm.gz       - The actual raw binary BSM audit data for this
                      day. Information describing how BSM was
                      configured is provided in the ./config
                      directory.
pascal.praudit.gz   - The ASCII BSM audit data created by running the
                      binary BSM audit data through praudit.
pascal.psmonitor.gz - The results of running the UNIX ps command every
                      60 seconds on the Solaris host where auditing
                      was turned on (pascal). The shell script that
                      created this file is provided in
                      ./bin/run_psmonitor.
tcpdump.gz          - The raw tcpdump data from the sniffer in this
                      simulation. The shell script used to start the
                      sniffer is provided in ./bin/run_sniffer.
tcpdump.list.gz     - The list file for tcpdump data, which indicates
                      where all sessions begin and end and labels
                      attacks. The format for this file is described
                      in "README.formats".
---------------------------------------------------------------------------
(last updated Aug 3, 1998)
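
As a rough aid for working with the daily files listed above, the
following Python sketch checks an unpacked day directory for the six
expected files and reads one of the gzip-compressed text files line by
line. It assumes that a daily tar file has already been unpacked so
that the files sit directly in one directory, and that the list files
are plain text; the helper names are made up for this example, and the
field layout of each line is left to README.formats.

```python
import gzip
from pathlib import Path
from typing import Iterator, List

# File names as listed in this README for each daily tar file.
EXPECTED_FILES = [
    "bsm.list.gz",
    "pascal.bsm.gz",
    "pascal.praudit.gz",
    "pascal.psmonitor.gz",
    "tcpdump.gz",
    "tcpdump.list.gz",
]


def missing_files(day_dir: str) -> List[str]:
    """Return the expected file names that are absent from day_dir.

    Assumption: the daily tar file was unpacked so that its files sit
    directly inside day_dir.
    """
    root = Path(day_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def iter_text_lines(gz_path: str) -> Iterator[str]:
    """Yield the lines of a gzip-compressed text file without newlines.

    Intended for the text files (for example the list files); the raw
    BSM and tcpdump files are binary and need their own tools.
    """
    with gzip.open(gz_path, "rt", encoding="ascii", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")
```

For example, missing_files("monday") returns an empty list when all six
files are present in a directory named monday, and
iter_text_lines("monday/tcpdump.list.gz") yields one string per line of
the list file.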