README 1week
------------

Six weeks of training data are being created at MIT Lincoln Laboratory and provided as part of the 1998 DARPA Intrusion Detection Evaluation. The first two weeks should be considered "alpha" training data, collected while the simulation becomes fully instantiated and reaches a steady state. This data should be used for initial training and development of intrusion detection systems. The final four weeks of training data will be similar to the test data and should be used for final training and development of intrusion detection systems. These four weeks will include more types of background traffic and more attack variants than the first two weeks, and they will be generated using the same procedures that will be used for the test data.

The first week of training data starts with a large amount of background traffic and an average of two attacks per day. During the first and second weeks, services, attacks, and additional traffic types are added until traffic patterns reach a final steady state by the end of the second week. After the second week, the final four weeks of training data will remain similar to one another with regard to attack types and background traffic.

The first two weeks of data are being provided early because they are similar to the final four weeks; they will give contractors experience handling the data and a head start in training their intrusion detection systems. The six weeks of data will include more than 100 instances of 20 different attack types.

BSM and tcpdump data collection normally begins at 8 AM each weekday and ends at 6 AM the following day, providing 22 hours of data each day. These start and stop times are not exact and vary from day to day. Data collected on each day are provided in separate directories: data starting on Monday through Friday is placed in directories named monday, tuesday, wednesday, thursday, and friday.
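As a rough illustration of the collection schedule described above, the following Python sketch computes the nominal capture window for a given day and tests whether a timestamp falls inside it. It assumes naive local timestamps and uses only the nominal 8 AM start and 6 AM stop; because the actual start and stop times vary from day to day, the result is approximate. The function names are hypothetical and are not part of the distributed data.

```python
from datetime import datetime, timedelta

# Nominal schedule only; actual start/stop times vary from day to day.
NOMINAL_START_HOUR = 8  # 8 AM on the collection day
NOMINAL_STOP_HOUR = 6   # 6 AM on the following day


def nominal_window(day: datetime) -> tuple[datetime, datetime]:
    """Return the nominal (start, stop) of the capture that begins on `day`.

    Hypothetical helper; assumes naive local-time datetimes.
    """
    start = day.replace(hour=NOMINAL_START_HOUR, minute=0, second=0, microsecond=0)
    # Move to the next calendar day, then set the 6 AM stop hour.
    stop = (start + timedelta(days=1)).replace(hour=NOMINAL_STOP_HOUR)
    return start, stop


def in_nominal_window(ts: datetime, day: datetime) -> bool:
    """True if `ts` lies in the nominal capture window starting on `day`."""
    start, stop = nominal_window(day)
    return start <= ts < stop


if __name__ == "__main__":
    start, stop = nominal_window(datetime(1998, 6, 1))
    print(stop - start)  # 22 hours of nominal collection
    print(in_nominal_window(datetime(1998, 6, 2, 3, 30), datetime(1998, 6, 1)))
```

A timestamp near either boundary may still belong to the day's data (or not) because of the day-to-day variation, so a real labeling tool would check the actual first and last records in each file rather than rely on these nominal times.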
Data was collected continuously on a real network that simulated more than 1000 virtual hosts and 700 virtual users. Auditing and tcpdump are turned off at 6 AM each morning, when all machines are taken down and rebooted, and are restarted at roughly 8 AM. The state of all simulation machines, including pascal (the Solaris machine being audited), should not change substantially during this down time, and no attacks or attack setups will be performed on these machines during down time.

Please read all the README files in the subdirectories.

Files in this directory
-----------------------

README - this file
README.formats - describes what your intrusion detection system must do and the format of the ".list" files, which list all network sessions
hosts.memo - lists IP addresses, host names, and operating systems of hosts
network.ps.gz - a gzipped PostScript picture of the topology of the test network used in this simulation, including IP addresses and names of major gateways, victim machines, the router, and the sniffer
host.memo - list of inside and outside host names, IP addresses, and system types
bsm.memo - description of how bsm was run on the Solaris target host named pascal
README.bsm - describes the bsm configuration files used in this simulation and how we produced praudit output for processing (there is a bug in praudit)
README.tcpdump - describes how the tcpdump data was collected

bin/
    run_sniffer - shell script used to start sniffing on solomon
    run_psmonitor - shell script to run ps periodically on pascal and store the output in a psmonitor log file
    run_dump - shell script used to create a system dump of the Solaris machine pascal

CONFIG/ - contains the bsm configuration files and starting scripts from the simulation (see the file README.bsm for a description of these files)

Data for each day's run:

monday/  tuesday/  wednesday/  thursday/  friday/

Each daily directory contains the following files:

    bsm.list.gz - the list file for the bsm data; the format is described in README.formats
    pascal.bsm.gz - the raw bsm data from this simulation
    pascal.praudit.gz - our praudit results
    ps-elf.gz - the results of running the UNIX command "ps -elf" every 60 seconds on the audited machine (see the file CONFIG/bsm/reset for the script that created this file)
    tcpdump.gz - the raw tcpdump data from the sniffer in this simulation
    tcpdump.list.gz - the list file for the tcpdump data, as described in README.formats
    attacks.memo - a short description of the attacks included in this day's data
    nightly pascal dump files created with ufsdump:
        home.dump.gz  opt.dump.gz  root.dump.gz  usr.dump.gz

(last updated June 26, 1998)
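The following Python sketch, using hypothetical helper names, checks a week directory for the per-day files listed above and reads the global header of a gzipped capture. It assumes that every daily directory contains all of the listed files (including the nightly dump files) and that tcpdump.gz is a gzip-compressed file in the classic libpcap format, identified by the magic number 0xa1b2c3d4 in either byte order; these are assumptions, not statements from this README.

```python
import gzip
import struct
from pathlib import Path

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday"]

# Assumed per-day contents, taken from the file list above.
DAILY_FILES = [
    "bsm.list.gz", "pascal.bsm.gz", "pascal.praudit.gz", "ps-elf.gz",
    "tcpdump.gz", "tcpdump.list.gz", "attacks.memo",
    "home.dump.gz", "opt.dump.gz", "root.dump.gz", "usr.dump.gz",
]


def missing_files(week_dir: Path) -> dict[str, list[str]]:
    """Map each day directory name to the expected files it lacks."""
    report = {}
    for day in DAYS:
        day_dir = week_dir / day
        report[day] = [name for name in DAILY_FILES if not (day_dir / name).is_file()]
    return report


def pcap_header(path: Path) -> dict:
    """Read the 24-byte classic libpcap global header from a gzipped capture."""
    with gzip.open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24:
        raise ValueError("file too short for a pcap global header")
    # The magic number tells us the byte order the capture was written in.
    if raw[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    elif raw[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    else:
        raise ValueError("not a classic pcap file")
    # magic, version_major, version_minor, thiszone, sigfigs, snaplen, linktype
    _, major, minor, _, _, snaplen, linktype = struct.unpack(endian + "IHHiIII", raw)
    return {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}


if __name__ == "__main__":
    week = Path(".")
    for day, missing in missing_files(week).items():
        print(day, "missing:", missing or "nothing")
```

Checking the header before handing a file to a longer-running tool gives an early, cheap signal that a download is truncated or that a file is not in the expected capture format.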