The 1998 Intrusion Detection Real-time Evaluation Plan
Air Force Rome Laboratory (AFRL/SNH-1)
Last Modification: 31 March 1998

1.0 Introduction

As a part of the DARPA/AFRL sponsored 1998 intrusion detection evaluation, AFRL will conduct a real-time evaluation of selected DARPA-sponsored research projects. The primary goals of this evaluation are to give sponsors and researchers a gauge of how research projects perform in a realistic environment and to identify to DoD customers promising emerging technologies for either integration or further development. Promising technologies will be selected for integration in the DISA-sponsored IA: AIDE project (Information Assurance: Automated Intrusion Detection Environment), which is a military-wide demonstration of intrusion detection technology.

As a complementary test to the Lincoln Laboratory evaluation, which will rigorously test components of research systems using well-known sensors, the AFRL portion will test the integrated system as part of an active network. Each intrusion detection system (IDS) will be allowed to use its full range of sensors (e.g., active probing or software such as loadable modules) and will be able to initiate a range of responses. In addition, it is hoped that the AFRL test will measure attributes of an IDS that are hard to test in a non-real-time evaluation, such as latency of detection and performance under stress.

For the 1998 evaluation, only a small number of research IDSs deemed ready for testing on an active network will be evaluated. This will allow us to gauge the amount of effort involved in integrating and testing each system. We will contact each chosen project individually to arrange the details of its participation.

2.0 Technical Objectives

This evaluation has two objectives: 1) to measure the effectiveness of an IDS in detecting intrusive behavior in the presence of normal computer and network activity; and 2) to measure the effectiveness of response mechanisms and their impact on normal users.

As in the Lincoln Laboratory portion of the test, the following types of misuse will be presented to the IDS under test:
1. Denial of service
2. Unauthorized access from a remote machine
3. Unauthorized transition to root by an unprivileged user
4. Surveillance and probing
5. Anomalous user behavior

Many of the attacks used in the Lincoln Laboratory evaluation will be repeated in the AFRL portion of the evaluation. In addition to the types of attacks listed above, the AFRL test will include attacks against network infrastructure, as well as attacks that are only identifiable by hierarchical intrusion detection systems. There will also be attacks against operating systems not included in the 1998 Lincoln evaluation.

The same scenario will be run against each IDS. We do not expect that any system will catch every attack, and the real-time evaluation will not try to derive a figure of merit to rank-order systems. In this portion of the evaluation, a system will be evaluated for the degree and uniqueness of coverage it provides against the common scenario.

3.0 The Evaluation

We will briefly describe the setup and conduct of the test so that participants may get a feel for the conditions under which their systems will be tested.

3.1 Test Setup

Each evaluation will be run in a controlled environment. The test network will be isolated from any other network.
Control software will schedule and launch both normal activity and intrusions in precisely the same order for every test. Naturally, this will not ensure packet-for-packet exactness on every run, since small variations in such a complex network are bound to produce some variation in the order and timing of events. Because such variations could produce a different number of false alarms from run to run, measurement techniques will be developed that attempt to eliminate any ordering effects.

3.2 Test Network Topology

Diagram 1 shows the physical layout of the network. The network has multiple sub-domains located behind a firewall. The hexagonal boxes are routers running both OSPF and RIP. The red boxes are external gateways to the test network, while the boxes outside the network generate network traffic and perform network services such as DNS. The majority of the hosts on the network will be on a single sub-LAN, represented by the oval. The IDS is expected to protect the region of the network enclosed by the black box and will be allowed to monitor any interface or host contained in this region.

There will be a variety of platforms on the test network that can serve as hosts for monitors:
1. Sun Solaris 5.5/SunOS 4.x
2. IBM AIX 2.5
3. HP HP-UX
4. PC NT/Linux
The test will also contain attacks against each of these systems.

3.3 Test Network Policy

We are planning to have a simple network policy in place with which an IDS can interact. This policy will mirror, to the degree practicable, the policy reflected in the Lincoln training and test data. It will not necessarily be a good policy, and there may be backdoors into the network that are forbidden by the policy.

3.4 Normal Traffic

Normal traffic will be generated by Expect scripts and will be similar to the sessions used in the Lincoln Laboratory portion of the evaluation. A wide range of services will be represented, such as telnet, ftp, smtp, and http, and connections will be initiated and terminated from every region of the network.

3.5 Intrusive Traffic

Intrusive behaviors will be drawn from the classes of attacks described in Section 2. These attacks will range from well-known to novel, and many of them will be presented multiple times with varying degrees of stealthiness. Attacks will come from both external and internal sources. Researchers are encouraged to submit a list of exploits that they feel their system would be adept at detecting. An attempt will be made, time and expertise permitting, to include exploits suggested by researchers in the scenario.
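As a point of reference for the scripted traffic described in Section 3.4, the sketch below shows the kind of fully automated session the generators produce. The actual generators are Expect scripts, which are not reproduced in this plan; the fragment below is only a rough Python illustration, and the host names, addresses, and message contents in it are hypothetical.

    # Rough illustration only -- the real traffic generators are Expect scripts.
    # Two scripted "background" sessions: one SMTP transaction and one HTTP
    # fetch, driven entirely by the script with no user at the keyboard.
    import smtplib
    import urllib.request

    def send_background_mail(mailhost="mail.test.net"):
        # A complete scripted SMTP session: connect, send one message, disconnect.
        message = ("From: alice@test.net\r\n"
                   "To: bob@test.net\r\n"
                   "Subject: weekly status\r\n\r\n"
                   "Nothing unusual to report this week.\r\n")
        with smtplib.SMTP(mailhost, 25, timeout=30) as smtp:
            smtp.sendmail("alice@test.net", ["bob@test.net"], message)

    def fetch_background_page(url="http://www.test.net/index.html"):
        # A complete scripted HTTP session: request one page and read the reply.
        with urllib.request.urlopen(url, timeout=30) as reply:
            return reply.read()

    if __name__ == "__main__":
        send_background_mail()
        fetch_background_page()

In an actual run, many such scripted sessions would be launched by the control software described in Section 3.1, from hosts in every region of the network, so that intrusive behavior is embedded in a steady stream of ordinary activity.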
4.0 Requirements for Test Participants

We want to minimize the burden on the researcher as much as possible; however, some degree of standardization will be required so that we can compare across systems. In addition, integration of these systems is likely to be a formidable challenge given the schedule. By making some demands on the researcher, we hope to conduct a consistent and successful evaluation.

4.1 Integration

To ensure a smooth integration of each system, the researcher should examine the diagrams and provide the following information in advance of the evaluation:
1) All hosts on which software is to be installed,
2) A brief description of installation procedures,
3) Any third-party software desired (COTS to be provided by the researcher), and
4) Any hardware (provided by the researcher) to be installed.

Initially, a version of each system that is functional in its external details should be delivered to AFRL so that a dry-run installation can be performed. The scheduling of this delivery can be worked out later this spring. During this integration, and/or during the integration of the final system, the researcher is free to visit AFRL to verify that there have been no errors in the installation and that the system is performing as expected.

4.2 Delivery of System for Testing

The final system to be tested will be delivered to AFRL as identified in the schedule. No further modifications will be allowed after final delivery.

4.3 Results Reporting

Unlike in the Lincoln Laboratory portion of the test, where sessions are explicitly enumerated in a file, the system under test will need to identify the connection data associated with each session, as well as provide a time stamp for each report. This is required so we can attempt to measure latency. As in the Lincoln Laboratory evaluation, there will be one score per session. As a default, if the IDS reports no score for a session, the lowest score reported by the IDS during the evaluation will be assigned. If multiple scores are reported for a session, the highest score reported for the session will be assigned. The format of the output reports should be the following:

{TimeStamp for report} {Session Start} {Duration} {Server Port} {Client Port} {Server IP} {Client IP} {Score}
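To make the report format and the default scoring rules concrete, the Python sketch below formats one report record and applies the defaults described above. It is an illustration only: this plan does not specify the timestamp or duration formats, so the formats and sample values used here are assumptions.

    # Illustration only -- field order follows the format above; the timestamp
    # and duration formats shown are assumptions, not requirements of the plan.
    import time

    def format_report(session_start, duration, server_port, client_port,
                      server_ip, client_ip, score):
        # Emit one record: {TimeStamp for report} {Session Start} {Duration}
        # {Server Port} {Client Port} {Server IP} {Client IP} {Score}
        report_time = time.strftime("%Y/%m/%d-%H:%M:%S", time.localtime())
        fields = (report_time, session_start, duration, server_port,
                  client_port, server_ip, client_ip, score)
        return " ".join(str(field) for field in fields)

    def score_for_session(reported_scores, lowest_score_overall):
        # Default scoring from Section 4.3: a session with no report receives
        # the lowest score the IDS produced anywhere in the evaluation; a
        # session with several reports receives the highest of them.
        if not reported_scores:
            return lowest_score_overall
        return max(reported_scores)

    # Hypothetical example: one report for a telnet session scored 0.85.
    print(format_report("1998/06/01-08:00:01", "00:02:35", 23, 1754,
                        "192.168.1.30", "192.168.1.20", 0.85))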