The 1998 Intrusion Detection Real-time Evaluation Plan
Air Force Rome Laboratory (AFRL/SNH-1)
Last Modification: 31 March 1998

1.0 Introduction

As a part of the DARPA/AFRL sponsored 1998 intrusion detection evaluation, AFRL will conduct a real-time evaluation of selected DARPA-sponsored research projects. The primary goals of this evaluation are to give sponsors and researchers a gauge of how research projects perform in a realistic environment and to identify to DoD customers promising emerging technologies for either integration or further development. Promising technologies will be selected for integration in the DISA-sponsored IA: AIDE project (Information Assurance: Automated Intrusion Detection Environment), which is a military-wide demonstration of intrusion detection technology.

As a complementary test to the Lincoln Laboratory evaluation, which will rigorously test components of research systems using well-known sensors, the AFRL portion will test the integrated system as part of an active network. Each intrusion detection system (IDS) will be allowed to use its full range of sensors (e.g., active probing or software such as loadable modules) and will be able to initiate a range of responses. In addition, it is hoped that the AFRL test will measure attributes of an IDS that are hard to test in a non-real-time evaluation, such as latency of detection and performance under stress.

For the 1998 evaluation, only a small number of research IDSs deemed ready for testing on an active network will be evaluated. This will allow us to gauge the amount of effort involved in integrating and testing each system. We will contact each chosen project individually to arrange the details of its participation.

2.0 Technical Objectives

This evaluation has two objectives: 1) to measure the effectiveness of an IDS in detecting intrusive behavior in the presence of normal computer and network activity; and 2) to measure the effectiveness of response mechanisms and their impact on normal users.

As in the Lincoln Laboratory portion of the test, the following types of misuse will be presented to the IDS under test:
1. Denial of service
2. Unauthorized access from a remote machine
3. Unauthorized transition to root by an unprivileged user
4. Surveillance and probing
5. Anomalous user behavior

Many of the attacks used in the Lincoln Laboratory evaluation will be repeated in the AFRL portion of the evaluation. In addition to the types of attacks listed above, the AFRL test will include attacks against network infrastructure, as well as attacks that are only identifiable by hierarchical intrusion detection systems. There will also be attacks against operating systems not included in the 1998 Lincoln evaluation.

The same scenario will be run against each IDS. We do not expect that any system will catch every attack, and the real-time evaluation will not try to derive a figure of merit to rank-order systems. In this portion of the evaluation, a system will be evaluated for the degree and uniqueness of coverage it provides against the common scenario.

3.0 The Evaluation

We will briefly describe the setup and conduct of the test so that participants may get a feel for the conditions under which their systems will be tested.

3.1 Test Setup

Each evaluation will be run in a controlled environment. The test network will be isolated from any other network.
Control software will schedule and launch both normal activity and intrusions in precisely the same order for every test. Naturally, this will not ensure packet-for-packet exactness on every run, since small variations in such a complex network are bound to produce some variation in the order and timing of events. Because such variations could produce a different number of false alarms from run to run, measurement techniques will be developed that attempt to eliminate any ordering effects.

3.2 Test Network Topology

Diagram 1 shows the physical layout of the network. The network has multiple sub-domains located behind a firewall. The hexagonal boxes are routers running both OSPF and RIP. The red boxes are external gateways to the test network, while the boxes outside the network generate network traffic and perform network services such as DNS. The majority of the hosts on the network will be on a single sub-LAN, represented by the oval. The IDS is expected to protect the region of the network enclosed by the black box and will be allowed to monitor any interface or host contained in this region.

There will be a variety of platforms on the test network that can serve as hosts for monitors:
1. Sun Solaris 5.5/SunOS 4.x
2. IBM AIX 2.5
3. HP HP-UX
4. PC NT/Linux
The test will also contain attacks against each of these systems.

3.3 Test Network Policy

We are planning to have a simple network policy in place with which an IDS can interact. This policy will mirror, to the degree practicable, the policy reflected in the Lincoln training and test data. It will not necessarily be a good policy, and there may be backdoors into the network that are forbidden by the policy.

3.4 Normal Traffic

Normal traffic will be generated by Expect scripts and will be similar to the sessions used in the Lincoln Laboratory portion of the evaluation. A wide range of services will be represented, such as telnet, ftp, smtp, and http, and connections will be initiated and terminated from every region of the network.

3.5 Intrusive Traffic

Intrusive behaviors will be drawn from the classes of attacks described in Section 2. These attacks will range from well-known to novel, and many of them will be presented multiple times with varying degrees of stealthiness. Attacks will come from both external and internal sources. Researchers are encouraged to submit a list of exploits that they feel their system would be adept at detecting. An attempt will be made, time and expertise permitting, to include exploits suggested by researchers in the scenario.
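As a point of reference for the scripted traffic described in Section 3.4, the sketch below shows the kind of fully automated session the generators produce. The actual generators are Expect scripts, which are not reproduced in this plan; the fragment below is only a rough Python illustration, and the host names, addresses, and message contents in it are hypothetical.

    # Rough illustration only -- the real traffic generators are Expect scripts.
    # Two scripted "background" sessions: one SMTP transaction and one HTTP
    # fetch, driven entirely by the script with no user at the keyboard.
    import smtplib
    import urllib.request

    def send_background_mail(mailhost="mail.test.net"):
        # A complete scripted SMTP session: connect, send one message, disconnect.
        message = ("From: alice@test.net\r\n"
                   "To: bob@test.net\r\n"
                   "Subject: weekly status\r\n\r\n"
                   "Nothing unusual to report this week.\r\n")
        with smtplib.SMTP(mailhost, 25, timeout=30) as smtp:
            smtp.sendmail("alice@test.net", ["bob@test.net"], message)

    def fetch_background_page(url="http://www.test.net/index.html"):
        # A complete scripted HTTP session: request one page and read the reply.
        with urllib.request.urlopen(url, timeout=30) as reply:
            return reply.read()

    if __name__ == "__main__":
        send_background_mail()
        fetch_background_page()

In an actual run, many such scripted sessions would be launched by the control software described in Section 3.1, from hosts in every region of the network, so that intrusive behavior is embedded in a steady stream of ordinary activity.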
4.0 Requirements for Test Participants

We want to minimize the burden on the researcher as much as possible; however, some degree of standardization will be required so that we can compare across systems. In addition, integration of these systems is likely to be a formidable challenge given the schedule. By making some demands on the researcher, we hope to conduct a consistent and successful evaluation.

4.1 Integration

To ensure a smooth integration of each system, the researcher should examine the diagrams and provide the following information in advance of the evaluation:
1) All hosts on which software is to be installed,
2) A brief description of installation procedures,
3) Any third-party software desired (COTS to be provided by the researcher), and
4) Any hardware (provided by the researcher) to be installed.

Initially, a version of each system that is functional in its external details should be delivered to AFRL so that a dry-run installation can be performed. The scheduling of this delivery can be worked out later this spring. During this integration, and/or during the integration of the final system, the researcher is free to visit AFRL to verify that there have been no errors in the installation and that the system is performing as expected.

4.2 Delivery of System for Testing

The final system to be tested will be delivered to AFRL as identified in the schedule. No further modifications will be allowed after final delivery.

4.3 Results Reporting

Unlike in the Lincoln Laboratory portion of the test, where sessions are explicitly enumerated in a file, the system under test will need to identify the connection data associated with each session, as well as provide a time stamp for each report. This is required so we can attempt to measure latency. As in the Lincoln Laboratory evaluation, there will be one score per session. As a default, if the IDS reports no score for a session, the lowest score reported by the IDS during the evaluation will be assigned. If multiple scores are reported for a session, the highest score reported for the session will be assigned. The format of the output reports should be the following:

{TimeStamp for report} {Session Start} {Duration} {Server Port} {Client Port} {Server IP} {Client IP} {Score}
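To make the report format and the default scoring rules concrete, the Python sketch below formats one report record and applies the defaults described above. It is an illustration only: this plan does not specify the timestamp or duration formats, so the formats and sample values used here are assumptions.

    # Illustration only -- field order follows the format above; the timestamp
    # and duration formats shown are assumptions, not requirements of the plan.
    import time

    def format_report(session_start, duration, server_port, client_port,
                      server_ip, client_ip, score):
        # Emit one record: {TimeStamp for report} {Session Start} {Duration}
        # {Server Port} {Client Port} {Server IP} {Client IP} {Score}
        report_time = time.strftime("%Y/%m/%d-%H:%M:%S", time.localtime())
        fields = (report_time, session_start, duration, server_port,
                  client_port, server_ip, client_ip, score)
        return " ".join(str(field) for field in fields)

    def score_for_session(reported_scores, lowest_score_overall):
        # Default scoring from Section 4.3: a session with no report receives
        # the lowest score the IDS produced anywhere in the evaluation; a
        # session with several reports receives the highest of them.
        if not reported_scores:
            return lowest_score_overall
        return max(reported_scores)

    # Hypothetical example: one report for a telnet session scored 0.85.
    print(format_report("1998/06/01-08:00:01", "00:02:35", 23, 1754,
                        "192.168.1.30", "192.168.1.20", 0.85))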