MIT Lincoln Laboratory: Cyber Security and Information Sciences: Cyber Corpora

Cyber Corpora

The appropriate copyright notices have been included with the data and source below. Please review these carefully.

Cyber Grand Challenge - From a starting field of over 100 teams consisting of some of the top security researchers and hackers in the world, the Defense Advanced Research Projects Agency (DARPA) pitted the top seven teams against each other in the Cyber Grand Challenge final event. During the competition, each team's Cyber Reasoning System (CRS) automatically identified software flaws and scanned a purpose-built, air-gapped network to identify affected hosts. Archived here are relevant artifacts from the events that made up the complete challenge. These artifacts range from infrastructure and hosting tools to challenge binaries and code.
- Cyber Grand Challenge Data and Tools
- Cyber Grand Challenge Overview
EasyCrypt Proof of the Security of a Private Count Retrieval Cryptographic Protocol - Proof scripts for the EasyCrypt cryptographic proof assistant proving the honest-but-curious security of the PCR (Private Count Retrieval) cryptographic protocol. The PCR protocol involves three parties: a Server, a Client and a Third Party (TP). The protocol works with one-dimensional databases: a database consists of a list of elements (which can be anything). Database queries are single elements: a query is a request for the number of occurrences of the queried element in the database. The database is held by the Server, and queries are made by the Client. The Client is only allowed to learn the counts for the queries it makes, whereas the Server must not learn what queries the Client makes. The TP is an intermediary between the Server and Client, but is not trusted: it is only allowed to learn certain element patterns.
Download: PCR.tgz
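
To make the query semantics described above concrete, here is a minimal Python sketch of the count-retrieval functionality only: given a database (a list of elements) and a list of queries, it returns the number of occurrences of each queried element. This is not the PCR protocol and is not taken from PCR.tgz. It involves no cryptography, no Third Party and no privacy guarantees, and the function name ideal_count_retrieval is a hypothetical one chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def ideal_count_retrieval(
    database: Iterable[Hashable], queries: Iterable[Hashable]
) -> List[int]:
    """Return, for each query, how many times it occurs in the database.

    Illustrative only: this is the plain input/output behaviour of a count
    query, with none of the protections the PCR protocol is proved to give.
    """
    counts = Counter(database)  # element -> number of occurrences
    # Counter returns 0 for elements that never appear in the database.
    return [counts[q] for q in queries]


if __name__ == "__main__":
    db = ["alice", "bob", "alice", "carol"]
    print(ideal_count_retrieval(db, ["alice", "dave"]))  # prints [2, 0]
```

In the protocol itself, the Client should end up learning exactly these counts for its own queries, while the Server learns nothing about which elements were queried.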
Battleship Case Study in Secure Programming - The program code for the case study in secure programming described in the paper "You Sank My Battleship! A Case Study in Secure Programming," by Alley Stoughton, Andrew Johnson, Samuel Beller, Karishma Chadha, Dennis Chen, Kenneth Foner and Michael Zhivich, presented at the ACM Ninth Workshop on Programming Languages and Analysis for Security (PLAS 2014), and appearing in the ACM digital library. The case study consists of three implementations of the game Battleship: one in Concurrent ML using a trusted referee, one in Haskell/LIO using information flow control to avoid the need for a trusted referee, and one in Concurrent ML using discretionary access control to avoid needing a trusted referee.

Download: battleship.tgz
Off-line intrusion detection evaluation data, as described for the 1998 Evaluation in "Evaluating Intrusion Detection Systems: The 1998 DARPA Off-Line Intrusion Detection Evaluation," by Lippmann, Richard P., David J. Fried, Isaac Graf, Joshua W. Haines, Kristopher R. Kendall, David McClung, Dan Weber, Seth E. Webster, Dan Wyschogrod, Robert K. Cunningham, and Marc A. Zissman, in Proceedings of the 2000 DARPA Information Survivability Conference and Exposition (DISCEX), Vol. 2. 2000, IEEE Computer Society Press: Los Alamitos, CA. p. 12-26.

Alternatively, you could use the data from the 1999 Evaluation, as described in "Analysis and Results of the 1999 DARPA Off-Line Intrusion Detection Evaluation," by Lippmann, Richard P. and Joshua Haines, in Proceedings of Recent Advances in Intrusion Detection, Third International Workshop, RAID 2000, Toulouse, France, Eds. H. Debar, L. Me, and S.F. Wu, p. 162-182, Springer-Verlag, 2000.

Model Programs for Evaluating Static Analysis Buffer Overflow Detectors - as described in the paper "Using Exploitable Buffer Overflows From Open Source Code" by Misha Zitser, Richard Lippmann, and Tim Leek, Proceedings ACM Sigsoft 2004/FSE Foundations of Software Engineering Conference, 2004. NOTE: Each of the model programs is derived from a historic open source program, and as such it is subject to the licensing terms for that program.

View: Text Abstract
View: Full Paper
Download model program examples: models-2007-11-06.tgz
Contact: Tim Leek (tleek@ll.mit.edu)
Diagnostic Test Suite for Evaluating Buffer Overflow Detection Tools - the companion test suite for "Evaluating Static Analysis Tools for Detecting Buffer Overflows in C Code," a Harvard University master’s thesis by Kendra Kratkiewicz. The thesis evaluated five static analysis tools to determine their strengths and weaknesses in detecting a variety of buffer overflow flaws in C code.

View: Text Abstract
View: Full Text
Download diagnostic test suite: BOdiagsuite-20050808.tgz
Download test suite infrastructure: BOdiaginfrastructure-2005-8-8.tgz
Contact: BOdiagsuite@ll.mit.edu

A Corpus of Group Dynamics Data from Internet Chatrooms - as described in the paper "A Corpus of Group Dynamics Data from Internet Chatrooms" by Galen Pickard, Roger Khazan, Benjamin Fuller, and Joseph Cooley. The paper used this corpus to evaluate the performance of a key distribution algorithm.

View: Text Abstract
View: Documentation
Download data from a single room (SQLite3 DB, 8.2MB): room1.tar.gz
Download entire data set (SQLite3 DB, 488MB): allrooms.tar.gz
Download scripts (Perl, MATLAB, 3.3KB): scripts.tar.gz
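
Since the chatroom data is distributed as SQLite3 databases, a first step after unpacking an archive might be to see which tables a database contains. The following is a minimal Python sketch using only the standard sqlite3 module. The helper name list_tables and the file name room1.db in the usage comment are assumptions for illustration: the actual file names and schema are described in the corpus documentation, not here.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in the SQLite database at db_path.

    Note: sqlite3.connect creates an empty database file if db_path does
    not exist, so check the path before calling this on real data.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Hypothetical usage; replace the path with the file extracted from the archive:
# print(list_tables("room1.db"))
```

Listing the tables first avoids guessing at column names. The documentation linked above remains the authoritative description of the schema.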
