CGC

Cyber Grand Challenge Corpus

The Cyber Grand Challenge (CGC) corpus contains a collection of 250 purpose-built applications designed to challenge vulnerability identification and remediation systems, along with the results of the CGC competitions held in 2015 and 2016.

Applications

Each application within the corpus was built with the intent of pushing the state of the art of autonomous vulnerability discovery and remediation.

Each application was implemented as a network service that performs some task, such as file transfers, remote procedure calls, or peer-to-peer networking. Applications mirror real-world tasks using novel protocols and implementations. Each application contains at least one memory corruption security flaw that can be triggered from network input.

Applications within the corpus were developed by multiple teams. The challenge author teams were given the following statement to guide their development [11]:

Superior approaches will demonstrate knowledge of the problems involved in creating challenge software for the purpose of cybersecurity competition (e.g., binaries of excessive difficulty prevent any competitor from making progress, while binaries of limited difficulty prevent meaningful measurement). Strong CB authors will demonstrate knowledge of the current limits of automated cyber reasoning in terms of program complexity and flaw discovery difficulty; this knowledge is essential in order to create a collection of CBs that spans a difficulty range from challenging to beyond state-of-the-art. The task of creating novel hidden software flaws to challenge the leading edge of program analysis poses significant technology risk. CB authors are expected to overcome this risk with a representative corpus of Challenge Sets. Strong CB authors will cover a history of known software flaws that represent interesting analysis challenges, mapped to specific CWE categories that will be represented within the CS portfolio of the author.

The unique nature of the challenges developed for CGC provides two beneficial properties:

  • Analysis of an application within the corpus is not aided by pre-analyzing other applications.
  • Vulnerabilities discovered within the corpus have no real-world impact.

Each application is developed in either C or C++ and must operate as an INETD-style service [7]. Each application must implement the CGC-specific Application Binary Interface (ABI) [1].
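To make the ABI constraint concrete, the following is a minimal sketch of a CGC-style service, assuming the libcgc-style interface described in the ABI specification [1] (receive(), transmit(), and _terminate() in place of the usual POSIX calls). It is illustrative only and is not taken from any corpus application.

    /* Minimal sketch of a CGC-style echo service. The libcgc interface
     * (receive, transmit, _terminate) is assumed from the ABI
     * specification [1]; file descriptors 0 and 1 carry the network
     * connection when the binary runs as an INETD-style service. */
    #include <libcgc.h>

    int main(void) {
        char buf[64];
        size_t rx, tx;

        for (;;) {
            /* Read a chunk of the request from the connection (fd 0). */
            if (receive(0, buf, sizeof(buf), &rx) != 0 || rx == 0)
                _terminate(0);

            /* Echo it back to the client (fd 1). */
            if (transmit(1, buf, rx, &tx) != 0)
                _terminate(1);
        }
    }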

An application may be composed of multiple binaries communicating through an inter-process communication (IPC) mechanism [8] that leverages additional file descriptors provided to each binary, as sketched below.
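The following is a hypothetical sketch of that mechanism: one binary of a multi-binary challenge sends a message to a peer binary over an additional file descriptor. The descriptor number (3) and the helper name are illustrative assumptions; the actual descriptor assignment is described in the IPC newsletter [8].

    /* Hypothetical: send a message to a peer binary over an extra file
     * descriptor. PEER_FD = 3 is an assumption for illustration. */
    #include <libcgc.h>

    #define PEER_FD 3

    static int send_to_peer(const char *msg, size_t len) {
        size_t sent = 0, tx;
        while (sent < len) {
            if (transmit(PEER_FD, msg + sent, len - sent, &tx) != 0)
                return -1;   /* peer unavailable; give up */
            sent += tx;
        }
        return 0;
    }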

The application developers were required to provide a detailed description of the application, including the Common Weakness Enumeration (CWE) entry for each author-provided vulnerability [6].

For CFE, application authors were asked to supply difficulty information for the challenges they developed. This information indicates the difficulty level intended by the author in each of three key areas that a cyber reasoning system (CRS) would need to address within CGC:

  • Discovering the vulnerabilities
  • Mitigating the vulnerabilities
  • Proving the vulnerabilities

The source code, compiled binaries, application verification tests, and proofs of vulnerability for all applications are provided within the corpus.

Application Verification

Verification of the applications comes in two parts: functionality verification and vulnerability demonstration. The coupling of these two areas of verification allows researchers to test the efficacy of vulnerability remediation capabilities while measuring the impact of any remediation on the functionality of the application.

Vulnerability Demonstration

Each application includes intended memory corruption vulnerabilities. To demonstrably verify that the vulnerabilities are exercisable without external knowledge, concrete external inputs are provided that prove each author-specified vulnerability.

Additionally, these proofs of vulnerabilities can be used to assist in testing the efficacy of vulnerability remediation.

Functionality Verification

Functionality verification tests of varying complexity are provided with each application and can be used to test the impact of any changes made to the application. The verification tests are made up of thousands of individually unique interactions that exercise complex protocol behavior. These tests are primarily generated via a generative state-graph-walking tool, allowing for arbitrarily complex protocol interactions.
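As a purely hypothetical sketch of how a state-graph walk can generate such interactions, the following program emits one interaction script by walking a small invented protocol graph; the states, edges, and output format are not those of the actual tool used for the corpus.

    /* Hypothetical: generate one protocol interaction by randomly
     * walking a small state graph. Everything here (states, edges,
     * script text) is invented for illustration. */
    #include <stdio.h>
    #include <stdlib.h>

    enum state { CONNECT, LOGIN, TRANSFER, QUIT, DONE };

    /* For each state, two possible next states. */
    static const enum state edges[][2] = {
        [CONNECT]  = { LOGIN,    LOGIN },
        [LOGIN]    = { TRANSFER, QUIT  },
        [TRANSFER] = { TRANSFER, QUIT  },
        [QUIT]     = { DONE,     DONE  },
    };

    /* Traffic emitted when visiting each state. */
    static const char *script[] = {
        [CONNECT]  = "write HELLO\n",
        [LOGIN]    = "write USER guest\nread OK\n",
        [TRANSFER] = "write GET file\nread DATA\n",
        [QUIT]     = "write BYE\nread GOODBYE\n",
    };

    int main(void) {
        enum state s = CONNECT;
        while (s != DONE) {            /* walk until the terminal state */
            fputs(script[s], stdout);  /* emit this step's traffic      */
            s = edges[s][rand() % 2];  /* pick a random outgoing edge   */
        }
        return 0;
    }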

These verification tests can be used to assist in testing the impact of any vulnerability remediation on the functionality of the application.

CGC Qualifying Event (CQE)

On June 3rd, 2015, teams from across the globe vied for a chance to qualify for the Final Event of DARPA's Cyber Grand Challenge. Competitors were given 131 unique purpose-built challenges containing undisclosed software security vulnerabilities, with each team's system autonomously discovering and mitigating as many vulnerabilities as it could within 24 hours.

In the end, the top seven teams were selected as finalists and went on to compete in the CGC Final Event. The submissions from the top 13 teams, along with details of each challenge that made up the qualifying event, were made public shortly afterward.

CQE Submissions

In CQE, competitors were challenged to provide a submission for each application consisting of a replacement application with any vulnerabilities remediated, coupled with a proof of vulnerability (POV) that could deterministically prove a vulnerability by causing the application to terminate via a SIGSEGV, SIGILL, or SIGBUS signal. POVs were required to follow the CGC XML specification [9].

CQE Scoring

Each submission was evaluated in terms of performance, functionality, and security. Unique to the performance and functionality measurements was a faster-than-linear drop-off, such that a 50% impact to functionality would more than halve the score [5]. A team's total score was the sum of its scores for each submission.

Performance was measured based on the worst of the negative impacts on file size, execution time, and memory usage.

Functionality was measured based on the number of tests provided by the application author that the replacement application continued to pass.
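As a purely hypothetical illustration of the faster-than-linear drop-off described above (the actual formulas are defined in the CQE scoring document [5]), a quadratic factor has the stated property: a submission passing only half of the functionality tests would retain only a quarter of the available score.

    /* Hypothetical illustration of a faster-than-linear drop-off; the
     * real CQE formulas are defined in the scoring document [5]. With a
     * quadratic factor, passing 50% of the tests yields 25% of the
     * available score. */
    static double functionality_factor(double fraction_passing) {
        return fraction_passing * fraction_passing;  /* e.g. 0.5 -> 0.25 */
    }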

The CQE corpus includes all of the replacement applications and POVs fielded by each competitor, as well as the score for each submission.

CGC Final Event (CFE)

On August 4th, 2016, DARPA held the world's first all-computer Capture the Flag tournament in Las Vegas. Seven prototype systems squared off against each other and competed for nearly $4 million in prizes in a live network competition. The CGC Final Event took place in conjunction with DEF CON, home of the longest-running annual CTF competition.

CFE started at 16:00:45 UTC on August 4, 2016 and ended at 01:13:17 UTC on August 5, 2016, after running 96 rounds [3].

Via a web-based API [2], competitors fielded an autonomous CRS that participated in a marshaled Capture the Flag (CTF) competition. The competition was segmented into 96 rounds and comprised 82 applications.

Throughout the 96 rounds, applications were introduced to and removed from the competition via a pre-determined schedule. Each application was active for a minimum of 15 rounds.

As in CQE, competitors were challenged to remediate vulnerabilities in a suite of applications. In addition to replacing applications, competitors were provided with a Snort-like network appliance [10] that allowed CRS-provided network filters to be applied to the application communication streams. The network appliance also provided the CRS with a read-only network tap of the application communication streams.

A unique component of CFE was the concept of consensus evaluation. Each CRS was given access to every submitted replacement application and network filter one round before it was fielded. By giving each CRS access to all replacements, competitors were incentivized to develop technology that could withstand analysis and did not rely on secrecy of implementation for its efficacy.

When a CRS submitted a replacement application or network filter, that application was marked as offline during the next round. By taking an application down for one round per replacement, competitors were incentivized to reduce downtime, mirroring real-world deployment needs.

Each round included 270 seconds of evaluation, followed by at least 30 seconds without evaluation that were used for scoring. During the evaluation period, the functionality and performance of each application were tested via up to 1050 tests provided by the application developer. Intermingled with the application tests, competitor-provided POVs were executed. Because the POVs and application tests were intermingled, each CRS had to differentiate potential attacks from benign traffic observed via the network tap, simulating the real-world challenges of network analysis.

POVs in CFE

In CFE, POVs were required to be full CGC applications following the CGC ABI. A CRS-provided POV could prove a vulnerability in one of two ways [4]. The first method, referred to as 'type 1', must cause the defended application to fault with the instruction pointer and a general-purpose register set to values negotiated at POV execution time. The second method, referred to as 'type 2', must provide the value of four contiguous bytes read from a negotiated range of memory within the defended application.
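The following is a rough sketch of the shape of a 'type 2' POV. The negotiation file descriptor (3) and the message layout used here are assumptions made for illustration; the authoritative negotiation protocol is defined in the POV-in-CFE document [4].

    /* Hypothetical outline of a 'type 2' POV. The negotiation file
     * descriptor (3) and the message layout are illustrative
     * assumptions; see the POV-in-CFE document [4] for the real
     * protocol. */
    #include <libcgc.h>

    #define NEG_FD 3

    int main(void) {
        unsigned int type = 2;
        unsigned int region[3];   /* assumed reply: address, size, length */
        unsigned char secret[4];  /* bytes leaked from the defended application */
        size_t io;

        /* Tell the competition framework which proof type follows. */
        transmit(NEG_FD, &type, sizeof(type), &io);
        /* Learn which memory region the secret must come from. */
        receive(NEG_FD, region, sizeof(region), &io);

        /* ... interact with the defended application here, recovering
         *     four contiguous bytes from the negotiated region ... */

        /* Submit the leaked bytes to complete the proof. */
        transmit(NEG_FD, secret, sizeof(secret), &io);
        return 0;
    }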

Data in the Corpus

The CFE corpus includes all of the replacement applications, network filters, and POVs fielded by each competitor. For each POV that communicated, an execution trace was recorded and rendered via HAXXIS, a visualization tool created for CGC. For each POV that successfully proved a vulnerability, an execution trace generated by the CFE forensic system is included, following the execution flow of the POV prerequisites.

For visual analysis, graphs are provided that show each team's score, impact on memory utilization, execution time, and deployment downtime throughout the competition.

Citations

How to cite

@electronic{CGCCORPUS,
    author = {Caswell, Brian},
    editor = {{Lunge Technology}},
    title = {Cyber Grand Challenge Corpus},
    url = {http://www.lungetech.com/cgc-corpus/},
    urldate = {01.04.2017},
    originalyear = {01.04.2017}
}

References


  1. CGC ABI Specification, Cyber Grand Challenge, July 1, 2016. 

  2. CRS Team Interface API, Cyber Grand Challenge, July 1, 2016. 

  3. CFE Event Log, Cyber Grand Challenge, August 7, 2016.

  4. Proof of Vulnerability (POV) in CFE, Cyber Grand Challenge, July 1, 2016. 

  5. CQE Scoring Document, DARPA Information Innovation Office, July 7, 2014. 

  6. Common Weakness Enumeration, The Mitre Corporation. Accessed April 1, 2017. 

  7. FreeBSD INETD manpage, FreeBSD System Manager's Manual, January 12, 2008. 

  8. IPC Newsletter, Cyber Grand Challenge, July 1, 2016. 

  9. POV DTD, Cyber Grand Challenge, July 1, 2016. 

  10. CFE Network Appliance for network defense AoE, Lunge Technology, July 1, 2016. 

  11. Submitting a Challenge Binary, Cyber Grand Challenge, July 1, 2016. 


Curated by Lunge Technology, LLC. Questions or comments? Send us email