Scoring README

This document describes the layout of the CQE_Scores.csv file. Each record (line) in CQE_Scores.csv consists of the following 14 columns: TeamName, cset, package_name, CB_Score, AvailabilityScore, FuncScore, PerfScore, ExecTimeOverhead, FileSizeOverhead, MemUseOverhead, SecurityScore, ReferenceScore, ConsensusScore, EvaluationScore. Details of the scoring algorithm can be found at: https://github.com/CyberGrandCallenge/cgc-release-documentation/blob/master/CQE Scoring.pdf. Note that, where applicable, the column names are followed by the attribute name from the scoring documentation linked above. A number in parentheses after a score name, e.g. (5), is the column number of that score. The columns are broken down as follows:
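As a minimal sketch of reading the file, the records could be loaded with Python's csv module. This assumes the file is comma-separated and that its first line is a header carrying the 14 column names in the order listed above; this README does not state whether a header row is present. The helper name load_scores is hypothetical.

```python
import csv

# Column names in the order given in this README.
EXPECTED_COLUMNS = [
    "TeamName", "cset", "package_name", "CB_Score", "AvailabilityScore",
    "FuncScore", "PerfScore", "ExecTimeOverhead", "FileSizeOverhead",
    "MemUseOverhead", "SecurityScore", "ReferenceScore", "ConsensusScore",
    "EvaluationScore",
]


def load_scores(lines):
    """Return the records as a list of dicts keyed by column name.

    `lines` is any iterable of text lines, such as an open file.
    Assumption: the first line is a header row.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)
```

A file on disk could be passed in as `load_scores(open("CQE_Scores.csv", newline=""))`. All values come back as strings and need converting with float() before any arithmetic.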

Columns 1-3:

  1. TeamName - the competitor ID. Submissions from finalists are identified as, e.g., 'First Place - FINALIST'; all other teams are collectively listed as 'anon'.
  2. cset - the common name of the challenge set, which contains the author ID and a sequence number.
  3. package_name - the name of the submission file uploaded to S3, in the ${csid}_${ar_hash}.ar format, where ${csid} is the anonymous cset identifier and ${ar_hash} is the SHA-256 hash of the submission archive.
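The package_name format can be taken apart with a few string operations. The sketch below assumes the hash is written as 64 hexadecimal characters (the usual text form of a SHA-256 digest, but not confirmed by this README) and splits at the last underscore in case ${csid} itself contains underscores. The function name split_package_name is hypothetical.

```python
import string


def split_package_name(package_name):
    """Split '<csid>_<ar_hash>.ar' into (csid, ar_hash)."""
    suffix = ".ar"
    if not package_name.endswith(suffix):
        raise ValueError(f"not an .ar package name: {package_name!r}")
    stem = package_name[: -len(suffix)]
    # Split at the LAST underscore so a csid containing underscores stays whole.
    csid, sep, ar_hash = stem.rpartition("_")
    if not sep or not csid:
        raise ValueError(f"missing csid or hash: {package_name!r}")
    # Assumption: the SHA-256 digest is hex-encoded, i.e. 64 hex characters.
    if len(ar_hash) != 64 or any(c not in string.hexdigits for c in ar_hash):
        raise ValueError(f"hash is not 64 hex characters: {ar_hash!r}")
    return csid, ar_hash
```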

Column 4:

  1. CB Score - the final score, out of a maximum of 4, awarded to the competitor for the specific submission. CB Score is the product Availability Score (5) x Security Score (11) x Evaluation Score (14). Details of the scoring algorithm can be found on GitHub; see the link above.
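The product above can be written out directly. This is only a sketch of the relationship stated in this README, not the official scoring code, and the function name cb_score is hypothetical. The maximum of 4 follows from the largest value of each factor described in this document: 1 x 2 x 2.

```python
def cb_score(availability, security, evaluation):
    """CB Score as the product of columns 5, 11, and 14."""
    return availability * security * evaluation


# Largest value of each factor, per the column descriptions in this README.
MAX_CB_SCORE = cb_score(availability=1.0, security=2.0, evaluation=2)
```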

Column 5:

  1. Availability Score - the total availability score used to calculate CB Score (4). It is the minimum of FuncScore (6) and PerfScore (7).
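Because the availability score is a minimum, a submission is limited by whichever of the two components is weaker. A minimal sketch, with the hypothetical function name availability_score:

```python
def availability_score(func_score, perf_score):
    """Availability Score (5) as the minimum of FuncScore (6) and PerfScore (7)."""
    return min(func_score, perf_score)
```

For example, a Replacement CB with perfect functionality but a PerfScore of 0.5 has an availability score of 0.5.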

Column 6:

  1. FuncScore - the fraction of all test cases that the Replacement CB passes, adjusted by the step function described in the scoring document linked above, and used in the availability calculation (5).

Columns 7-10:

  1. PerfScore - the total performance score, determined by whichever of ExecTimeOverhead (8), FileSizeOverhead (9), and MemUseOverhead (10) has the worst negative impact.
  2. ExecTimeOverhead - the execution time overhead of the Replacement CB, as a fraction, compared to the Reference Patched CB. An ExecTimeOverhead below 10% receives the full score. Details can be found in the scoring document linked above.
  3. FileSizeOverhead - the file size overhead of the Replacement CB, as a fraction, compared to the Reference Patched CB. A FileSizeOverhead below 40% receives the full score.
  4. MemUseOverhead - the memory usage overhead of the Replacement CB, as a fraction, compared to the Reference Patched CB. A MemUseOverhead below 10% receives the full score.
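The full-score thresholds above can be used to flag which overheads cost a submission performance points. The sketch below assumes the overhead columns hold fractions (0.10 meaning 10%) and only reports which thresholds are reached; how much PerfScore is lost beyond a threshold is defined in the scoring document and is not reproduced here. The names FULL_SCORE_THRESHOLDS and overheads_over_threshold are hypothetical.

```python
# Full-score limits from this README, expressed as fractions.
FULL_SCORE_THRESHOLDS = {
    "ExecTimeOverhead": 0.10,
    "FileSizeOverhead": 0.40,
    "MemUseOverhead": 0.10,
}


def overheads_over_threshold(record):
    """Return the overhead columns that do NOT qualify for the full score.

    `record` maps column names to values (strings or numbers).
    An overhead must be strictly below its limit to receive the full score.
    """
    return [
        name
        for name, limit in FULL_SCORE_THRESHOLDS.items()
        if float(record[name]) >= limit
    ]
```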

Columns 11-13:

  1. Security Score - the total security score. Security Score is 0 if ReferenceScore (12) is 0; otherwise it is 1 + 0.5 x (ReferenceScore (12) + ConsensusScore (13)).
  2. ReferenceScore - the security reference, the fraction of reference PoVs that prove vulnerability in the Replacement CB as compared to the Reference Patched CB.
  3. ConsensusScore - a binary value (0 or 1): 0 if any submitted PoV proves vulnerability in the Replacement CB, and 1 if no submitted PoV proves vulnerability.
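The security score rule above translates into a short function. This is a sketch of the formula as stated in this README, with the hypothetical name security_score; the scoring document remains the authoritative definition.

```python
def security_score(reference_score, consensus_score):
    """Security Score (11) from ReferenceScore (12) and ConsensusScore (13)."""
    if reference_score == 0:
        # A ReferenceScore of 0 zeroes the security score outright.
        return 0.0
    return 1 + 0.5 * (reference_score + consensus_score)
```

With ReferenceScore = 1 and ConsensusScore = 1 the result is 2, the largest value the formula can produce. With ReferenceScore = 0 the result is 0 regardless of ConsensusScore.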

Column 14:

  1. Evaluation Score - a two-valued score (1 or 2): 1 if the submitted PoV does not prove vulnerability in the reference CB, and 2 if the submitted PoV does prove vulnerability in the reference CB.