opencrs_dataset is the dataset produced by running OpenCRS's dataset module on all integrated test suites. It contains 54586 vulnerable ELF executables, compiled from C sources and targeting the 32-bit i386 architecture.
opencrs_dataset                             root folder
├── executables                             folder with all executables
│   └── ...
├── index.csv                               labels for each executable, with
│                                           parent dataset and its CWEs,
│                                           separated by commas if several
└── README.md                               this file
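The description above states that every file under executables is a 32-bit i386 ELF. A minimal sketch of how that could be spot-checked from the ELF header follows; the function names `is_i386_elf` and `check_file` are illustrative and not part of OpenCRS, and the check assumes the little-endian encoding that i386 binaries normally use.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS32 = 1   # e_ident[EI_CLASS] value for 32-bit objects
ELFDATA2LSB = 1  # e_ident[EI_DATA] value for little-endian encoding
EM_386 = 3       # e_machine value for Intel 80386


def is_i386_elf(header: bytes) -> bool:
    """Return True if the bytes start with a 32-bit little-endian i386 ELF header."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return False
    if header[4] != ELFCLASS32 or header[5] != ELFDATA2LSB:
        return False
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from("<H", header, 18)
    return machine == EM_386


def check_file(path: str) -> bool:
    """Read the first 20 bytes of a file and test them with is_i386_elf."""
    with open(path, "rb") as handle:
        return is_i386_elf(handle.read(20))
```

Calling `check_file` on a path of the form `executables/<name>.elf` should return `True` for any executable in this dataset, if the description holds for it.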
| Identifier | Test Suite Name | Creator | Initial Sources Count | Final Executables Count | 
|---|---|---|---|---|
| nist_juliet | Juliet C/C++ 1.3 | National Security Agency's Center for Assured Software. National Institute of Standards and Technology | 64123 | 54531 | 
| nist_c_test_suite | C Test Suite for Source Code Analyzer v2 - Vulnerable | Alexander Hoole. National Institute of Standards and Technology | 54 | 50 | 
| toy_test_suite | Toy Test Suite | OpenCRS | 5 | 5 | 
| Weakness | Count | 
|---|---|
| Stack-based Buffer Overflow | 13834 | 
| Heap-based Buffer Overflow | 11088 | 
| Integer Overflow or Wraparound | 3960 | 
| Mismatched Memory Management Routines | 3564 | 
| Integer Underflow | 2952 | 
| Free of Memory not on the Heap | 2680 | 
| Use of Externally-Controlled Format String | 2407 | 
| Buffer Underflow | 2048 | 
| Buffer Under-read | 2048 | 
| OS Command Injection | 1921 | 
The columns present in the index.csv file are the following:
- name: Unique identifier of a vulnerable program. It determines the executable's file path, which follows the format `executables/<name>.elf`;
- cwes: One or more CWEs that are present in the executable; and
- parent_dataset: Parent dataset's identifier.
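The columns above can be read with Python's standard `csv` module. The sketch below is illustrative, not part of OpenCRS. It assumes that index.csv starts with a header row naming the three columns, and that multiple CWEs share one quoted, comma-separated field. The helper names `parse_index` and `count_by_parent` are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def parse_index(lines, root="."):
    """Parse index.csv content (any iterable of lines) into one dict per executable."""
    entries = []
    for row in csv.DictReader(lines):
        entries.append({
            "name": row["name"],
            # Assumption: several CWEs are stored in one comma-separated field.
            "cwes": [c.strip() for c in row["cwes"].split(",") if c.strip()],
            "parent_dataset": row["parent_dataset"],
            # Path format taken from the column description: executables/<name>.elf
            "path": Path(root) / "executables" / f"{row['name']}.elf",
        })
    return entries


def count_by_parent(entries):
    """Count executables per parent dataset identifier."""
    return Counter(entry["parent_dataset"] for entry in entries)
```

To use it, open index.csv with `newline=""`, pass the file object as `lines`, and pass the dataset root folder as `root`. The counts returned by `count_by_parent` can then be compared with the Final Executables Count column of the test suite table.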
- Set up OpenCRS's `dataset` module on an Ubuntu 20.04 host by following the guide.
- Build each test suite (identified by `<test_suite_id>`): `poetry run dataset build --testsuite <test_suite_id>`.
- The executables in this repository, under the `executables` folder, are those from `dataset`'s `executables`. The same relation applies to `index.csv`, which is `dataset`'s `vulnerables.csv` without the last column.
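The last step, deriving index.csv from vulnerables.csv by dropping the last column, could be sketched as follows. This is a hedged illustration rather than the procedure the maintainers used. It assumes vulnerables.csv is a regular comma-separated file, and the helper names `drop_last_column` and `derive_index` are hypothetical.

```python
import csv


def drop_last_column(rows):
    """Yield each CSV row without its final field."""
    for row in rows:
        yield row[:-1]


def derive_index(src_lines, dst_file):
    """Copy CSV rows from src_lines to dst_file, omitting the last column."""
    writer = csv.writer(dst_file)
    writer.writerows(drop_last_column(csv.reader(src_lines)))
```

Both files should be opened with `newline=""`, as the `csv` module documentation recommends, so that line endings are handled by the reader and writer themselves.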