Dirk Beyer and Sudeep Kanav
LMU Munich, Germany
CoVeriTeam is a tool for on-demand composition of cooperative verification tools. Software verification is a hard and important problem, and many approaches with different strengths exist. Thus, it is essential to combine the strengths of different verification tools through combination and cooperation. CoVeriTeam provides a systematic way to combine existing verification tools so that their complementary strengths are not left unused.
The concept is based on verification artifacts (programs, specifications, witnesses) as basic objects and verification actors (verifiers, validators, testers) as basic operations, and it defines composition operators that make it possible to easily describe new compositions, with verification artifacts serving as the interface between the verification actors. CoVeriTeam consists of a language for the composition of verification actors, together with an interpreter for this language.
As a result of viewing tools as components, we can now create powerful verification engines that are beyond the possibilities of single tools, avoiding repeated development of the same components. We illustrate the abilities of CoVeriTeam on a few case studies. We expect that CoVeriTeam will help verification researchers to easily experiment with new tools and assist them in rapidly prototyping various combinations of verification tools.
CoVeriTeam is open source and available as a GitLab project at: https://gitlab.com/sosy-lab/software/coveriteam
This page contains the supplementary material for the article CoVeriTeam: On-Demand Composition of Cooperative Verification Systems.
The aim of the experiment is to measure the overhead induced by CoVeriTeam.
In order to measure the overhead accurately, we identified and eliminated confounding factors that might influence the measurements.
Our experiments were executed on machines with the following configuration: one 3.4 GHz CPU (Intel Xeon E3-1230 v5), 8 processing units, 33 GB RAM, and running the operating system Ubuntu 20.04. Each verification task was limited to 8 CPU cores, a CPU time of 15 min, and a memory usage of 15 GB.
Since the tool executed in our experiment takes negligible and constant time, the measured time consists essentially of the time taken by CoVeriTeam itself.
To measure the overhead of CoVeriTeam, we measure the full resource consumption of each execution of CoVeriTeam including the executed tool.
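Conceptually, the overhead of a run is the difference between the outer measurement (CoVeriTeam including the executed tool) and the inner measurement (the tool alone), and sorting these differences yields the data points of a quantile plot. A minimal sketch with hypothetical numbers (the real values come from the BenchExec result files in the supplementary archive):

```python
# Hypothetical outer/inner CPU-time measurements (seconds) for a few runs.
outer_cpu = [4.7, 4.9, 5.1, 4.8]   # whole CoVeriTeam execution
inner_cpu = [0.2, 0.2, 0.3, 0.2]   # the executed tool alone

# Overhead per run: outer measurement minus the tool's own consumption.
overheads = [outer - inner for outer, inner in zip(outer_cpu, inner_cpu)]

# Sorting the overheads yields the points of a quantile plot:
# the n-th smallest overhead is plotted at x-position n.
quantile_points = sorted(overheads)
print(quantile_points)
```

The same subtraction applies to memory, using the peak memory of the outer and inner measurements.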
The following table was produced by the table generator of BenchExec. The resource measurements can be viewed in more detail as an HTML table. The HTML tables also support interactive plots; please select “Quantile Plot” in the “Plot” drop-down list, and “All” in the “Results” drop-down list.
The supplementary archive contains the XML files for the resource measurements produced by BenchExec, a script to extract the inner resource measurements consumed by the tool, a script to generate the CSV files from which the plots were generated, and a script to generate the plots from the CSV files. The archive contains:

./gen-results.sh: Script to generate the plots. It first generates a CSV file from the raw measurements, and then calls the script ./plots/makeplots.py to generate the plots.
./input_data/: This directory contains the input on which we executed the experiments. It further contains:
    ./benchdefs/: This directory contains the benchmark-definition files that are input to BenchExec.
    ./createtasks.sh: This script creates 10000 copies of the verification task verificationtasks/simple.yml.
    ./unreach-call.prp: The property file for the verifier used in the experiments.
    ./verificationtasks: This directory contains the verification tasks. It contains the file “simple.c” with an empty main function, and verification-task files.
./plots/: This directory contains the CSV data and the plots.
./robust/: This directory contains the resource-measurement XML for the robust configuration.
./trusted/: This directory contains the resource-measurement XML for the trusted configuration.

The plots show that the overhead varies much more in the robust configuration than in the trusted configuration. Also, we notice that the time overhead varies much more than the memory overhead.
We have discussed the results with the BenchExec developers, since we use BenchExec's features to execute the tool (tool-info modules are part of the BenchExec repository). The default (robust) configuration executes the tool and the tool-info module in two separate containers. For communicating with the tool-info module, BenchExec uses the standard Python multiprocessing package, which adds some additional overhead. This explains the larger variance of the time overhead in the robust configuration.
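To give an intuition for where this overhead comes from, the following sketch measures one request/response round trip to a child process using the standard multiprocessing package. This is not BenchExec's actual communication protocol; the message and the echoing child are illustrative stand-ins for a query to a tool-info module running in its own process.

```python
import multiprocessing
import time

def _echo(conn):
    # Stand-in for a tool-info module answering one request
    # from its own process/container: echo the message back.
    conn.send(conn.recv())
    conn.close()

def ipc_round_trip(message):
    """Measure one request/response round trip to a child process."""
    parent_conn, child_conn = multiprocessing.Pipe()
    start = time.perf_counter()
    process = multiprocessing.Process(target=_echo, args=(child_conn,))
    process.start()
    parent_conn.send(message)
    reply = parent_conn.recv()
    process.join()
    return reply, time.perf_counter() - start

reply, elapsed = ipc_round_trip("cmdline-request")
print(f"round trip incl. process start-up: {elapsed:.3f} s")
```

Process start-up, serialization, and pipe communication all contribute to the measured time, and their cost fluctuates from run to run, which matches the higher variance observed in the robust configuration.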
We recommend using the robust configuration by default. A user concerned about performance can use the trusted configuration instead, but should do so only with enough information about the tool-info modules to trust that they do not execute anything suspicious while assembling the command line for the tool execution.