Testing is a widely applied technique to evaluate software quality, and coverage criteria are often used to assess the adequacy of a generated test suite. However, manually constructing an adequate test suite is typically too expensive, and numerous techniques for automatic test-suite generation have been proposed. All of them come with different strengths. To build stronger test-generation tools, different techniques should be combined. In this paper, we study cooperative combinations of verification approaches for test generation, which exchange high-level information.
We present CoVeriTest, a hybrid technique for test-suite generation. CoVeriTest iteratively applies different conditional model checkers and allows users to adjust the level of cooperation and to configure individual time limits for each conditional model checker. In our experiments, we systematically study different CoVeriTest configurations, which combine either explicit-state model checking and predicate abstraction, or bounded model checking and symbolic execution. A comparison with state-of-the-art test-generation tools reveals that CoVeriTest achieves higher coverage for some programs.
We performed three types of experiments: an internal comparison of CoVeriTest configurations, a comparison of CoVeriTest against its component analyses, and a comparison of CoVeriTest against Test-Comp participants.
In the following, we provide the raw data, HTML tables, and figures for our experiments. Note that we provide neither the logfiles, the CSV files, nor the tests generated by the third experiment. Even compressed, this data amounts to several gigabytes and can therefore only be found in our artifact.
A CoVeriTest configuration consists of a combination of two analyses, a time-limit pair, and a reuse type.
In our experiments, we consider two combinations of analyses. On the one hand, we combine a value analysis (similar to explicit-state model checking) with a predicate analysis. On the other hand, we combine bounded model checking (BMC) with symbolic execution. Additionally, we used six different time-limit configurations. Four of them are uniform and trigger analysis switches often, sometimes, or rarely. The other two are non-uniform, each preferring one of the two combined analyses. Furthermore, we used up to twelve different reuse types, ranging from no exchange, via reusing only one's own results, to full cooperation. For details on the different reuse types, we refer to our paper. All configurations are realized in the software-analysis framework CPAchecker.
For the comparison of the different CoVeriTest configurations, we used all 7644 programs from the largest publicly available benchmark collection of C verification tasks, and all configurations aimed at covering the branches of the verification tasks. Since all configurations are run by the same tool, CPAchecker, we decided to disable test-case generation (thus reducing the amount of data) for the comparison of the different configurations and only compare the numbers of covered test goals reported by CPAchecker.
In the following, we first present the results of the experiments comparing the configurations of the value and predicate analysis combination. Thereafter, we present the results of the experiments studying combinations of BMC and symbolic execution.
CoVeriTest with Value and Predicate Analysis
The following table presents the raw data and the HTML tables for the combinations using value and predicate analysis. CSV files and logfiles are provided with our artifact. The table provides for each combination of reuse type and time-limit pair the respective raw data (compressed XML file). Additionally, it contains one HTML table for each reuse type and one for each time-limit pair. The HTML tables for the reuse types show for each reuse type and test task (program) the experimental data for the different time limits, i.e., configurations using the particular reuse type with varying time limits. Similarly, the HTML tables for the time-limit pairs provide for each time-limit pair and test task (program) the experimental data for the different reuse types.
The above table lets one compare the coverage results individually per test task (program). To better compare the different configurations, we also provide aggregated results. More concretely, we show the distribution of the achieved coverage. Instead of looking at total coverage, which is sensitive to a large number of unreachable test goals, we decided to consider a relative measure (called relative coverage) that compares against the best value achieved. To compare a set of configurations, we first compute for each test task (program) the largest number of covered test goals reported among all configurations in the set, and then compute the relative coverage by dividing, for each test task and configuration, the number of covered test goals by this largest number.
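In formula form: writing covered(c, t) for the number of test goals covered by configuration c on test task t, and C for the set of compared configurations, the relative coverage is

$$\mathit{rel\_cov}(c,t) = \frac{\mathit{covered}(c,t)}{\max_{c' \in C} \mathit{covered}(c',t)}.$$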
The following twelve figures show six box plots for each of the twelve reuse types. Each box plot shows the distribution of the relative coverage for one time-limit configuration. Looking at the figures, we observe that for each reuse type the configuration using time-limit pair 20×80, i.e., preferring the predicate analysis, performs best.
[Figures: box plots for reuse types plain, reuse-prec, reuse-arg, cond-v, cond-p, cond, cond-v+r, cond-p+r, cond+r, trans-prec, trans-prec+r, and trans-prec-cond+r]
The following six figures show twelve box plots for each of the six time-limit pairs. Each box plot shows the distribution of the relative coverage of one reuse type. Looking at the figures, we observe that for each time limit the reuse types reuse-prec, reuse-arg, cond-p+r, and vars+r always perform better than the remaining ones. The analyses should definitely reuse their own information, and the predicate analysis should not use information from the value analysis.
[Figures: box plots for time-limit pairs 10×10, 50×50, 100×100, 250×250, 80×20, and 20×80]
Next, we want to find out which CoVeriTest configuration combining value and predicate analysis is best. Based on the previous results, we chose for each reuse type the best time-limit pair and compared the different reuse types. Additionally, we chose for each time-limit pair the best reuse type and compared the different time-limit pairs. We again consider the distribution of the relative coverage for the comparison. The following two figures show the box plots for the comparison. Below each figure, we also add an HTML table allowing a per-task comparison of the configurations. The raw data is contained in the table above.
Reuse Types | Time Limits |
---|---|
HTML Table | HTML Table |
From the figures above, it is only clear that the best CoVeriTest configuration using a combination of value and predicate analysis uses the time-limit pair 20×80. However, it is not obvious whether reuse type reuse-arg or cond-p+r is better. Looking into our raw data, we found that the best configuration uses reuse type reuse-arg.
CoVeriTest with BMC and Symbolic Execution
The following table presents the raw data and the HTML tables for the combinations using BMC and symbolic execution. CSV files and logfiles are provided with our artifact. The table provides for each combination of reuse type and time-limit pair the respective raw data (compressed XML file).
Additionally, the table contains one HTML table for each reuse type and one for each time-limit pair. The HTML tables for the reuse types show for each reuse type and test task (program) the experimental data for the different time limits, i.e., configurations using the particular reuse type with varying time limits. Similarly, the HTML tables for the time-limit pairs provide for each time-limit pair and test task (program) the experimental data for the different reuse types.
Note that the combination of BMC and symbolic execution only supports nine of the twelve reuse types.
Reuse Type | 10×10 | 50×50 | 100×100 | 250×250 | 80×20 | 20×80 | All Limits |
---|---|---|---|---|---|---|---|
plain | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
reuse-prec | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
reuse-arg | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
cond-b | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
cond-s | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
cond | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
cond-b+r | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
cond-s+r | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
cond+r | Result XML | Result XML | Result XML | Result XML | Result XML | Result XML | HTML Table |
All Types | HTML Table | HTML Table | HTML Table | HTML Table | HTML Table | HTML Table | |
The following nine figures show six box plots for each of the nine compatible reuse types. Each box plot shows the distribution of the relative coverage for one time-limit configuration. Looking at the figures, we observe that for each reuse type the configuration using time-limit pair 250×250, i.e., a configuration that rarely switches between the analyses, performs well, but not always best.
[Figures: box plots for reuse types plain, reuse-prec, reuse-arg, cond-b, cond-s, cond, cond-b+r, cond-s+r, and cond+r]
The following six figures show nine box plots for each of the six time-limit pairs. Each box plot shows the distribution of the relative coverage of one reuse type.
[Figures: box plots for time-limit pairs 10×10, 50×50, 100×100, 250×250, 80×20, and 20×80]
Next, we want to find out which CoVeriTest configuration combining BMC and symbolic execution is best. Based on the previous results, we chose for each reuse type the best time-limit pair and compared the different reuse types. Additionally, we chose for each time-limit pair the best reuse type and compared the different time-limit pairs. We again consider the distribution of the relative coverage for the comparison. The following two figures show the box plots for the comparison. Below each figure, we also add an HTML table allowing a per-task comparison of the configurations. The raw data is contained in the table above.
Reuse Types | Time Limits |
---|---|
HTML Table | HTML Table |
The comparison shows that the best CoVeriTest configuration for the combination of BMC and symbolic execution uses reuse type cond-s+r and time-limit pair 80×20.
Best CoVeriTest Configuration
At last, we want to find out which of our CoVeriTest configurations is best. To this end, we compare the best configurations of our two analysis combinations. For the comparison, we use a scatter plot that contains one point per test task. Each point (x,y) describes the coverage x achieved by the best CoVeriTest configuration using BMC and symbolic execution and the coverage y achieved by the best CoVeriTest configuration using value and predicate analysis. We also provide an HTML table comparing the two configurations on a per-task basis. The raw data of the two configurations is not referenced here again; it can be looked up in the two tables above.
HTML Table
Looking at the scatter plot, we observe that CoVeriTest with value and predicate analysis often outperforms the combination with BMC and symbolic execution.
To investigate whether CoVeriTest's interleaving is beneficial, we compare the best CoVeriTest configuration per analysis combination (reuse-arg with 20×80 for value plus predicate analysis, and cond-s+r with 80×20 for BMC plus symbolic execution) against the two component analyses and a parallel execution of both analyses. For the comparison, we use the same set of test tasks as in the comparison of the different configurations and again do not generate tests, but only compare the numbers of covered test goals. As before, we first present the results for the combination of value and predicate analysis.
The following table presents the raw data (compressed XML files) of the four experiments that belong to the combination of value and predicate analysis. Additionally, the table contains an HTML table file for the experiment.
Analysis Type | Data |
---|---|
Parallel Analysis (Value+Predicate) | Result XML |
Value Analysis | Result XML |
Predicate Analysis | Result XML |
CoVeriTest (Value+Predicate) | Result XML |
All | HTML |
The above table lets one compare the coverage results individually per test task (program). To compare the different configurations, we need to aggregate the results over all test tasks. We decided to use a scatter plot that contains one point per test task. Each point (x,y) describes the coverage x achieved by CoVeriTest and the coverage y achieved by the other analysis. Looking at the scatter plots, we observe that CoVeriTest outperforms the value analysis. In the other two cases, the picture is less clear: CoVeriTest is beneficial for certain program classes, but not always best.
Comparison of value analysis and CoVeriTest | Comparison of predicate analysis and CoVeriTest | Comparison of the parallel combination of value and predicate analysis with CoVeriTest |
The following table presents the raw data (compressed XML files) of the four experiments that belong to the combination of BMC and symbolic execution. Additionally, the table contains an HTML table file for the experiment.
Analysis Type | Data |
---|---|
BMC | Result XML |
Symbolic Execution | Result XML |
Parallel Analysis (BMC+Symbolic Execution) | Result XML |
CoVeriTest (BMC+Symbolic Execution) | Result XML |
All | HTML |
To better compare the different configurations, we also provide scatter plots for the BMC plus symbolic execution part of the experiment. Again, each point (x,y) in the scatter plot describes the coverage x achieved by CoVeriTest and the coverage y achieved by the other analysis. Looking at the scatter plots, we observe that CoVeriTest often outperforms BMC, symbolic execution, and their parallel combination.
Comparison of BMC and CoVeriTest | Comparison of symbolic execution and CoVeriTest | Comparison of the parallel combination of BMC and symbolic execution with CoVeriTest |
To study whether and when CoVeriTest improves over state-of-the-art test-generation tools, we compare the branch coverage achieved by CoVeriTest's best configuration (reuse-arg, 20×80) with the branch coverage achieved by the two best participants of the International Competition on Software Testing (Test-Comp'19). For our comparison, we used all 1720 programs from the Test-Comp'19 category Code Coverage. Furthermore, we can no longer use the reported number of covered goals for the comparison, but have to (1) generate test cases with CoVeriTest and (2) let the Test-Comp validator measure the branch coverage. The following table links to the raw data of our test-case generation and subsequent validation as well as the raw data from Test-Comp'19.
Experiment | Raw Data |
---|---|
CoVeriTest Test Generation | Result XML |
CoVeriTest Test Validation | Result XML |
Test-Comp | Raw Data |
To better compare the actual results, we show two scatter plots. Each scatter plot compares the branch coverage achieved by CoVeriTest against the branch coverage achieved by one Test-Comp participant. Looking at the scatter plots, we observe that CoVeriTest neither dominates nor is dominated by any of the two approaches. Thus, it complements existing approaches.
Comparison of CoVeriTest and VeriFuzz | Comparison of CoVeriTest and KLEE |
```
BASEDIR
├── benchmark_defs
├── CPAchecker
├── debs
├── experiments
│   ├── data
│   ├── figures
│   ├── scripts
│   ├── tables
├── README
├── sv-benchmarks
├── tbf-testsuite-validator
```
Install the required software dependencies from the folder debs and set up BenchExec for <USER> using the following commands.
```
cd $BASEDIR/debs
sudo dpkg -i *.deb
sudo adduser <USER> benchexec
reboot
```
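After the reboot, you can optionally check that BenchExec was installed correctly by querying its version (a quick sanity check; we assume BenchExec's standard --version flag here):

```
benchexec --version
```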
Before we explain how to replicate our experiments, we provide a short tutorial explaining how to run the different CoVeriTest configurations used in our experiments. In the tutorial, we execute all CoVeriTest configurations on the example program test_locks_5.c. Furthermore, the tutorial assumes that the run commands are executed from $BASEDIR/CPAchecker.
In the following, we show the run commands to execute the different CoVeriTest configurations on the example program test_locks_5.c. All presented run commands are structured in the same way and consist of five elements: (1) the call to scripts/cpa.sh including resource limits and logging options, (2) the specification (-spec), (3) the CoVeriTest analysis combination (e.g., -testCaseGeneration-interleaved-value+predicate), (4) the -setprop interleavedAlgorithm.configFiles=... option fixing the reuse type and the time limits of the two analyses, and (5) the program under test. Elements 1, 2, and 5 are even identical in all commands.
Note that the program test_locks_5.c assumes a 32-bit architecture. To generate tests for programs with a 64-bit architecture, add the parameter -64 to the command line. Furthermore, we disabled the output of test cases. To enable test-case generation, add the parameter -setprop testcase.xml=testcase%d.xml. The test cases are then written in the Test-Comp XML exchange format to the folder $BASEDIR/CPAchecker/output. (An example combining these options is shown after the command listing below.)
Next, we list for each of the twelve reuse types the command lines required to execute the CoVeriTest configurations that use this reuse type, a time limit of 10×10 (i.e., each of the two analyses is granted 10 s per iteration), and either a combination of value and predicate analysis or of BMC and symbolic execution. To use a different time-limit pair (x,y), adapt the command line as follows: replace the first occurrence of _10 by _x and the second by _y (illustrated right after the first pair of commands below).
Value+Predicate

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-valueAnalysis.properties::noreuse_10,\
config/testCaseGeneration-predicateAnalysis.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-bmc.properties::noreuse_10,\
config/testCaseGeneration-symbolicExecution-Cegar-CEX.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```
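For illustration, applying the substitution rule from above to the first command yields the 20×80 variant of the plain Value+Predicate configuration (a sketch derived mechanically from that rule; this variant is not listed separately below):

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-valueAnalysis.properties::noreuse_20,\
config/testCaseGeneration-predicateAnalysis.properties::noreuse_80 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```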
Value+Predicate (A=Value)

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-value-generate-cmc-condition.properties::noreuse_10,\
config/components/testing/testCaseGeneration-predicate-use-cmc-condition.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

Value+Predicate (A=Predicate)
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-value-use-cmc-condition.properties::noreuse_10,\
config/components/testing/testCaseGeneration-predicate-generate-cmc-condition.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution (A=BMC)
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-bmc-generate-cmc-condition.properties::noreuse_10,\
config/components/testing/testCaseGeneration-symbolic-use-cmc-condition.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution (A=Symbolic Execution)
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-bmc-use-cmc-condition.properties::noreuse_10,\
config/components/testing/testCaseGeneration-symbolic-generate-cmc-condition.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```
Value+Predicate

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-value-generate-and-use-cmc-condition.properties::noreuse_10,\
config/components/testing/testCaseGeneration-predicate-generate-and-use-cmc-condition.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-bmc-generate-and-use-cmc-condition.properties::noreuse_10,\
config/components/testing/testCaseGeneration-symbolic-generate-and-use-cmc-condition.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```
Value+Predicate

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-valueAnalysis.properties::reuse-cpa-own-precision_10,\
config/testCaseGeneration-predicateAnalysis.properties::reuse-cpa-own-precision_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-bmc.properties::reuse-cpa-own-precision_10,\
config/testCaseGeneration-symbolicExecution-Cegar-CEX.properties::reuse-cpa-own-precision_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```
Value+Predicate

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-valueAnalysis.properties::continue_10,\
config/testCaseGeneration-predicateAnalysis.properties::continue_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-bmc.properties::continue_10,\
config/testCaseGeneration-symbolicExecution-Cegar-CEX.properties::continue_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```
Value+Predicate (A=Value)

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-value-generate-cmc-condition.properties::continue_10,\
config/components/testing/testCaseGeneration-predicate-use-cmc-condition-precision.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

Value+Predicate (A=Predicate)
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-value-use-cmc-condition.properties::reuse-own-precision_10,\
config/components/testing/testCaseGeneration-predicate-generate-cmc-condition.properties::continue_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution (A=BMC)
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-bmc-generate-cmc-condition.properties::continue_10,\
config/components/testing/testCaseGeneration-symbolic-use-cmc-condition.properties::reuse-own-precision_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution (A=Symbolic Execution)
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-bmc-use-cmc-condition.properties::reuse-own-precision_10,\
config/components/testing/testCaseGeneration-symbolic-generate-cmc-condition.properties::continue_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```
Value+Predicate

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-value-generate-and-use-cmc-condition.properties::reuse-own-precision_10,\
config/components/testing/testCaseGeneration-predicate-generate-and-use-cmc-condition-precision.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution
```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-bmc+symbolic-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-bmc-generate-and-use-cmc-condition.properties::reuse-own-precision_10,\
config/components/testing/testCaseGeneration-symbolic-generate-and-use-cmc-condition.properties::reuse-own-precision_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```
Value+Predicate

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-valueAnalysis.properties::reuse-pred-precision_10,\
config/testCaseGeneration-predicateAnalysis.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution (not supported)
Value+Predicate

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-valueAnalysis.properties::reuse-cpa-precisions_10,\
config/testCaseGeneration-predicateAnalysis.properties::continue_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution (not supported)
Value+Predicate

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate-cmc \
  -setprop interleavedAlgorithm.configFiles=config/components/testing/testCaseGeneration-value-generate-and-use-cmc-condition.properties::reuse-precisions_10,\
config/components/testing/testCaseGeneration-predicate-generate-and-use-cmc-condition-precision.properties::noreuse_10 \
  ../sv-benchmarks/c/locks/test_locks_5.c
```

BMC+Symbolic Execution (not supported)
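As announced above, here is an example that combines the plain Value+Predicate command with enabled test-case output (a sketch combining the parameters described before the command listing; the -64 flag would be added analogously for 64-bit programs):

```
scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats \
  -spec config/specification/sv-comp-terminatingfunctions.spc \
  -testCaseGeneration-interleaved-value+predicate \
  -setprop interleavedAlgorithm.configFiles=config/testCaseGeneration-valueAnalysis.properties::noreuse_10,\
config/testCaseGeneration-predicateAnalysis.properties::noreuse_10 \
  -setprop testcase.xml=testcase%d.xml \
  ../sv-benchmarks/c/locks/test_locks_5.c
```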
As already explained above, we performed three types of experiments: an internal comparison of CoVeriTest configurations, a comparison of CoVeriTest against its components, and a comparison of CoVeriTest against Test-Comp participants. In the following, we explain how to repeat the experiments of each of the three types. If your system has swap memory, you need to turn it off.
```
sudo swapoff -a
```

You can later turn it on again by executing the following command.
```
sudo swapon -a
```
To replicate our results, you need 8 CPU cores and 15 GB of memory.
The results of each experiment can be found in the sub-folder results of the location from where you started the respective benchexec command, i.e., either $BASE/CPAchecker or $BASE/tbf-testsuite-validator.
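If you want to render the result XML files into HTML and CSV tables yourself, BenchExec's table-generator tool can be used (a sketch; the actual file names depend on the benchmark definition and the date of the run):

```
table-generator results/*.xml*
```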
To run the configurations with time limit pair 10×10, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-value+predicate_10_10.xml
```
To run the configurations with time limit pair 50×50, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-value+predicate_50_50.xml
```
To run the configurations with time limit pair 100×100, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-value+predicate_100_100.xml
```
To run the configurations with time limit pair 250×250, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-value+predicate_250_250.xml
```
To run the configurations with time limit pair 80×20, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-value+predicate_80_20.xml
```
To run the configurations with time limit pair 20×80, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-value+predicate_20_80.xml
```
To run the configurations with time limit pair 10×10, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_10_10.xml
```
To run the configurations with time limit pair 50×50, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_50_50.xml
```
To run the configurations with time limit pair 100×100, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_100_100.xml
```
To run the configurations with time limit pair 250×250, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_250_250.xml
```
To run the configurations with time limit pair 80×20, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_80_20.xml
```
To run the configurations with time limit pair 20×80, use the following commands.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_20_80.xml
```
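Alternatively, all twelve benchmark definitions from above can be run in sequence with a small shell loop (a convenience sketch, assuming the file naming shown above; it issues the same benchexec invocations one after another):

```
cd $BASE/CPAchecker
# iterate over all time-limit pairs and both analysis combinations
for pair in 10_10 50_50 100_100 250_250 80_20 20_80; do
  for combo in value+predicate bmc+symbolic; do
    benchexec --container ../benchmark_defs/test-gen-${combo}_${pair}.xml
  done
done
```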
To compare the best CoVeriTest configuration of each analysis combination with the single analyses used by CoVeriTest as well as the parallel execution of these analyses, we need to run six analyses: value analysis, predicate analysis, parallel execution of value and predicate analysis, BMC, symbolic execution, and parallel execution of BMC and symbolic execution. To regenerate the experimental data for these six analyses, execute the following commands. For CoVeriTest, we reuse the experimental data from the first experiment.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/test-gen-single+parallel.xml
```
To compare our best CoVeriTest configuration with the tools participating in the International Competition on Software Testing (Test-Comp'19), we first let the best CoVeriTest configuration (reuse-arg, 20×80) generate test cases for the Test-Comp'19 test tasks that support the property coverage-branches.prp. Thereafter, we use the Test-Comp validator to measure for each task the branch coverage achieved by CoVeriTest on that task.
```
cd $BASE/CPAchecker
benchexec --container ../benchmark_defs/CoVeriTest.xml
```
```
cd $BASE/tbf-testsuite-validator
benchexec --container ../benchmark_defs/CoVeriTest-val.xml
```

If you want to use the test cases you generated with the commands in step one, you need to properly adjust the paths to the test cases in your benchmark definition. To this end, adapt the text ending in test.zip in the option and requiredfiles tags of the benchmark file CoVeriTest-val.xml. Thereafter, you can use the above command.
We compared the produced coverage values with the results of the test-case generation tools participating in Test-Comp'19. Note that this comparison only makes sense because we used the same resource limits and infrastructure as Test-Comp'19.