Abstraction is a key technology for proving the correctness of computer programs. There are many approaches available, but unfortunately, the various techniques are difficult to combine, and successful techniques have to be re-implemented again and again. We address this problem with the tool CEGAR-PT, which views abstraction as program transformation and integrates different verification components off-the-shelf. The idea is to use existing components without having to change their implementation, while still adjusting the precision of the abstraction using the successful CEGAR approach. The approach is largely general: it only requires the abstraction to transform, given a precision that defines the level of abstraction, one program into another program. The abstraction by program transformation can over-approximate the data flow (e.g., havoc some variables, use more abstract types) or the control flow (e.g., loop abstraction, slicing). To demonstrate the potential of this idea, we instantiate the approach for the domain of loop abstraction, a control-flow abstraction that replaces a loop in the analyzed program by a more abstract representation of the loop. We implemented a modular CEGAR framework that invokes existing loop-abstraction components. The experimental evaluation shows that, although modular, the approach does not incur a large overhead, and that with CEGAR-PT we can verify programs that we could not verify without loop abstraction.
We provide the following content: the experimental setup and results for our two research questions, instructions for installing and running CEGAR-PT, and two worked examples that illustrate the loop abstractions used by CEGAR-PT.
To evaluate our approach, we aim to answer the following two research questions:
RQ 1 (Effectiveness): Does verifying transformed programs with CEGAR-PT increase the number of tasks that the off-the-shelf verifiers can solve?
RQ 2 (Efficiency): How large is the run-time overhead of the modular CEGAR-PT approach compared to implementing the loop abstractions directly inside the verifier?
For conducting our evaluation, we use BenchExec to ensure reliable benchmarking. All benchmarks are performed on machines with an Intel Xeon E5-1230 CPU (4 physical cores with 2 processing units each), 33 GB of RAM, and Ubuntu 20.04 as the operating system. All benchmarks are executed with resources limited to 900 s of CPU time, 15 GB of memory, and 1 physical core (2 processing units). We concentrate on the loops category of the SV-Benchmarks set, which contains a total of 765 programs. These programs are centered around loops, so the program transformations we use can be applied to them. We will provide all data and tools in a reproduction package and submit it to the artifact evaluation.
In our experiments, we use CPAchecker as program generator and witness validator, while using several off-the-shelf tools for the verification part, namely Cbmc, CPAchecker, Symbiotic, and UAutomizer. Precision refinement is done by analyzing the verification witness and updating the program transformations that occur on the error path.
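To illustrate the overall flow, the following is a minimal sketch of the refinement loop in Python, for a configuration that uses a validator; all class and method names are placeholders chosen for illustration and do not correspond to the actual CEGAR-PT API.
def cegar_pt(program, spec, verifier, validator, program_generator):
    # Start with the coarsest abstraction for every loop (e.g., the Havoc strategy).
    precision = program_generator.initial_precision(program)
    while True:
        # Transform the original program according to the current precision.
        transformed = program_generator.transform(program, precision)
        verdict, witness = verifier.verify(transformed, spec)
        if verdict == "TRUE":
            # The transformation over-approximates the behavior, so a proof carries over.
            return "TRUE"
        # Project the witness back to the original program and validate it there.
        projected = program_generator.project_witness(witness, program, transformed)
        if validator.validate(program, spec, projected) == "FALSE":
            # The counterexample is feasible on the original program.
            return "FALSE"
        # Spurious counterexample: refine the strategies on the error path.
        refined = program_generator.refine(precision, witness)
        if refined == precision:
            return "UNKNOWN"  # no further refinement possible
        precision = refined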
Our implementation is available on our public git repository.
To evaluate RQ 1, we compare the number of tasks solved successfully by the different verifiers inside CEGAR-PT to the number of tasks solved by the verifiers on their own. CEGAR-PT was used without a validator and with the program transformations presented in the paper. To do this, we compared CEGAR-PT with CoVeriTeam (running the verifiers directly) in order to measure the effectiveness of CEGAR-PT; the resulting comparison is discussed next.
As the comparison shows, using program transformations increased the number of tasks solved for all verifiers. Comparing the different methods using the scatter plots, it can be seen that CEGAR-PT generally adds only a few seconds of additional computation time, which is amortized by solving more tasks overall. This is because verification of the transformed (loop-free) tasks usually succeeds or fails quickly.
While a modular verification approach allows for easier reuse of program transformations, it may still be beneficial to implement the transformations directly in the control-flow abstraction used by the verifier itself. The expected benefit is a speedup, because the verifier does not have to be re-initialized after each refinement. To analyze this claim (RQ 2), we compare the run-time of the approach that directly modifies the control flow of the verifier, presented in the paper "A Unifying Approach for Control-Flow-Based Loop Abstraction", with our modular approach. Specifically, two types of analyses in CPAchecker are considered: predicate analysis and value analysis. Both approaches use the program transformations presented in the paper and exemplified further below. Our approach was used without any validator. Comparing CEGAR-PT with CPAchecker in this setting measures the efficiency of CEGAR-PT; the resulting comparison is discussed next.
It can be seen that while there is a speed benefit when transforming the control-flow abstraction of the verifier directly, this benefit mostly appears for the faster tasks, which take at most 20 seconds with CPAchecker. For tasks taking at least 20 seconds, the speedup becomes less and less noticeable. Performing the transformation inside the verifier directly therefore does not yield a significant advantage, especially considering the effort required to re-implement the program transformations. Our tool works on direct source-code transformations, for which tooling support is widely available (for example, AST-modification tools), whereas modifying the control flow inside a verifier is a largely manual process with little to no tooling support, since each verifier usually uses its own internal representation of the control flow.
To set up the tool, run:
python3 src/setup.py
Afterwards the experiments using CEGAR-PT can be run using:
./benchmark_local.sh
while the other experiments can be run by changing the XML file used in benchmark_local.sh.
If you want to run the tool on its own, main_bench.py can be executed using the following parameters:
usage: blackbox-cegar [-h] [--version] [--specification specification] [--cache-dir CACHE_DIR] [--no-cache-update] [--force-validation] [--validate-unproven] [--verifier VERIFIER] [--validator VALIDATOR] [--program-generator PROGRAM_GENERATOR] program
positional arguments:
program The program to be verified.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--specification specification The specification for the program to be verified.
--cache-dir CACHE_DIR Path to the cache for coveriteam tools.
--no-cache-update Do not update the cache of the coveriteam tools. Only use tools available in the cache. If tools are not available, then CoVeriTeam simply fails.
--force-validation Force the validation of verdicts even when unneeded. This should only be used to download tools, since it will in general cause errors which cannot be fixed due to the tools not getting the elements they need.
--validate-unproven Validate the verdicts of strategies which have not been proven to be underapproximating or overapproximating.
--verifier VERIFIER The verifier to be used for the experiments, default is CPAchecker.
--validator VALIDATOR The validator to be used for the experiments, default is CPAchecker.
--program-generator PROGRAM_GENERATOR The program generator to be used for the experiments, default is ErrorPathRefinementProgramGenerator.
--debug Do not delete the intermediate outputs, so that they can be analyzed.
Validators, verifiers, and program generators are given using their Python class names.
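For example, a run with the default components on a single verification task might look as follows (example.c and unreach-call.prp are placeholder file names, and the exact entry point may differ depending on how the tool is installed):
python3 main_bench.py --specification unreach-call.prp example.c
The remainder of this page illustrates the loop abstractions used by CEGAR-PT on a small example loop. Each of the following snippets first shows the loop and then the code that one of the strategies produces for it, in the order in which the strategies are refined in the walkthrough below: Havoc, Naive Loop Acceleration, Output Loop Acceleration, and Constant Extrapolation.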
while (y < 10000) {
    y += 1;
}
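// Havoc strategy: the loop is removed, y is assigned a nondeterministic value, and the loop condition is checked afterwards.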
y = nondet();
if (!(y < 10000)) {
    exit(0);
}
while (y < 10000) {
    y += 1;
}
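// Naive Loop Acceleration: y is havocked, then one concrete loop iteration is executed before the loop condition is checked again.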
y = nondet();
if (y < 10000) {
    y += 1;
    if (!(y < 10000)) {
        exit(0);
    }
} else {
    exit(0);
}
while (y < 10000) {
    y += 1;
}
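// Presumably the Output Loop Acceleration strategy: y is havocked and the loop body is replayed a bounded number of times before the loop condition is checked.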
y = nondet();
for (long long s = 0; s < 1; s++) {
    y += 1;
}
if (!(y < 10000)) {
    exit(0);
}
while (y < 10000) {
    y += 1;
}
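// Constant Extrapolation: the loop is summarized by directly assigning the final value of y.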
y = 10000;
As a first example, consider the following program:
int main() {
    int y = nondet();
    if (y < 100) {
        while (y < 10000) {
            y += 1;
        }
        assert(y == 10000);
    }
}
With the CEGAR-PT setup from the experiments described in the paper, the first transformed program uses the Havoc strategy and results in:
int main() {
    int y = nondet();
    if (y < 100) {
        y = nondet();
        if (!(y < 10000)) {
            exit(0);
        }
        assert(y == 10000);
    }
}
Verifying this program, we get the verdict False, meaning that a counterexample has been found. Since the transformed program over-approximates the program behavior, one strategy is updated. As there is only a single strategy on the error path, it is updated to now use the Naive Loop Acceleration strategy. Using this strategy, the next transformed program is:
int main() {
    int y = nondet();
    if (y < 100) {
        y = nondet();
        if (y < 10000) {
            y += 1;
            if (!(y < 10000)) {
                exit(0);
            }
        } else {
            exit(0);
        }
        assert(y == 10000);
    }
}
The same process as explained before is repeated, updating this strategy and going through the Output Loop Acceleration strategy. Finally, we reach the Constant Extrapolation strategy, which results in the following program:
int main() {
    int y = nondet();
    if (y < 100) {
        y = 10000;
        assert(y == 10000);
    }
}
Verifying it returns the verdict True, i.e., a proof, which can be returned as the verdict for the original program, since the strategy used has the same program behavior as the original program.
As a second example, consider the following program with two loops:
int main() {
    int y = nondet();
    if (y < 100) {
        while (y < 10000) {
            y += 1;
        }
        assert(y == 10000);
    } else {
        while (y > 0) {
            y -= 1;
        }
    }
}
Applying the Havoc strategy to both loops results in:
int main() {
    int y = nondet();
    if (y < 100) {
        // START HAVOCSTRATEGY
        if (y < (10000)) {
            y = nondet();
        }
        if (y < (10000)) abort();
        // END HAVOCSTRATEGY
        assert(y == 10000);
    } else {
        // START HAVOCSTRATEGY
        if (y > (0)) {
            y = nondet();
        }
        if (y > (0)) abort();
        // END HAVOCSTRATEGY
    }
}
Verification of the program with the Havoc strategy applied produces a violation witness, which encodes the error path found by the verifier as an automaton. When we remove all nodes that correspond to lines not present in the original program, we obtain a reduced witness for the original program.
Validating this modified witness on the original program does not confirm the verdict, so the counterexample is spurious, and we need to find out which strategy should be refined. For this, we observe that the lines of code [22, 20, 18, 17, 15, 14, 13, 7, 6] are present in the original witness. The strategy that should be refined is therefore the one applied to the first loop, since its lines of code in the patched program overlap with the lines of code present in the witness.
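The selection of the strategy to refine can be sketched as follows (a simplified illustration with placeholder names; the actual implementation works on the witness automaton and the patch information produced by the program generator):
def strategy_to_refine(witness_lines, strategy_lines):
    # witness_lines: set of line numbers that occur in the witness,
    # e.g. {22, 20, 18, 17, 15, 14, 13, 7, 6} in the example above.
    # strategy_lines: maps each applied strategy to the set of lines it
    # introduced in the patched (transformed) program.
    for strategy, lines in strategy_lines.items():
        if witness_lines & lines:
            # The error path passes through code generated by this strategy,
            # so this strategy is refined next.
            return strategy
    # No strategy overlaps with the error path.
    return None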
Updating the strategy that is used results in the following program.
int main() {
    int y = nondet();
    if (y < 100) {
        // START NAIVELOOPACCELERATION
        if (y < (10000)) {
            y = nondet();
            if (!(y < (10000))) abort();
            if (y < 10000) {
                y += 1;
            }
            if (y < (10000)) abort();
        }
        // END NAIVELOOPACCELERATION
        assert(y == 10000);
    } else {
        // START HAVOCSTRATEGY
        if (y > (0)) {
            y = nondet();
        }
        if (y > (0)) abort();
        // END HAVOCSTRATEGY
    }
}
More details are available in our research paper.