In this Google Summer of Code project, we implemented the CFA reverser in the CPAchecker. Moreover, we benchmarked our algorithm on several program sets, to evaluate its function and efficiency.
To reverse the CFA, the CFA Reverser utilizes a builder class called CfaBuilder
. This class creates a CfaFunctionBuilder
for each function's CFA, then the CfaFunctionBuilder
is responsible for creating the reverse CFA for each function's CFA.Once each function's reverse CFA is created, the CFASecondPassBuilder
will be invoked. This will create the necessary call edges and return edges that connect each function's CFA, based on the function call edge in each CFA.
Our implementation of CfaFunctionBuilder
uses a Breadth-first search (BFS) algorithm starting from the exit point for each CFA. The corresponding CFA node in the reverse CFA is the function entry point. Then we insert or append the new edge, corresponding to the original CFA. For most of the assignment statements, its reverse edge is the assumption edge which condition is the equality of the two sides.
Moreover, we supposed a method to reverse augmented assignment operations like
a += 1;
This type of operation can be generalized to cases where the right side of the assignment statement contains the subexpressions located on the left side. For example, the right side may be a function call whose arguments contain the expression in the left side expression.
x = f(x);
The method utilizes two types of visitor: the first type collect the possible subexpression on the left side, and the second type creates placeholders for those expressions. For example, consider the above example, then the reverse program should be
assume (a == tmp_a + 1);
a = tmp_a
This CFA could be converted into a SMT formula which is equivalent to the original SMT formula. This method provides a relatively universal solution since it could be applied to any kind of assignment statement.
In addition, the non-det variable is used to determine the branch for programs that include branching. By this method, for programs that include loops, their CFAs are inverted to infinite-loop programs. As mentioned above, in such cases, a non-det variable determines whether or not to jump out of the loop. For example, consider a loop program
int a = 1;
for (int i = 1; i <= 10; i++) a *= 2;
if (a != 1024) {
reach_error();
}
should be reversed as
// variable initialization omitted
assume(a != 1024);
assume(i > 10);
while (1) {
assume(i == tmp_i + 1);
i = tmp_i;
tmp_i = __VERIFIER_nondet_int();
assume(a == tmp_a * 2);
a = tmp_a;
tmp_a = __VERIFIER_nondet_int();
assume(i <= 10);
b = __VERIFIER_nondet_int();
if (b == 0) continue;
if (b == 1) break;
}
assume(i == 1);
assume(a == 1);
reach_error();
For function calls, since passing function arguments and return variables are similar to what assignment statements do, we also utilized the assumption edge to control the generated SMT formula. Consider the following program
int add(int x, int y) {
return x + y;
}
c = add(a, b);
could be reversed into a program
void reversed_add(int x, int y, int ret) {
assume(x + y == ret);
x_local = __VERIFIER_nondet_int();
y_local = __VERIFIER_nondet_int();
assume(x_local == x);
assume(y_local == y);
}
reversed_add(a, b, c);
Note that this method requires to modify each function's declaration to reverse it. Adding the left side of the function call assignment statement allows us to check the return variable's value.
After reversing each CFA, we insert a dummy edge connecting the function entry point to the error target. This technique enables us to effectively bypass redundant paths that could otherwise interfere with the outcome.
Nonetheless, there is a limitation of this approach: it may not yield reliable results for programs with paths that involve multiple targets.
for (int i = 0; i < 10; i++) {
if (a[i] != b[i]) {
reach_error();
}
}
Another limitation of this method is that it can not work well with the function call. Consider a program
void __VERIFIER_assert(int cond) {
if (!(cond)) {
reach_error();
}
return;
}
__VERIFIER_assert(i != 1);
If we directly jump to reach_error()
, some restrictions on the function arguments won't be imported into the paths. Since __VERIFIER_assert()
is used widely in the SV-comp benchmark, we partially resolve this issue by setting the caller of functions containing the error target, as the target. It is after all a workaround, which requires a more general approach.
The theory is relatively straightforward, but the implementation itself is much more difficult than we thought. The main reason is that CFA itself is at a relatively high abstraction layer, which leads to its complexity.
The main work is at the Merge Request: MR #113.
The main implementation of the CFA Reverser is at CFAReverser.java.
The example configuration file can be found at reverseCFA.properties.
The hand-crafted tests can be found at test/programs/reverse_cfa.
There are still some unsupported language features.
For example:
union
type.Furthermore, there is also some unresolved issue:
We executed several analyses with CFA Reverser on a subset of SVComp's benchmark. The subset consists of 4 test sets: ReachSafety-Arrays
, ReachSafety-BitVectors
, ReachSafety-ControlFlow
, and ReachSafety-Loops
, 1285 programs in total. The benchmark configuration can be found at reverseCFA.xml. All benchmarks below were run with revision reverse-cfa:44884
The below benchmarks utilized the Bounded Model Checking(BMC), whose configuration can be accessed via bmc.properties.
BMC (reverse CFA) | BMC | |
---|---|---|
Correct True | 53 | 34 |
Correct False | 16 | 60 |
Incorrect True | 2 | 0 |
Incorrect False | 1 | 0 |
Score | 42 | 128 |
The entire benchmark outcome can be accessed via Result 1.
The below benchmarks utilized the Adjustable-Block Encoding CPA for predicate analysis with CEGAR as described in "Predicate Abstraction with Adjustable-Block Encoding"
Predicate Analysis (reverse CFA) | Predicate Analysis | |
---|---|---|
Correct True | 140 | 213 |
Correct False | 30 | 137 |
Incorrect True | 4 | 0 |
Incorrect False | 5 | 12 |
Score | 102 | 371 |
The entire benchmark outcome can be accessed via Result 2.
The below benchmarks utilized the plain k-Induction analysis.
k-Induction (reverse CFA) | k-Induction | |
---|---|---|
Correct True | 94 | 224 |
Correct False | 30 | 206 |
Incorrect True | 3 | 0 |
Incorrect False | 6 | 34 |
Score | 26 | 110 |
The entire benchmark outcome can be accessed via Result 3.
Benchmark results show that using CFA Reverser provides comparable speed and memory usage to the original analysis for correct cases. The main reason why some Incorrect True
s occur in the benchmark is due to the multi-target issue mentioned above. This kind of program should be rejected by the CFA Reverser or should be resolved by inline function expansion, which is the future work.
It has been observed that the BMC algorithm with reverse CFA, does not perform well when dealing with loop programs. Loop programs are reversed as infinite loops, which could lead to a TIMEOUT
when working with the BMC algorithm. While reverse CFA may decrease the distance between the entry and target points, the use of infinite in loops causes significant overhead. In summary, while this approach may have benefits, it is important to consider its overheads and limitations.
Currently, there are still some unsupported features and unresolved issues. Specifically:
TIMEOUT
. However, it may be possible to reduce the number of assumptions made about array elements since only a subset of indices is used in most programs.