Provenance is a way to explain the execution of a Soufflé program, potentially useful for debugging. These explanations come in the form of a proof tree. In Soufflé, for any tuple, these proof trees are of minimal height for that tuple.
Using Provenance
To enable the provenance system, the Soufflé command line option is -t
- Use
-t none
for provenance transformer without any explain interface - Use
-t explain
for provenance with a command line interface tostdout
- Use
-t explore
for provenance with an interface inncurses
- this allows scrolling around a large proof tree
For example, given a program foo.dl, to enable the stdout
interface for provenance use:
souffle -t explain foo.dl
The explain
interface
To illustrate the interface with an example, assume we have the following simple Soufflé program
.decl edge, path(x:number, y:number)
edge(1, 2).
edge(2, 3).
edge(3, 4).
edge(4, 5).
path(x, y) :- edge(x, y).
path(x, z) :- edge(x, y), path(y, z).
The explain interface consists of a number of commands:
explain path(1, 3)
prints the proof tree for the tuplepath(1, 3)
, showing all input and intermediate tuples required to generate the query tuple:> explain path(1, 3) edge(2, 3) -------(R1) edge(1, 2) path(2, 3) -------------------(R2) path(1, 3)
For usage with strings, enclose values with quotation marks:
explain ancestor("john", "mary")
setdepth 3
sets the maximum height of produced proof trees to be 3. Any parts of the proof tree above this height are replaced with a subproof label:> setdepth 3 Depth is now 3 > explain path(1, 4) edge(2, 3) subproof path(0) ------------------------(R2) edge(1, 2) path(2, 4) ------------------------------------(R2) path(1, 4)
This is particularly useful in the case of large programs, where a full derivation tree is too unwieldy to be understood. The default value for this option is 4.
subproof path(0)
prints the remaining part of the proof tree marked by the subproof label, generated when the depth limit is reached.> subproof path(0) edge(3, 4) -------(R1) path(3, 4)
explainnegation path(1, 6)
starts an interactive process to explain why the tuplepath(1, 6)
does not exist in the result. The user must provide a rule number, and values for any free variables in order for the system to produce a partial proof tree:> explainnegation path(1, 6) 1: path(x,y) :- edge(x,y). 2: path(x,z) :- edge(x,y), path(y,z). Pick a rule number: 2 Pick a value for y: 2 ==== edge(1, 2) ✓ path(2, 6) x ------------------------(R2) path(1,6)
The approach here is required as it is not technically feasible to automatically generate explanations for non-existence, and a bit of user guidance is required.
query
command can be used to check existence of tuples in the result.query path(1, 3)
checks existence of the tuplepath(1, 3)
in the result.> query path(1, 3) true.
Or if the given tuple does not exist in the result such as
path(3, 1)
> query path(3, 1) false. Tuple path(3, 1) does not exist
query
can check existence of multiple tuples at once, use,
to delimit tuples. Evaluation would stop if there is at least one tuple does not exist and output that tuple.> query path(1, 3), path(3, 2), path(1, 10), path(2, 4) false. Tuple path(3, 2) does not exist
query
command can have parameters where the tuples additionally contain variables as opposed to numbers or strings only. Parameterised query provides select and join functionality and print values for variables interactively. If a parameterised query has multiple solutions, use;
to find next solution or.
to break from current query.> query edge(1, x), path(x, y) x = 2, y = 3 ; x = 2, y = 4 ; x = 2, y = 5 .
- Other commands allow to control the output (
output <filename>
,format <json|proof>
), print rules of the program (rule <relation> <rulenumber>
), and exit (exit
,quit
, orq
)
Internals of Provenance
The approach for provenance in Soufflé is a lazy one, where no proof trees are computed until the user queries for them.
However, to answer provenance queries efficiently, the provenance system requires to keep track of some extra information during evaluation.
In particular, for each tuple, the system tracks the rule producing the tuple, and the height of a minimal height proof tree for the tuple (we denote this extra information annotations). For example, if a tuple path(1, 3)
is generated using rule number 2, and we know there is a proof tree of height 4 for the tuple, then internally this is represented as:
path(1, 3, 2, 4)
To compute the height annotation for a tuple, we can take the max
over tuples in the body of the rule, and add 1 (the _
denotes that we don’t care about which rule generates each body tuple):
path(x, z, @rule, max(@height1, @height2) + 1) :-
edge(x, y, _, @height1), path(y, z, _, @height2).
During the proof tree construction phase, the annotations are used to guide a backwards search through all the tuples computed by the program. The result of this backwards search is one level of a proof tree, and applying this recursively, we generate the full proof tree!