Datalog is a declarative programming language that was introduced as a query language for deductive databases in the late 70s. Datalog uses first-order predicate logic to express computations in the form of Horn clauses. Datalog adapts the syntax style of Prolog. A full exposition is beyond the scope of this manual – more details are available in the book Foundations of Databases by Abiteboul, Hull and Vianu, or tutorial material.
The main language elements in Soufflé are relation declarations, facts, rules, and directives. For example, the following program,
.decl A(x:number, y:number) // declaration of relation A A(1,2). // facts of relation A A(2,3). .decl B(x:number, y:number) // declaration of relation B B(x,y) :- A(x,y). // rules of relation B B(x,z) :- A(x,y), B(y,z). .output B // Output relation B
has two relations
B. Relations must be declare
so that the use of attributes can be checked at compiletime.
In the example, relation
A has two facts:
A fact is a rule that holds unconditionally, i.e., a fact is a Horn Clause
A(1,2) ⇐ true.
B has two rules, i.e.,
B(x,y) :- A(x,y). and
B(x,y) :- A(x,y), B(y,z).
representing the Horn clause
B(x,y) ⇐ A(x,y) and
B(x,y) ⇐ A(x,y), B(y,z).
.output B queries the relation
B at the end of the execution and writes
its result either into a file or prints it on the screen. The programmer can choose the order of
the relation declarations, facts, rules, and directives in the source-code arbitrarly.
Soufflé requires the declaration of relations. A relation is a set of ordered tuples (x1, …, xk) where each element xi is a member of a data domain defined by a type. In the previous example, the declaration
.decl A(x:number, y:number).
defines the relation
A that contains pairs of numbers only.
The first attribute is named
x and the second attribute is
y. Attributes have a type. In the above example,
the type of attribute
y is a number.
The type-checker of Soufflé will infer the type of variables in rules and check their correct use.
Soufflé utilises a typed Datalog dialect to conduct static checks enabling the early detection of errors in Datalog query specifications. Each attribute of involved relations has to be typed. Based on those, Soufflé attempts to deduce a type for all terms within all the Horn clauses within a given Datalog program. In case no type can be deduced for some terms, a typing error is reported – indicating a likely inconsistency in the query specification.
There are four primitive types in Soufflé, i.e.,
Rules are conditional logic statements. A rule starts with a head followed by a body. For example,
A(x,y) :- B(x,y).
holds for a pair (x,y) if it is in B.
The performance of rules can be automatically improved by using Soufflé’s auto-scheduler.
Note that to achieve optimal performance of rules, hand-tuning may be required.
Soufflé has components to modularise large logic programs. A component may contain other components, relation and type declarations, facts, rules, and directives. A programmer can declare and instantiate components. Each component has its own name space to access its elements. Components can have one or more super-components from which they can inherit.
Programmers can declare user-defined functors for extending Soufflé. User-defined functors are implemented in C/C++. There are two flavours of the user-defined functors, i.e., naive and stateful functors. Stateful functors expose record and symbol tables.
Pragmas configure Soufflé. For example, command-line options can be set in the source code.
Soufflé utilises the same comment syntax as C/C++. Inline comments starts with
// anywhere outside of a comment and extend to the end of the line. Block comments starts with
/* anywhere outside of a comment and extend to the first occurence of
// an inline comment /* this is * a comment block */
Soufflé Identifiers follow the C++ naming convention, except that a question mark may appear anywhere.
- The identifier can only be composed of letters (lower or upper case), numbers, or the question mark and underscore characters. That means the name cannot contain whitespace, or any symbols other than underscores or question marks.
- The identifier must begin with a letter (lower or upper case), an underscore, or a question mark. It can not start with a number.
Syntax with the C pre-processor
By default all souffle programs are passed through the C pre-processor. As a consequence, e.g.
#include pragmas may be utilized to organize Datalog input queries into several files and reuse common constructs within several programs. Also, constants may be defined utilizing the
Soufflé selects a C pre-processor based on the following decision process:
- do not use any C pre-processor if command-line argument
- otherwise, use the user-provided
COMMANDif command-line argument
--preprocessor COMMANDis set
- otherwise, use
mcppif it is available
- otherwise, use
gccif it is available
- otherwise, raise an error.
Syntax without C pre-processor
The C pre-processor is disabled with Soufflé option
The Soufflé syntax allows to include files using the
.include "path/to/file.dl" directive that has a similar behavior as
#include "path/to/file.dl". The
.once directive instructs the scanner to skip the rest of the current file the next time it encounters this directive at the same location. The special tokens
__LINE__ are available and replaced respectively by the current reported file and line number.
.include directive is available reguardless of the
--no-preprocessor option. However when the C pre-processor is active, files included by
.include will not be pre-processed.
.include directive looks for source files first relative to the directory of the current file, and then relative to each include directory set with
-I command-line option.
When setting a non-absolute include directory with
-I relative/path, it is interpreted relative to the current working directory.
In the following, we define a program more formally using syntax diagrams and EBNF. The syntax diagrams were produced with Bottlecaps.
A program consists of type declarations, relation declarations, facts, rules, component declarations and instantiations, user-defined functor declarations, and pragmas.
program ::= ( pragma | functor_decl | component_decl | component_init | directive | rule | fact | relation_decl | type_decl )*