Datalog is a declarative programming language that was introduced as a query language for deductive databases in the late 70s. Datalog uses first-order predicate logic to express computations in the form of Horn clauses. Datalog adapts the syntax style of Prolog. A full exposition is beyond the scope of this manual – more details are available in the book Foundations of Databases by Abiteboul, Hull and Vianu, or tutorial material.
Language
The main language elements in Soufflé are relation declarations, facts, rules, and directives. For example, the following program,
.decl A(x:number, y:number) // declaration of relation A
A(1,2). // facts of relation A
A(2,3).
.decl B(x:number, y:number) // declaration of relation B
B(x,y) :- A(x,y). // rules of relation B
B(x,z) :- A(x,y), B(y,z).
.output B // Output relation B
has two relations A
and B
. Relations must be declare
so that the use of attributes can be checked at compiletime.
In the example, relation A
has two facts: A(1,2).
and A(2,3)
.
A fact is a rule that holds unconditionally, i.e., a fact is a Horn Clause A(1,2) ⇐ true
.
Relation B
has two rules, i.e., B(x,y) :- A(x,y).
and B(x,y) :- A(x,y), B(y,z).
representing the Horn clause B(x,y) ⇐ A(x,y)
and B(x,y) ⇐ A(x,y), B(y,z)
.
The directive .output B
queries the relation B
at the end of the execution and writes
its result either into a file or prints it on the screen. The programmer can choose the order of
the relation declarations, facts, rules, and directives in the source-code arbitrarly.
Relations
Soufflé requires the declaration of relations. A relation is a set of ordered tuples (x1, …, xk) where each element xi is a member of a data domain defined by a type. In the previous example, the declaration
.decl A(x:number, y:number).
defines the relation A
that contains pairs of numbers only.
The first attribute is named x
and the second attribute is
named y
. Attributes have a type. In the above example,
the type of attribute x
and y
is a number.
The type-checker of Soufflé will infer the type of variables in rules and check their correct use.
Types
Soufflé utilises a typed Datalog dialect to conduct static checks enabling the early detection of errors in Datalog query specifications. Each attribute of involved relations has to be typed. Based on those, Soufflé attempts to deduce a type for all terms within all the Horn clauses within a given Datalog program. In case no type can be deduced for some terms, a typing error is reported – indicating a likely inconsistency in the query specification.
There are four primitive types in Soufflé, i.e., symbol
, number
, unsigned
, and float
.
Rules
Rules are conditional logic statements. A rule starts with a head followed by a body. For example,
A(x,y) :- B(x,y).
holds for a pair (x,y) if it is in B.
The performance of rules can be automatically improved by using Soufflé’s auto-scheduler.
Note that to achieve optimal performance of rules, hand-tuning may be required.
Components
Soufflé has components to modularise large logic programs. A component may contain other components, relation and type declarations, facts, rules, and directives. A programmer can declare and instantiate components. Each component has its own name space to access its elements. Components can have one or more super-components from which they can inherit.
User-Defined Functors
Programmers can declare user-defined functors for extending Soufflé. User-defined functors are implemented in C/C++. There are two flavours of the user-defined functors, i.e., naive and stateful functors. Stateful functors expose record and symbol tables.
Pragma
Pragmas configure Soufflé. For example, command-line options can be set in the source code.
Syntax
Comments
Soufflé utilises the same comment syntax as C/C++. Inline comments starts with
//
anywhere outside of a comment and extend to the end of the line. Block
comments starts with /*
anywhere outside of a comment and extend to the first
occurence of */
.
// an inline comment
/* this is
* a comment block
*/
Documentation comments & annotations
Documentation comments and annotations are part of the syntax.
The design is close to Rust’s attributes. A documentation comment is a syntactic sugar for a doc
annotation.
For an optimal usage of documentation comments and annotations, disable the C preprocessor with --no-preprocessor
.
All program’s and component’s items can have outer annotations of the form @[annotation]
or /// doc comment
. Attributes and ADT branches also accept outer annotations.
Some items can have inner annotations of the form @![annotation]
or //! doc comment
:
- after
{
of a component, before the first component’s item. - after
{
of an ADT branch, before the first attribute of the branch. - after
:-
of a clause, before the first constraint.
Annotations and doc comments should not appear before .include
or before any }
or before the end of a file.
/// The documentation comment
/// for relation `R`
.decl R()
/// doc comment for `C`
@[doc = "more outer documentation"]
.comp C {
//! inner doc comment
//! for `C`
@![doc = "more inner documentation"]
}
Identifiers
Soufflé Identifiers follow the C++ naming convention, except that a question mark may appear anywhere.
- The identifier can only be composed of letters (lower or upper case), numbers, or the question mark and underscore characters. That means the name cannot contain whitespace, or any symbols other than underscores or question marks.
- The identifier must begin with a letter (lower or upper case), an underscore, or a question mark. It can not start with a number.
Syntax with the C pre-processor
By default all souffle programs are passed through the C pre-processor. As a consequence, e.g. #include
pragmas may be utilized to organize Datalog input queries into several files and reuse common constructs within several programs. Also, constants may be defined utilizing the #define
pragma.
Soufflé selects a C pre-processor based on the following decision process:
- do not use any C pre-processor if command-line argument
--no-preprocessor
is set - otherwise, use the user-provided
COMMAND
if command-line argument--preprocessor COMMAND
is set - otherwise, use
mcpp
if it is available - otherwise, use
gcc
if it is available - otherwise, raise an error.
Syntax without C pre-processor
The C pre-processor is disabled with Soufflé option --no-preprocessor
.
The Soufflé syntax allows to include files using the .include "path/to/file.dl"
directive that has a similar behavior as #include "path/to/file.dl"
. The .once
directive instructs the scanner to skip the rest of the current file the next time it encounters this directive at the same location. The special tokens __FILE__
and __LINE__
are available and replaced respectively by the current reported file and line number.
The .include
directive is available reguardless of the --no-preprocessor
option. However when the C pre-processor is active, files included by .include
will not be pre-processed.
The .include
directive looks for source files first relative to the directory of the current file, and then relative to each include directory set with -I
command-line option.
When setting a non-absolute include directory with -I relative/path
, it is interpreted relative to the current working directory.
Program
In the following, we define a program more formally using syntax diagrams and EBNF. The syntax diagrams were produced with Bottlecaps.
A program consists of type declarations, relation declarations, facts, rules, component declarations and instantiations, user-defined functor declarations, and pragmas.
program ::=
( pragma |
functor_decl |
component_decl |
component_init |
directive |
rule |
fact |
relation_decl |
type_decl )*