Edit me

Datalog is a declarative programming language that was introduced as a query language for deductive databases in the late 70s. Datalog uses first-order predicate logic to express computations in the form of Horn clauses. Datalog adapts the syntax style of Prolog. A full exposition is beyond the scope of this manual – more details are available in the book Foundations of Databases by Abiteboul, Hull and Vianu, or tutorial material.

Language

The main language elements in Soufflé are relation declarations, facts, rules, and directives. For example, the following program,

.decl A(x:number, y:number)  // declaration of relation A
A(1,2).                      // facts of relation A
A(2,3).

.decl B(x:number, y:number)  // declaration of relation B
B(x,y) :- A(x,y).            // rules of relation B
B(x,z) :- A(x,y), B(y,z).

.output B                    // Output relation B 

has two relations A and B. Relations must be declare so that the use of attributes can be checked at compiletime. In the example, relation A has two facts: A(1,2). and A(2,3). A fact is a rule that holds unconditionally, i.e., a fact is a Horn Clause A(1,2) ⇐ true. Relation B has two rules, i.e., B(x,y) :- A(x,y). and B(x,y) :- A(x,y), B(y,z). representing the Horn clause B(x,y) ⇐ A(x,y) and B(x,y) ⇐ A(x,y), B(y,z).

The directive .output B queries the relation B at the end of the execution and writes its result either into a file or prints it on the screen. The programmer can choose the order of the relation declarations, facts, rules, and directives in the source-code arbitrarly.

Relations

Soufflé requires the declaration of relations. A relation is a set of ordered tuples (x1, …, xk) where each element xi is a member of a data domain defined by a type. In the previous example, the declaration

.decl A(x:number, y:number).

defines the relation A that contains pairs of numbers only. The first attribute is named x and the second attribute is named y. Attributes have a type. In the above example, the type of attribute x and y is a number.

The type-checker of Soufflé will infer the type of variables in rules and check their correct use.

Types

Soufflé utilises a typed Datalog dialect to conduct static checks enabling the early detection of errors in Datalog query specifications. Each attribute of involved relations has to be typed. Based on those, Soufflé attempts to deduce a type for all terms within all the Horn clauses within a given Datalog program. In case no type can be deduced for some terms, a typing error is reported – indicating a likely inconsistency in the query specification.

There are four primitive types in Soufflé, i.e., symbol, number, unsigned, and float.

Rules

Rules are conditional logic statements. A rule starts with a head followed by a body. For example,

A(x,y) :- B(x,y).

holds for a pair (x,y) if it is in B.

The performance of rules can be automatically improved by using Soufflé’s auto-scheduler.

Note that to achieve optimal performance of rules, hand-tuning may be required.

Components

Soufflé has components to modularise large logic programs. A component may contain other components, relation and type declarations, facts, rules, and directives. A programmer can declare and instantiate components. Each component has its own name space to access its elements. Components can have one or more super-components from which they can inherit.

User-Defined Functors

Programmers can declare user-defined functors for extending Soufflé. User-defined functors are implemented in C/C++. There are two flavours of the user-defined functors, i.e., naive and stateful functors. Stateful functors expose record and symbol tables.

Pragma

Pragmas configure Soufflé. For example, command-line options can be set in the source code.

Syntax

Comments

Soufflé utilises the same comment syntax as C/C++. Inline comments starts with // anywhere outside of a comment and extend to the end of the line. Block comments starts with /* anywhere outside of a comment and extend to the first occurence of */.

// an inline comment

/* this is
 * a comment block
 */

Identifiers

Soufflé Identifiers follow the C++ naming convention, except that a question mark may appear anywhere.

  • The identifier can only be composed of letters (lower or upper case), numbers, or the question mark and underscore characters. That means the name cannot contain whitespace, or any symbols other than underscores or question marks.
  • The identifier must begin with a letter (lower or upper case), an underscore, or a question mark. It can not start with a number.

Syntax with the C pre-processor

By default all souffle programs are passed through the C pre-processor. As a consequence, e.g. #include pragmas may be utilized to organize Datalog input queries into several files and reuse common constructs within several programs. Also, constants may be defined utilizing the #define pragma.

Soufflé selects a C pre-processor based on the following decision process:

  • do not use any C pre-processor if command-line argument --no-preprocessor is set
  • otherwise, use the user-provided COMMAND if command-line argument --preprocessor COMMAND is set
  • otherwise, use mcpp if it is available
  • otherwise, use gcc if it is available
  • otherwise, raise an error.

Syntax without C pre-processor

The C pre-processor is disabled with Soufflé option --no-preprocessor.

The Soufflé syntax allows to include files using the .include "path/to/file.dl" directive that has a similar behavior as #include "path/to/file.dl". The .once directive instructs the scanner to skip the rest of the current file the next time it encounters this directive at the same location. The special tokens __FILE__ and __LINE__ are available and replaced respectively by the current reported file and line number.

The .include directive is available reguardless of the --no-preprocessor option. However when the C pre-processor is active, files included by .include will not be pre-processed.

Program

In the following, we define a program more formally using syntax diagrams and EBNF. The syntax diagrams were produced with Bottlecaps.

A program consists of type declarations, relation declarations, facts, rules, component declarations and instantiations, user-defined functor declarations, and pragmas.

Program

program  ::= 
 ( pragma | 
   functor_decl | 
   component_decl | 
   component_init | 
   directive | 
   rule | 
   fact | 
   relation_decl | 
   type_decl )*