Soufflé’s type system is static. Types are used to define the attributes of a relation. Static types have the advantage that they can be enforced at compile-time; evaluation speed is improved as no dynamic checks are needed. Another advantage of the type system is that it supports the programmer by enforcing the correct use of relations.
The Type system is based on the idea of sets. A Type can be either the universe of all possible values, or a subset. Subsets can be composed of other subsets.
Soufflé has two primitive types:
- Symbol type:
- Number type:
The symbol type consists of the universe of all strings.
Internally, the symbol type is represented by an ordinal number.
The ordinal number for a symbol can be determined by using the
ord command, e.g., ord(“hello”) represents the ordinal number for “hello”.
The number type consists of the universe of all numbers. The accepted range of numbers is restricted by architecture. Soufflé accepts 32 bit signed numbers.
Primitive type usage
.decl Name(n: symbol) Name("Hans"). Name("Gretl"). .decl Translate(n: symbol , o: number) .output Translate Translate(x,ord(x)) :- Name(x).
Beyond Primitive Types
Primitive types are insufficient for large projects. There is a need for a way to ensure that the wrong attributes will not be bound. Symbols of different types need to be differentiated. The ability to form partial orders over subsets allows the formation of ontologies.
Symbol types for attributes are defined by the
.symbol_type declarative, e.g.,
.symbol_type City .symbol_type Town .symbol_type Village
Here we have defined distinct sets of symbols from the universe of possible symbols.
The Union type unifies a fixed number of symbol set types, of either base or union types.
.type <ident1> = <ident1> | <ident2> | ... | <identk>
.type Place = City | Town | Village
We can bring these together to define attributes that better describe a relation, e.g.,
.symbol_type City .symbol_type Town .symbol_type Village .type Place = City | Town | Village .decl Data(c:City, t:Town, v:Village) Data(“Sydney”, ”Ballina”, “Glenrowan”). .decl Location(p:Place) .output Location Location(p) :- Data(p,_,_); Data(_,p,_); Data(_,_,p).
The semantic of types is defined by their associated domains. Let T be a type and D(T) denote its associated domain. The domains are given by the following rules:
- D(number) = D(uint32_t) (or altered by compiler flag)
- D(symbol) is the set of all strings over the set of printable characters
- D(U) ⊂ D(number) if U is defined by
- D(S) ⊂ D(symbol) if S is defined by
D( T1 T2 … Tn ) ⊇ D(T1) ∪ D(T2) ∪ … ∪ D(Tn)
- D( [ f1 : T1 , f2 : T2 , … , fn : Tn ] ) ⊇ D(T1) x D(T2) x … x D(Tn)
Furthermore, for two primitive types U and S we have that
- D(U) ∩ D(S) = ∅
Thus, by definition, no two primitive user-defined types U and S may have common elements. Consider the following type definitions:
.number_type weight .number_type length
For both types, weight and length, the value 30 is a valid value. However, an expression of value 30 and type
weight is not considered equivalent to an expression of value 30 and type
length. Both values may be considered to be tagged by their type and the type is essential for equivalence.
Consequently, with the Datalog program
.number_type weight .number_type length .decl A(w:weight) .decl B(l:length) A(X) :- B(X).
the trailing rule will cause a typing error since no value of the first attribute of B will ever be in the domain of the first attribute of A, since D(weight) ∩ D(length) = ∅.
This interaction between types is generalized by the sub-type relation to be covered next.
The sub-type relation among types provides a foundation for determining whether there might be values in the intersection between (potentially infinite) domains of types. Intuitively, a type U is a subtype of S, denoted by U <: S, if all elements in the domain of U are also to be found in the domain of S. The following rules define the sub-type relation:
The sub-type relation is reflexive and transitive:
- S <: S for all types S
- if S <: T and T <: U we have S <: U for all types S, T, and U
All primitive types are sub-types of their predefined base type:
- U <: number if U is defined by
- U <: symbol if U is defined by
Thus, every value of a primitive type, e.g. the information that a weight is 40, can ‘decay’ to an ordinary number without the implicit type-tag of being a weight.
Finally, for union types the following rules hold:
U <: T1 … Tn if there is a 1 ≤ i ≤ n such that U <: Ti
T1 … Tn <: U if for all 1 ≤ i ≤ n we have Ti <: U
U1 … Un <: T1 … Tm if for all 1 ≤ i ≤ n there is a 1 ≤ j ≤ m such that Ui <: Tj
The first two rules define the relation between union types and non-union types and the third rule defines the relation between two union types. Note that a union type consisting of a single type is a sub-type of this type as well as the other way around.
There are no sub-typing relations for record types. Each record type constitutes its own, distinct domain. Thus, no element of one domain may be found in the domain of any other type’s domain.
As an example, consider the following code fragment:
.type even = number .type odd = number .decl A(x:even) .decl B(x:odd) A(X) :- B(X)
In this example, the types
odd are defined as union types over the single type
number. Thus, they are effectively equivalent and no typing error will result from the rule stated for relation
However, when the aim of defining the two types
odd was to enforce that those are two disjunctive sets of integer values, the rule stated for
A should trigger at least a warning since no
X should be within both – the
The problem is caused by utilizing a degenerated version of a union for the definition of the new types. A more accurate modeling is shown here:
.number_type even .number_type odd .decl A(x:even) .decl B(x:odd) A(X) :- B(X)
In this version,
odd are defined as two disjunctive sub-types of the number-type, each exhibiting its own domain of values. From this definition Souffle can deduce that the users intention was to define two disjunctive sets and will report an error for the rule in the last line.
The type system is designed to support the developer of a Datalog program in enforcing constraints on the ‘dataflow’ within the resulting program. However, to enable the system to do so, an accurate modeling of the categories of values in form of a type system is required – as has been illustrated by the example above.