Haskell Interface for Souffle
The list of languages that can interface with Soufflé keeps growing. Haskell, a high level functional programming language, can now also easily interact with Soufflé via the souffle-haskell library.
Motivating example
To show how easy it is to call Datalog from Haskell, let’s write a program that can check if one point in a graph is reachable from another:
.decl edge(n: symbol, m: symbol)
.decl reachable(n: symbol, m: symbol)
.input edge
.output reachable
reachable(x, y) :- edge(x, y).
reachable(x, z) :- edge(x, y), reachable(y, z).
Now that we have the datalog code, we can write the Haskell code that binds to it. The Haskell code is heavily annotated with comments to explain how it works.
-- Enable some language extensions needed for souffle-haskell:
{-# LANGUAGE ScopedTypeVariables, DataKinds, TypeFamilies, DeriveGeneric #-}
module Main ( main ) where
import qualified Language.Souffle.Interpreted as Souffle
import Data.Foldable ( traverse_ )
import Control.Monad.IO.Class
import GHC.Generics
-- Facts are represented in Haskell as simple product types (a.k.a. records),
-- Datalog numbers are converted to Int32, symbols to Strings / Text.
data Edge = Edge String String
deriving (Eq, Show, Generic)
data Reachable = Reachable String String
deriving (Eq, Show, Generic)
-- We define a data type representing the "handle" to our datalog program.
data Path = Path
-- By making Path an instance of Program, we provide Haskell with information
-- about the datalog program. It uses this to perform compile-time checks to
-- limit the amount of possible programmer errors to a minimum.
instance Souffle.Program Path where
type ProgramFacts Path = [Edge, Reachable]
programName = const "path"
-- By making a data type an instance of Fact, we give Haskell the
-- necessary information to bind to the corresponding datalog fact.
instance Souffle.Fact Edge where
factName = const "edge"
instance Souffle.Fact Reachable where
factName = const "reachable"
-- For simple product types, we can automatically generate the
-- marshalling/unmarshalling code of data between Haskell and datalog.
instance Souffle.Marshal Edge
instance Souffle.Marshal Reachable
-- Now the actual code that will be executed:
main :: IO ()
main = Souffle.runSouffle $ do
-- First we initialize the Soufflé program. This can possibly fail
-- (if the datalog code or souffle itself is not found for example)
-- but Haskell's typesystem forces us to handle this error case
-- by checking if a valid value is returned.
maybeProgram <- Souffle.init Path
case maybeProgram of
Nothing -> liftIO $ putStrLn "Failed to load program."
Just prog -> do -- successfully loaded
-- Adding facts:
Souffle.addFact prog $ Edge "a" "b" -- adding a single fact from Haskell
Souffle.addFacts prog [ Edge "b" "c" -- adding multiple facts at once
, Edge "b" "d"
, Edge "d" "e"
, Edge "e" "f"
, Edge "d" "h"
]
-- "Runs" the Soufflé program, this will compute all other derived facts.
Souffle.run prog
-- Querying of facts:
-- NOTE: Here we require an explicit type annotation since we directly
-- print it to stdout and would be ambiguous to the typechecker,
-- but if result is passed to another function for example,
-- it can infer the correct type automatically.
-- getFacts changes behavior and returns different facts
-- based on the type of the Fact it needs to return!
results :: [Reachable] <- Souffle.getFacts prog
liftIO $ traverse_ print results
-- We can also look for a single specific fact (this can be more efficient
-- than querying all facts and then checking if the fact is present):
maybeFact <- Souffle.findFact prog $ Reachable "a" "c"
liftIO $ print $ maybeFact
Features
Small API
The example above shows that with a small high level API, we can interface with Soufflé. The library takes care of all the low level details, allowing developers to focus on solving problems using both Haskell and Soufflé.
Type safety
The bindings makes heavy use of Haskell’s amazing typesystem to encode the structure of the datalog program on the type level. This gives the compiler enough information to avoid an entire class of bugs that would need to be manually checked (each time functions are invoked) in other languages.
An example of this is the addFact
function:
addFact :: (Fact a, ContainsFact prog a) => Handle prog -> a -> m ()
If we look at the type signature, it reveals that it can only add a type that is indeed a fact and that the fact should be in the list of facts that are known to the datalog program. (In the previous example, the “Path” program only contained the “Edge” and “Reachable” facts.)
If a fact does not belong to the program handle, we get a type error, avoiding a situation where an unknown fact would be marshalled to the Soufflé side. Given the following Haskell code:
data UnknownFact = UnknownFact String
deriving Generic
instance Souffle.Marshal UnknownFact
instance Souffle.Fact UnknownFact where
factName = const "unknown"
example :: Souffle.Handle Path -> Souffle.SouffleM ()
example handle =
Souffle.addFact handle $ UnknownFact 42
This will raise the following type error when trying to compile the Haskell module:
Example.hs:33:3: error:
• You tried to perform an action with a fact of type 'UnknownFact' for program 'Path'.
The program contains the following facts: '[Edge, Reachable].
It does not contain fact: UnknownFact.
You can fix this error by adding the type 'UnknownFact' to the ProgramFacts type
in the Program instance for Path.
• In the expression: Souffle.addFact handle
In the expression: Souffle.addFact handle $ UnknownFact 42
In an equation for ‘Example.example’:
Example.example handle
= Souffle.addFact handle $ UnknownFact 42
|
33 | Souffle.addFact handle $ UnknownFact 42
| ^^^^^^^^^^^^^^^^^^^^^^
Documentation
Haskell libraries sometimes have less good or little documentation compared to other languages. However, for this library, having bad documentation is considered a bug and the text will be updated if it is found to be unclear or incomplete. All public functions are documented in their behavior.
The full documentation can be found on Hackage.
Development process
Soufflé programs can be run in 2 ways. They can either run in interpreted
mode (which uses the souffle
CLI command under the hood), or they can be
compiled to C++-code beforehand and called from a host Haskell program
for improved efficiency in the final executable. This library supports both
modes (since version 0.2.0). The two variants have only a few minor
differences and can be swapped fairly easily, mostly by swapping out
Language.Souffle.Interpreted
with Language.Souffle.Compiled
or vice versa
in the imports of your code.
Interpreted mode
This is probably the mode you want to start out with if you are developing a program that uses Datalog for computing certain facts or relations. Interpreted mode offers quick development iterations (no compiling of C++ code each time you change your Datalog code). However because the Soufflé code is interpreted, it can’t offer the same speed as in compiled mode.
Compiled mode
Once the prototyping phase of the Datalog algorithm is over, it is advised to switch over to the compiled mode. It offers much improved performance compared to the interpreted mode, at the cost of having to recompile your Datalog algorithm each time it changes.
Wrapping up
souffle-haskell is already being used in 2 projects (that I know of): GRIN, a WIP compiler backend for functional languages and GHC-WPC, a WIP whole program compiler for Haskell. Both repositories use Soufflé for analysis during program optimizations of input programs. I would like to thank the developers for giving me feedback, and helping with development of part of the library (the interpreted mode would not be there yet without them!).
The library is mostly finished now, but some features are still planned such as making it easier to write a Haskell library that uses generated Soufflé code completely standalone (without needing the C++ headers installed in code that makes use of the library!), or adding an even higher level DSL to keep the Haskell and Soufflé code in sync with each other. If you have any ideas or possible improvements for the library, you can submit an issue on Github or contact me on Twitter.