Haskell Interface for Souffle | Soufflé • A Datalog Synthesis Tool for Static Analysis

The list of languages that can interface with Soufflé keeps growing. Haskell, a high level functional programming language, can now also easily interact with Soufflé via the souffle-haskell library.

Motivating example

To show how easy it is to call Datalog from Haskell, let’s write a program that can check if one point in a graph is reachable from another:

.decl edge(n: symbol, m: symbol)
.decl reachable(n: symbol, m: symbol)

.input edge
.output reachable

reachable(x, y) :- edge(x, y).
reachable(x, z) :- edge(x, y), reachable(y, z).

Now that we have the datalog code, we can write the Haskell code that binds to it. The Haskell code is heavily annotated with comments to explain how it works.

-- Enable some language extensions needed for souffle-haskell:
{-# LANGUAGE ScopedTypeVariables, DataKinds, TypeFamilies, DeriveGeneric #-}

module Main ( main ) where

import qualified Language.Souffle.Interpreted as Souffle
import Data.Foldable ( traverse_ )
import Control.Monad.IO.Class
import GHC.Generics


-- Facts are represented in Haskell as simple product types (a.k.a. records),
-- Datalog numbers are converted to Int32, symbols to Strings / Text.

data Edge = Edge String String
  deriving (Eq, Show, Generic)

data Reachable = Reachable String String
  deriving (Eq, Show, Generic)

-- We define a data type representing the "handle" to our datalog program.
data Path = Path

-- By making Path an instance of Program, we provide Haskell with information
-- about the datalog program. It uses this to perform compile-time checks to
-- limit the amount of possible programmer errors to a minimum.
instance Souffle.Program Path where
  type ProgramFacts Path = [Edge, Reachable]
  programName = const "path"

-- By making a data type an instance of Fact, we give Haskell the
-- necessary information to bind to the corresponding datalog fact.
instance Souffle.Fact Edge where
  factName = const "edge"

instance Souffle.Fact Reachable where
  factName = const "reachable"

-- For simple product types, we can automatically generate the
-- marshalling/unmarshalling code of data between Haskell and datalog.
instance Souffle.Marshal Edge
instance Souffle.Marshal Reachable


-- Now the actual code that will be executed:
main :: IO ()
main = Souffle.runSouffle $ do
  -- First we initialize the Soufflé program. This can possibly fail
  -- (if the datalog code or souffle itself is not found for example)
  -- but Haskell's typesystem forces us to handle this error case
  -- by checking if a valid value is returned.
  maybeProgram <- Souffle.init Path
  case maybeProgram of
    Nothing -> liftIO $ putStrLn "Failed to load program."
    Just prog -> do  -- successfully loaded
      -- Adding facts:
      Souffle.addFact prog $ Edge "a" "b"   -- adding a single fact from Haskell
      Souffle.addFacts prog [ Edge "b" "c"  -- adding multiple facts at once
                            , Edge "b" "d"
                            , Edge "d" "e"
                            , Edge "e" "f"
                            , Edge "d" "h"
                            ]

      -- "Runs" the Soufflé program, this will compute all other derived facts.
      Souffle.run prog

      -- Querying of facts:
      -- NOTE: Here we require an explicit type annotation since we directly
      --       print it to stdout and would be ambiguous to the typechecker,
      --       but if result is passed to another function for example,
      --       it can infer the correct type automatically.
      --       getFacts changes behavior and returns different facts
      --       based on the type of the Fact it needs to return!
      results :: [Reachable] <- Souffle.getFacts prog
      liftIO $ traverse_ print results

      -- We can also look for a single specific fact (this can be more efficient
      -- than querying all facts and then checking if the fact is present):
      maybeFact <- Souffle.findFact prog $ Reachable "a" "c"
      liftIO $ print $ maybeFact

Features

Small API

The example above shows that with a small high level API, we can interface with Soufflé. The library takes care of all the low level details, allowing developers to focus on solving problems using both Haskell and Soufflé.

Type safety

The bindings makes heavy use of Haskell’s amazing typesystem to encode the structure of the datalog program on the type level. This gives the compiler enough information to avoid an entire class of bugs that would need to be manually checked (each time functions are invoked) in other languages.

An example of this is the addFact function:

addFact :: (Fact a, ContainsFact prog a) => Handle prog -> a -> m ()

If we look at the type signature, it reveals that it can only add a type that is indeed a fact and that the fact should be in the list of facts that are known to the datalog program. (In the previous example, the “Path” program only contained the “Edge” and “Reachable” facts.)

If a fact does not belong to the program handle, we get a type error, avoiding a situation where an unknown fact would be marshalled to the Soufflé side. Given the following Haskell code:

data UnknownFact = UnknownFact String
  deriving Generic

instance Souffle.Marshal UnknownFact

instance Souffle.Fact UnknownFact where
  factName = const "unknown"

example :: Souffle.Handle Path -> Souffle.SouffleM ()
example handle =
  Souffle.addFact handle $ UnknownFact 42

This will raise the following type error when trying to compile the Haskell module:

Example.hs:33:3: error:
    • You tried to perform an action with a fact of type 'UnknownFact' for program 'Path'.
      The program contains the following facts: '[Edge, Reachable].
      It does not contain fact: UnknownFact.
      You can fix this error by adding the type 'UnknownFact' to the ProgramFacts type
 in the Program instance for Path.
    • In the expression: Souffle.addFact handle
      In the expression: Souffle.addFact handle $ UnknownFact 42
      In an equation for ‘Example.example’:
          Example.example handle
            = Souffle.addFact handle $ UnknownFact 42
   |
33 |   Souffle.addFact handle $ UnknownFact 42
   |   ^^^^^^^^^^^^^^^^^^^^^^

Documentation

Haskell libraries sometimes have less good or little documentation compared to other languages. However, for this library, having bad documentation is considered a bug and the text will be updated if it is found to be unclear or incomplete. All public functions are documented in their behavior.

The full documentation can be found on Hackage.

Development process

Soufflé programs can be run in 2 ways. They can either run in interpreted mode (which uses the souffle CLI command under the hood), or they can be compiled to C++-code beforehand and called from a host Haskell program for improved efficiency in the final executable. This library supports both modes (since version 0.2.0). The two variants have only a few minor differences and can be swapped fairly easily, mostly by swapping out Language.Souffle.Interpreted with Language.Souffle.Compiled or vice versa in the imports of your code.

Interpreted mode

This is probably the mode you want to start out with if you are developing a program that uses Datalog for computing certain facts or relations. Interpreted mode offers quick development iterations (no compiling of C++ code each time you change your Datalog code). However because the Soufflé code is interpreted, it can’t offer the same speed as in compiled mode.

Compiled mode

Once the prototyping phase of the Datalog algorithm is over, it is advised to switch over to the compiled mode. It offers much improved performance compared to the interpreted mode, at the cost of having to recompile your Datalog algorithm each time it changes.

Wrapping up

souffle-haskell is already being used in 2 projects (that I know of): GRIN, a WIP compiler backend for functional languages and GHC-WPC, a WIP whole program compiler for Haskell. Both repositories use Soufflé for analysis during program optimizations of input programs. I would like to thank the developers for giving me feedback, and helping with development of part of the library (the interpreted mode would not be there yet without them!).

The library is mostly finished now, but some features are still planned such as making it easier to write a Haskell library that uses generated Soufflé code completely standalone (without needing the C++ headers installed in code that makes use of the library!), or adding an even higher level DSL to keep the Haskell and Soufflé code in sync with each other. If you have any ideas or possible improvements for the library, you can submit an issue on Github or contact me on Twitter.