This is the Automatic Index Selection for Large-Scale Datalog Computation describing on how Soufflé selects indexes using a bi-partite matching problem. It was published in the 45th International Conference on Very Large Data Bases (VLDB’19).


Datalog has been applied to several use cases that require very high performance on large rulesets and factsets. It is common to create indexes for relations to improve search performance. How- ever, the existing indexing schemes either require manual index se- lection or result in insufficient performance on very large tasks. In this paper, we propose an automatic scheme to select indexes. We automatically create the minimum number of indexes to speed up all the searches in a given Datalog program. We have in- tegrated our indexing scheme into an open-source Datalog en- gine SOUFFLE ́. We obtain performance on a par with what users have accepted from hand-optimized Datalog programs running on state-of-the-art Datalog engines, while we do not require the ef- fort of manual index selection. Extensive experiments on large real Datalog programs demonstrate that our indexing scheme results in considerable speedups (up to 2x) and significantly less memory usage (up to 6x) compared with other automated index selections.