Fast Parallel Equivalence Relations in a Datalog Compiler
A new paper has been published as part of the 28th International Conference on Parallel Architectures and Compilation techniques (PACT). This is an extension of the work written about previously, and describes the implementation of equivalence relations natively in compiled Soufflé utilising custom data-structures, and its extension to semi-naive evaluation to accommodate the storage of equivalence relation delta relations efficiently.
Modern parallelizing Datalog compilers are employed in industrial applications such as networking and static program analysis. These applications regularly reason about equivalences, e.g., computing bitcoin user groups, fast points-to analyses, and optimal network routes. State-of-the-art Datalog engines represent equivalence relations verbatim by enumerating all possible pairs in an equivalence class. This approach inhibits scalability for large datasets.
In this paper, we introduce EQREL, a specialized parallel union-find data structure for scalable equivalence relations, and its integration into a Datalog compiler. Our data structure provides a quadratic worst-case speed-up and space improvement. We demonstrate the efficacy of our data structure in Soufflé, which is a Datalog compiler that synthesizes parallel C++ code. We use real-world benchmarks and show that the new data structure scales on shared-memory multi-core architectures storing up to a half-billion pairs for a static program analysis scenario.