Fast Parallel Equivalence Relations in a Datalog Compiler
This is the Fast Parallel Equivalence Relations in a Datalog Compiler. It was published in the International Conference on Parallel Architectures and Compilation Techniques, Sep 23-25, Seattle, WA, USA. The paper describes the integration of a concurrent equivalence relation data-structure in Soufflé that performs the computation of reflexivitiy, symmetry, and transitivity inside the data-structure. This gives enormous speed-ups and reduces memory overheads.
Abstract
Modern parallelizing Datalog compilers are em- ployed in industrial applications such as networking and static program analysis. These applications regularly reason about equivalences, e.g., computing bitcoin user groups, fast points-to analyses, and optimal network routes. State-of-the- art Datalog engines represent equivalence relations verbatim by enumerating all possible pairs in an equivalence class. This approach inhibits scalability for large datasets.
In this paper, we introduce EQREL, a specialized parallel union-find data structure for scalable equivalence relations, and its integration into a Datalog compiler. Our data structure pro- vides a quadratic worst-case speed-up and space improvement. We demonstrate the efficacy of our data structure in SOUFFLE ́, which is a Datalog compiler that synthesizes parallel C++ code. We use real-world benchmarks and show that the new data structure scales on shared-memory multi-core architectures storing up to a half-billion pairs for a static program analysis scenario.