This is the A Specialized B-Tree for Concurrent Datalog Evaluation. It was published in the Symposium on Principles and Practice of Parallel Programming, February 16–20, 2019, Washington, DC, USA (PPoPP’19).

Abstract

Modern Datalog engines are employed in industrial applications such as graph-databases, networks, and static program analysis. To cope with vast amount of data, Datalog engines must employ parallel execution strategies, for which specialized concurrent data structures are of paramount importance.

In this paper, we introduce a specialized B-tree data structure for an open-source Datalog compiler written in C++. Our data structure has been specialized for Datalog workloads running on shared-memory multi-core computers. It features (1) an optimistic locking protocol for scalability, (2) is highly tuned, and (3) uses the notion of ``hints’’ to re-use the results of previously performed tree traversals to exploit data ordering properties exhibited by Datalog evaluation. In parallel micro-benchmarks, the new data structure achieves up to 59x higher performance than state-of-the-art industrial standards, while integrated into a Datalog engine it accounts for 3x higher, overall system performance.

Slides