PaPoC '24: Proceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data

Full Citation in the ACM Digital Library

Undo and Redo Support for Replicated Registers

Leo Stewen
Martin Kleppmann

Undo and redo functionality is ubiquitous in collaboration software. In single user settings, undo and redo are well understood. However, when multiple users edit a document, concurrency may arise, leading to a non-linear operation history. This renders undo and redo more complex both in terms of their semantics and implementation.

We survey the undo and redo semantics of current mainstream collaboration software and derive principles for undo and redo behavior in a collaborative setting. We then apply these principles to a simple CRDT, the Multi-Valued Replicated Register, and present a novel undo and redo algorithm that implements the undo and redo semantics that we believe are most consistent with users' expectations.

Extending JSON CRDTs with Move Operations

Liangrun Da
Martin Kleppmann

Conflict-Free Replicated Data Types (CRDTs) for JSON allow users to concurrently update a JSON document and automatically merge the updates into a consistent state. Moving a subtree in a map or reordering elements in a list within a JSON CRDT is challenging: naive merge algorithms may introduce unexpected results such as duplicates or cycles. In this paper, we introduce an algorithm for move operations in a JSON CRDT that handles the interaction with concurrent non-move operations, and uses novel optimisations to improve performance. We plan to integrate this algorithm into the Automerge CRDT library.

Wrapping Rings in Lattices: An Algebraic Symbiosis of Incremental View Maintenance and Eventual Consistency

Conor Power
Saikrishna Achalla
Ryan Cottone
Nathaniel Macasaet
Joseph M. Hellerstein

We reconcile the use of semi-lattices in CRDTs and the use of groups and rings in incremental view maintenance to construct systems with strong eventual consistency, incremental computation, and database query optimization.

Simplifying Snapshot Isolation: A New Definition, Equivalence, and Efficient Checking

Jian Zhang
Cheng Tan

Snapshot Isolation (SI) is a popular isolation level, supported by many databases and is widely used by applications. Understanding and checking SI is essential. However, today's SI definitions can be obscure for non-experts to understand, or inefficient to verify, or dependent on specific implementations. In contrast, our goal is to offer a definition that is straightforward and easy to comprehend, enables efficient verification, and remains independent of the underlying implementation. In this paper, we introduce such an SI definition using a data structure called BC-graphs. We prove that our SI definition is equivalent to Adya SI [7], the de facto SI definition. We did an empirical study to show that our SI definition accelerates SI checking significantly compared to checking Adya SI.

Bigger, not Badder: Safely Scaling BFT Protocols

David C. Y. Chu
Chris Liu
Natacha Crooks
Joseph M. Hellerstein
Heidi Howard

Byzantine Fault Tolerant (BFT) protocols provide powerful guarantees in the presence of arbitrary machine failures, yet they do not scale. The process of creating new, scalable BFT protocols requires expert analysis and is often error-prone. Recent work suggests that localized, rule-driven rewrites can be mechanically applied to scale existing (non-BFT) protocols, including Paxos. We modify these rewrites--- decoupling and partitioning---so they can be safely applied to BFT protocols, and apply these rewrites to the critical path of PBFT, improving its throughput by 5×. We prove the correctness of the modified rewrites on any BFT protocol by formally modeling the arbitrary logic of a Byzantine node. We define the Borgesian simulator, a theoretical node that simulates a Byzantine node through randomness, and show that in any BFT protocol, the messages that a Borgesian simulator can generate before and after optimization is the same. Our initial results point the way towards an automatic optimizer for BFT protocols.

Logical Clocks and Monotonicity for Byzantine-Tolerant Replicated Data Types

Florian Jacob
Hannes Hartenstein

Replicated event logbooks are ubiquitous in decentralized systems designed to cope with Byzantine-faulty replicas. Recently, there is a growing subclass that only partially orders its logbooks by hash-linking inscribed events to their causal past. Thereby, these approaches forgo coordination and consensus to gain scalability and availability under partition. We investigate these approaches to explicate their underlying construction by connecting their design to the concept of logical monotonicity and by providing an abstraction as a delta-state conflict-free replicated data type. In particular, we analyze what makes a clock Byzantine-tolerant, and show that these hash-linked causal logbooks represent Byzantine-tolerant clocks. Based on these insight, we model real-world group communication systems as Byzantine monotonic compositions, and analyze their monotonicity properties to understand the guarantees they provide to the application layer.

Unanimous 2PC: Fault-tolerant Distributed Transactions Can be Fast and Simple

Chris Jensen
Heidi Howard
Antonios Katsarakis
Richard Mortier

Distributed transactional datastores are pivotal in supporting the needs of modern applications and services. Datastores rely on distributed transactional protocols to tolerate faults while also aiming for strong consistency and high performance. State-of-the-art transactional protocols, such as FaRM, provide a lock-free execution phase, thus improving performance and simplifying recovery. However, these protocols often come with lengthy and complicated commit phases, requiring multiple round-trips to commit a transaction (or additional replicas). This completely contrasts with the simplicity and efficiency of the traditional two-phase commit (2PC) protocol, which can commit a transaction after a single round-trip, albeit lacking fault tolerance.

To address the limitations of both approaches, we introduce U2PC, a novel 2PC variant that combines simplicity, efficiency, and fault tolerance. U2PC frugally replicates data (f + 1) times to tolerate up to f faults with strong consistency. It offers a single round-trip commit after unanimous responses of all replicas of the involved shards and ensures safe recovery via an extra "pre-abort" round before aborting a transaction. Our verification in TLA+ confirms that U2PC achieves strict serializability and recovers safely. In short, U2PC ensures fault tolerance, optimizes performance for common scenarios, and offers uniform transaction handling.

Models for Storage in Database Backends

Edgard Schiebelbein
Saalik Hatia
Annette Bieniusa
Carla Ferreira
Gustavo Petri
Marc Shapiro

This paper describes ongoing work on developing a formal specification of a database backend. We present the formalisation of the expected behaviour of a basic transactional system that calls into a simple store API, and instantiate in two semantic models. The first is a map-based, classical versioned key-value store; the second, journal-based, appends individual transaction effects to a journal. We formalise a significant part of the specification in the Coq proof assistant. This work will be the basis for formalising a full-fledged backend store with features such as caching or write-ahead logging as variations on maps and journals.