We introduce Pancake, a new language for verifiable, low-level systems programming, especially device drivers. Pancake eschews complex type systems to make the language attractive to systems programmers, while at the same time aiming to ease the formal verification of code. We describe the design of the language and its verified compiler, and examine its usability, performance and current limitations through case studies of device drivers and related systems components for an seL4-based operating system.
Device drivers are components that enable operating systems to interact with devices. Unfortunately, they are the main source of bugs in operating systems, because writing a driver is an intricate and error-prone process that requires extensive knowledge of devices and operating systems. Furthermore, supporting new devices and accommodating kernel revisions require significant development effort. To facilitate the development of device drivers, we present Ghost Writer, an end-to-end toolchain that allows developers to synthesize correct-by-construction device drivers from high-level specifications. Ghost Writer supports control and data plane operations (e.g., handling DMA transactions). It makes synthesis tractable by 1) modeling the device interface as a set of virtual registers that abstract the hardware details and 2) leveraging behavior trees to model operations on virtual registers and synthesize complex operations from simpler ones. Our prototype can synthesize putc for the PL011 UART device and send_packet for the VirtIO network device. We believe that Ghost Writer can be the foundation towards automating the development of correct-by-construction device drivers.
C is the primary programming language used in the Linux kernel. Recently, the Linux developer community oversaw the experimental addition of Rust into the kernel's build system. Networking is one of the areas often mentioned when discussing the adoption of Rust. In networking, both perfect memory management and performance are critical.
In this paper, we present a Rust UDP tunneling driver for Linux, which provides UDP encapsulation between two peers. We use this driver to discuss design considerations of writing Rust networking code for Linux. We then compare the performance of our driver against a similar driver written in C. We find that our Rust driver performs slightly worse on a gigabit link for both latency (+0.1906%, p-value = 1.464e--15) and throughput (--0.00090%, p-value = 6.004e--5). We then discuss potential causes for that loss.
While software applications, programming languages, and hardware have changed, operating systems have not. Widely-used commodity operating systems are still modelled after the ones designed in the seventies. The accumulated burden of backward compatibility with the large software ecosystems that run our workloads prevents systems from embracing more efficient and disruptive designs explored by the system research community.
This paper advocates a fresh approach to operating system research, where innovations are incrementally integrated into operating systems, without disrupting existing software, to gradually reshape our daily-use systems. The dynamic linker emerges as a pivotal element in this transformation process, redefining system behavior. The paper outlines specific use cases, covering performance enhancements, strengthened security measures, streamlined software deployment, and enriched programming language abstractions. Additionally, the paper introduces Spidl, an experimental modular dynamic linker to facilitate the exploration of this promising new research avenue.
Pipes are a standard inter-process communication mechanism in Unix-like operating systems, allowing to compose simple, single-purpose programs to solve complex problems. This powerful mechanism arguably contributed quite a lot to the success of Unix. However, the design decision to standardize unstructured, type-less byte streams as a basis for communication can be seen as disadvantageous: Incompatible programs can mistakenly be hooked together, conversions have to be added manually, and the structure of the communicated data has to be re-created by parsers and serialized again at every node in the pipeline.
To overcome these disadvantages, we propose to explore the potential behind typed Unix pipes. Based on a type system with a special type constructor -- the ladder type, -- runtime type checks for pipes and potentially also files, program arguments and environment variables prevent the user to create invalid program graphs. This also allows to add automatic conversions that resolve type conflicts in the future. A prototypical implementation extends an existing shell with a type-check routine and demonstrates the concepts on a set of realistic use-case scenarios.
Operating systems are crucial for the performance of the overall system. Any inefficiency leads to a suboptimal use of the available resources and causes performance loss. The wide range of processors in use today makes it challenging to generate the most efficient code for the current hardware ahead of time. Just-in-time compilation, on the other hand, is able to generate efficient code tailored to the current execution context going beyond the processor, also including operating-system configuration or application demands. Moreover, its configuration can even be adapted at runtime to match the current external and internal requirements. Unfortunately, on-demand compilation of operating-system code has not found widespread use due to inherent difficulties stemming from the fact that any just-in-time approach requires extensive runtime support (e.g., for memory allocation for the generated code) usually provided by the operating system itself. A chicken-and-egg problem is found.
We believe that this problem of mutual dependency can be resolved and that on-demand compilation of operating-system code can yield the most suitable code for the actual execution context by exploiting runtime knowledge. In this paper, we outline the DOSY approach for just-in-time compilation of operating systems, focusing on the necessary means of overcoming all resource restrictions in the form of a standalone auxiliary interpreter. We implement an interpreter prototype in the memory-safe programming language Rust. Initial measurements indicate promising performance results, encouraging further work towards complete just-in-time compilation of operating systems.
Compartmentalization is a form of defensive software design in which an application is broken down into isolated but communicating components. Retrofitting compartmentalization into existing applications is often thought to be expensive from the engineering effort and performance overhead points of view. Still, recent years have seen proposals of compartmentalization methods with promises of low engineering efforts and reduced performance impact. ARM Morello combines a modern ARM processor with an implementation of Capability Hardware Enhanced RISC Instructions (CHERI) aiming to provide efficient and secure compartmentalization. Past works exploring CHERI-based compartmentalization were restricted to emulated/FPGA prototypes.
In this paper, we explore possible compartmentalization schemes with CHERI on the Morello chip. We propose two approaches representing different trade-offs in terms of engineering effort, security, scalability, and performance impact. We describe and implement these approaches on a prototype OS running bare metal on the Morello chip, compartmentalize two popular applications, and investigate the performance overheads. Furthermore, we show that compartmentalization can be achieved with an engineering cost that can be quite low if one is willing to trade off on scalability and security, and that performance overheads are similar to other intra-address space isolation mechanisms.
DRAM now accounts for over 30% of overall datacenter expense [30], due to its increasing cost and decreasing scaling. [19, 22]. As applications demand more memory, operators look for cost-effective solutions to handle these increasing requirements.
One way to address the problem is to use disaggregated or far memory [18, 23, 25, 30]. Far memory solutions have an access latency approximately an order of magnitude slower than DRAM, thus, accurate memory page prefetching is critical. Important applications show pointer-chasing behavior, and existing prefetchers struggle to effectively predict these patterns. We find that 35--78% of page faults for benchmarks we analyzed are due to pointer accesses, but the default kernel prefetcher is ineffective for these patterns.
We introduce a new generalized kernel pointer prefetcher using CHERI: Capability Hardware Enhanced RISC Instructions [32]. Our approach, called CHERI-picking, leverages CHERI pointer capabilities to identify locations that contain pointers and prefetch the pages those pointers reference, subject to a policy. CHERI-picking does not require changes to applications, profiling, or offline analysis.
We implement CHERI-picking in CheriBSD and evaluate it using benchmarks. Our results show that CHERI-picking is effective where traditional kernel prefetchers are not, indicating the promise of this approach. We also show the overheads of discovering pointers and discuss blocking faults (faults that are prefetched but still in transit when the application accesses them) that currently stand in the way of adopting CHERI-picking. We discuss potential avenues to address these challenges.
Stack unwinding is a well-established approach for handling panics in Rust programs. However, its feasibility on resource-constrained embedded systems has been unclear due to the associated overhead and complexity. This paper presents our experience of implementing stack unwinding and panic recovery within a Rust-based soft real-time embedded operating system. We describe several novel optimizations that help achieve adequate performance for a flying drone with a CPU overhead of 2.6% and a storage overhead of 26.0% to recover from panics in application tasks and interrupt handlers.
In contrast to hardware-based isolation solutions, language-based systems support crossing of isolation boundaries with an overhead of a function call. Moreover, the strong type system of a safe language provides support for secure communication in the face of complex, semantically-rich interfaces, i.e., support for fault isolation and end-to-end zero-copy communication through isolation of object spaces and controlled ownership on the shared exchange heap. If historically, safety was prohibitive due to overheads of a managed runtime, today, languages like Rust achieve the performance of unsafe C hence empowering language-based systems to support practical isolation with fine-grained boundaries and frequent communication.
Unfortunately, despite providing the core foundation for isolation of object spaces, i.e., support for a special shared heap, Rust still lacks several abstractions required to support zero-copy communication mechanisms. Existing Rust systems restrict zero-copy passing of data to a set of hand-coded types, hence limiting flexibility of changing interfaces between isolated subsystems. Our work extends the Rust compiler with a static analysis pass that reasons about assignments of references on the shared exchange heap and instruments them with the code that correctly reflects ownership updates on cross-subsystem invocations. This allows us to develop an isolation scheme in which a hierarchical data structure can be passed between isolated subsystems with a single ownership update of its root element.
The memory-safe systems programming language Rust is gaining more and more attention in the operating system development communities, as it provides memory safety without sacrificing performance or control. However, these safety guarantees only apply to the safe subset of Rust, while bare-metal programming requires some parts of the program to be written in unsafe Rust. Writing abstractions for these parts of the software that are sound, meaning that they guarantee the absence of undefined behavior and thus uphold the invariants of safe Rust, can be challenging. Producing sound code, however, is essential to avoid breakage when the code is used in new ways or the compiler behavior changes.
In this paper, we present common patterns of unsound abstractions derived from the experience of reworking soundness in our kernel. During this process, we were able to remove over 400 unsafe expressions while discovering and fixing several hard-to-spot concurrency bugs along the way.