Federated Learning (FL) provides a privacy-preserving mechanism for distributed training of machine learning models on networked devices (e.g., mobile devices, IoT edge nodes). It enables Artificial Intelligence (AI) at the edge by creating models without sharing actual data across the network. Existing research typically focuses on generic aspects of non-IID data and heterogeneity in clients' system characteristics, but often neglects the issue of insufficient data for model development, which can arise from uneven class label distributions and highly variable data volumes across edge nodes. In this work, we propose FLIGAN, a novel approach to address the issue of data incompleteness in FL. First, we leverage Generative Adversarial Networks (GANs) to adeptly capture complex data distributions and generate synthetic data that closely resembles real-world data. Then, we use the synthetic data to enhance the robustness and completeness of datasets across nodes. Our methodology adheres to FL's privacy requirements by generating synthetic data in a federated manner without sharing the actual data in the process. We incorporate techniques such as classwise sampling and node grouping, designed to improve the federated GAN's performance, enabling the creation of high-quality synthetic datasets and facilitating efficient FL training. Empirical results from our experiments demonstrate that FLIGAN significantly improves model accuracy, especially in scenarios with high class imbalance, achieving up to a 20% increase in model accuracy over traditional FL baselines.
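As a hedged illustration of the classwise sampling idea, the sketch below computes how many synthetic samples a node would request per class to balance its local label distribution; the function name and the balance-to-majority policy are our assumptions, not FLIGAN's actual implementation.

```python
from collections import Counter

def classwise_augmentation_plan(labels, target_per_class=None):
    """Decide how many synthetic samples a (hypothetical) federated GAN
    should generate per class to even out a node's local dataset."""
    counts = Counter(labels)
    if target_per_class is None:
        # Default policy: top every class up to the majority class size.
        target_per_class = max(counts.values())
    return {cls: max(0, target_per_class - n) for cls, n in counts.items()}
```

A node with labels `["a", "a", "a", "b"]` would request two synthetic `"b"` samples and none for `"a"`, leaving the GAN to generate only what is missing.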
Blockchains today amass terabytes of transaction data that demand efficient and insightful real-time analytics for applications such as smart contract hack detection, price arbitrage on decentralized exchanges, or trending token analysis. Conventional blockchain nodes, constrained by their RPC APIs, and specialized ETL-based blockchain analytics systems grapple with a trade-off between materializing pre-calculated query results and analytical expressiveness. In response, we introduce AlterEgo, a blockchain node architected specifically for analytics that maintains parity with traditional nodes in ingesting consensus-produced blocks while integrating a robust analytics API. Our prototype supports efficient transactional and analytical processing while circumventing the rigidity of ETL workflows, offering a better trust model, enabling distributed and collaborative querying, and achieving significant performance improvements over the state-of-the-art.
The pub/sub paradigm facilitates communication among heterogeneous edge and IoT devices for distributed edge applications. At the same time, the increasing number of devices and sensors at the edge leads to higher network congestion, which requires more processing power from both pub/sub brokers and devices. While message filtering based on subscriber-specified rules can alleviate this congestion, implementing the filtering logic on the brokers still requires publishing components to send all messages over constrained links.
In this paper, we instead propose filtering pub/sub messages directly on the publisher in order to reduce network congestion. We present ShutPub, a publisher-side middleware that filters messages before forwarding them to the broker. ShutPub limits the publisher's message dissemination based on subscriptions and their filters while remaining transparent to the publisher. In this way, content-based filtering capabilities are moved from the broker to the publisher, so that only messages that have a receiver are transmitted. Our prototype evaluation shows that ShutPub can reduce strain on the network and broker without simply shifting the burden to the publisher, which itself benefits from sending fewer messages.
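To illustrate the publisher-side filtering idea, the following minimal sketch (a hypothetical API, not ShutPub's actual interface) keeps the broker's active subscription predicates on the publisher and forwards only messages that at least one subscriber would receive:

```python
class PublisherFilter:
    """Toy publisher-side filtering middleware in the spirit of ShutPub:
    the broker pushes the current subscription filters to the publisher,
    which then drops messages no subscriber would match."""

    def __init__(self):
        self.filters = []  # predicates over message dicts

    def update_subscriptions(self, predicates):
        # Called when the broker's subscription set changes.
        self.filters = list(predicates)

    def publish(self, message, send):
        # Forward to the broker only if at least one subscriber matches.
        if any(f(message) for f in self.filters):
            send(message)
            return True
        return False  # filtered out: never crosses the constrained link
```

A temperature publisher with a single subscription for readings above 30 degrees would transmit `{"temp": 35}` but silently drop `{"temp": 20}`, saving the uplink entirely.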
WebAssembly (Wasm) has been attracting attention as a common platform for edge computing thanks to its architecture neutrality, sandboxing for security, and lightweight characteristics, which fit the requirements for distributed, cooperative processing across heterogeneous compute nodes consisting of edge devices and clouds. Virtual machine (VM) migration technology enables seamless and flexible workload offloading in edge-cloud collaboration, but unfortunately, VM migration between different Wasm runtimes is complicated because the Wasm standard leaves much of the internal runtime design unspecified. Nevertheless, many Wasm runtime implementations already exist, each with different characteristics. We therefore aim to implement stateful VM migration among most of the major Wasm runtimes. In this work, we prototyped a VM migration mechanism between WasmEdge and WAMR to investigate the technical challenges and determine the feasibility of runtime-neutral VM migration. Our experimental results on heterogeneous VM migration demonstrate the value of multiple coexisting runtimes, each well-suited to its edge or cloud environment, with workloads seamlessly offloaded among them.
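One way to picture runtime-neutral migration is to snapshot only state that the Wasm specification itself defines (linear memory, globals, a program position) in an implementation-independent format. The sketch below is purely illustrative; the snapshot format is our assumption and not the format used by WasmEdge or WAMR:

```python
import json

def snapshot(vm_state):
    """Serialize spec-level VM state into a runtime-neutral blob
    (hypothetical format): any runtime that can map its internal
    structures to these fields can restore the snapshot."""
    return json.dumps({
        "memory": vm_state["memory"].hex(),  # linear memory contents
        "globals": vm_state["globals"],      # global variable values
        "pc": vm_state["pc"],                # position to resume from
    })

def restore(blob):
    """Rebuild spec-level state from the neutral blob on the target runtime."""
    s = json.loads(blob)
    return {"memory": bytes.fromhex(s["memory"]),
            "globals": s["globals"],
            "pc": s["pc"]}
```

The real challenge the paper targets is precisely that each runtime's internal representation of this state differs, so the mapping into and out of such a neutral format is runtime-specific.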
Cerberus uses face detection on edge devices to perform privacy-preserving crowd counting and localisation. We describe its deployment in a university setting, where ceiling-mounted cameras perform real-time face detection to report occupied seats without storing or transmitting images. Cerberus ultimately aims to integrate with digital twins over a LoRa network, enabling data visualisation and supporting applications in building informatics, while balancing data accuracy and individual privacy. The paper describes the system's design, deployment, and potential for broader urban informatics applications, highlighting its effectiveness in privacy-preserving crowd monitoring.
Inference latency prediction on mobile devices is essential for multiple applications, including collaborative inference and neural architecture search. Training accurate latency predictors using ML techniques requires sufficient and representative data; however, collection of such data is challenging. To overcome these challenges, in this work, we focus on constructing a comprehensive dataset that can be used to predict inference latency on mobile devices. Our dataset contains 102 real-world CNNs, 69 real-world ViTs and 1000 synthetic CNNs across 174 diverse experimental environments on mobile platforms, accounting for critical factors affecting inference latency, including hardware heterogeneity, data representations and ML frameworks. Our code is available at: https://github.com/qed-usc/mobile-ml-benchmark.git.
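As a toy illustration of how such a dataset could drive latency prediction, the sketch below uses a k-nearest-neighbour estimate over benchmark records; the record layout and feature names are hypothetical, not the dataset's actual schema:

```python
def predict_latency(query, records, k=3):
    """Estimate inference latency for a model/device configuration by
    averaging the measured latencies of the k most similar benchmark
    records (hypothetical features; a real predictor would also encode
    hardware, data representation, and ML framework)."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in a)

    nearest = sorted(records, key=lambda r: dist(query, r["features"]))[:k]
    return sum(r["latency_ms"] for r in nearest) / len(nearest)
```

Even this naive estimator shows why coverage matters: a query far from every benchmarked configuration is averaged over dissimilar neighbours, which is exactly the gap a comprehensive dataset closes.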
Stream processing is becoming increasingly significant in various scenarios, including security-sensitive sectors. It benefits from keeping data in memory, which exposes large volumes of data in use, thereby emphasising the need for protection. The recent development of confidential computing makes such protection technologically feasible. However, these new hardware-based protection methods incur performance overhead. Our evaluation shows that replacing legacy VMs with confidential VMs to run streaming applications incurs up to 8.5% overhead on the throughput of the queries we tested in the NEXMark benchmark suite. Pursuing specialised protection against broader attacks, such as attacks at the edge with more physical exposure, can push this overhead further. In this paper, we propose a resource scheduling strategy for stream processing applications tailored to the privacy needs of specific application functions. We implement this system model using Apache Flink, a widely used stream processing framework, making it aware of the underlying cluster's protection capabilities and scheduling application functions across resources with different protection levels, according to the application's privacy requirements and the available deployment environment.
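The scheduling idea can be sketched as a protection-aware placement pass (hypothetical logic, not Flink's scheduler API): operators declare a required protection level, slots advertise theirs, and each operator is placed on the cheapest slot that satisfies its requirement so that confidential resources stay free for the functions that actually need them:

```python
def schedule(operators, slots):
    """Place each operator on a slot whose protection level meets the
    operator's requirement (illustrative levels; real deployments would
    map these to concrete TEE technologies)."""
    LEVELS = {"none": 0, "confidential-vm": 1, "enclave": 2}
    placement = {}
    # Cheapest (least protected) slots first, most demanding operators first.
    free = sorted(slots, key=lambda s: LEVELS[s["protection"]])
    for op in sorted(operators, key=lambda o: -LEVELS[o["requires"]]):
        for slot in free:
            if LEVELS[slot["protection"]] >= LEVELS[op["requires"]]:
                placement[op["name"]] = slot["id"]
                free.remove(slot)
                break
    return placement
```

This way a plaintext parsing operator never occupies an enclave slot that a decryption operator needs, which is the essence of matching protection supply to privacy demand.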
GPUs are emerging as the most popular accelerator for many applications, powering the core of machine learning workloads. In networked GPU-accelerated applications, input and output data typically traverse the CPU and the OS network stack multiple times, getting copied across the system's main memory. These transfers increase application latency, consume expensive CPU cycles, reduce the system's efficiency, and increase overall response times. These inefficiencies matter most in latency-bound or high-throughput deployments, where copy times can quickly inflate the response times of modern GPUs. We leverage the efficiency and kernel-bypass benefits of RDMA to transfer data in and out of GPUs without using any CPU cycles or synchronization. We demonstrate the ability of modern GPUs to saturate a 100-Gbps link and evaluate the network processing time in the context of an inference serving application.
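The copy-path problem can be shown with a deliberately simplified model (illustrative only; real transfers use DMA engines and RDMA verbs, not Python byte copies): a CPU-mediated path copies the payload several times through main memory, while a GPUDirect-style RDMA path makes no CPU-side copy at all:

```python
def staged_transfer(payload: bytes) -> int:
    """Count the main-memory copies on a conventional CPU-mediated path."""
    copies = 0
    kernel_buf = bytes(payload); copies += 1   # NIC -> kernel socket buffer
    user_buf = bytes(kernel_buf); copies += 1  # kernel -> user-space buffer
    pinned = bytes(user_buf); copies += 1      # user buffer -> pinned staging
    return copies  # the GPU then DMA-reads the pinned buffer

def rdma_transfer(payload: bytes) -> int:
    """With kernel-bypass RDMA the NIC writes straight into GPU memory."""
    _ = memoryview(payload)  # zero-copy view stands in for the DMA write
    return 0
```

Each eliminated copy removes both a memory-bandwidth cost proportional to the payload size and the CPU cycles spent driving it, which is where the latency and efficiency gains come from.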
Minimizing database access latency is crucial for many applications in serverless edge computing, but databases are predominantly deployed in cloud environments, resulting in costly network round-trips. Embedding an in-process database library such as SQLite into the serverless runtime is the holy grail for low-latency database access. However, SQLite's architecture limits concurrency and multitenancy, which are essential for serverless providers, forcing us to rethink how a database library should be integrated.
We propose rearchitecting SQLite to provide asynchronous byte-code instructions for I/O, avoiding blocking in the library, and to decouple the query and storage engines to facilitate database and serverless runtime co-design. Our preliminary evaluation shows up to a 100x reduction in tail latency, suggesting that our approach is conducive to runtime/database co-design for low latency.
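The non-blocking byte-code idea can be sketched with a coroutine-style interpreter (hypothetical opcodes, not SQLite's actual VDBE instruction set) that yields on I/O instructions and is resumed by the runtime once the page arrives:

```python
def run(program):
    """Interpret a tiny byte-code program whose I/O instructions suspend
    instead of blocking: on READ_PAGE the interpreter yields an I/O
    request and waits to be resumed with the page contents."""
    results = []
    for op, arg in program:
        if op == "READ_PAGE":
            page = yield ("io", arg)  # suspend; runtime resumes with the page
            results.append(page)
        elif op == "EMIT":
            results.append(arg)
    return results
```

While the interpreter is suspended at the `yield`, the serverless runtime is free to schedule other requests, which is the mechanism behind the tail-latency reduction the approach targets.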
As IoT devices multiply and produce vast volumes of data, there is a heightened demand for instantaneous data processing. However, traditional cloud computing cannot adequately address these demands due to its latency and bandwidth limitations. Edge computing has emerged as a viable alternative with a hierarchical deployment of datacenters. Yet this introduces additional layers of infrastructure and management that increase application development complexity. Using a shared file system is an attractive way to enhance communication between components in an edge computing application.
In this paper, we introduce PathFS, a shared file system designed for the hierarchical edge-cloud infrastructure. PathFS adopts a tree-like structure, with cloud datacenters at the root, edge datacenters as leaves, and a variable number of network datacenters in between. We evaluate PathFS through benchmarks on an emulated hierarchical edge deployment and compare it with NFS and ownCloud. The results show that PathFS offers latency an order of magnitude lower than these systems and scales to a larger number of concurrent clients without performance degradation, providing an end-to-end latency reduction of at least 80%.
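One plausible use of such a hierarchy, sketched below under our own assumptions about placement policy (not necessarily PathFS's actual strategy), is hosting a file shared by two clients at their lowest common ancestor datacenter, keeping traffic off the cloud root whenever an intermediate network datacenter suffices:

```python
class Node:
    """A datacenter in the edge-cloud tree: cloud at the root,
    network datacenters in the middle, edge datacenters as leaves."""
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent

    def path_to_root(self):
        n, chain = self, []
        while n is not None:
            chain.append(n.name)
            n = n.parent
        return chain

def placement(a, b):
    """Host shared state at the lowest common ancestor of the two
    clients' datacenters (hypothetical policy for illustration)."""
    ancestors_b = set(b.path_to_root())
    for name in a.path_to_root():
        if name in ancestors_b:
            return name
```

Two edge clients under the same network datacenter thus share state one hop up, and only clients in different subtrees fall back to the cloud root.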