
Distributed, Parallel, and Cluster Computing
@DPZ
Followers: 218 · Following: 0 · Media: 0 · Statuses: 15K
New Distributed, Parallel, and Cluster Computing submissions to https://t.co/FMRl4YXmrm (not affiliated with https://t.co/FMRl4YXmrm)
Joined October 2010
SparkAttention: High-Performance Multi-Head Attention for Large Models on Volta GPU Architecture.
arxiv.org
Transformers are widely used in various fields such as natural language processing and computer vision. However, the training time for large Transformer models can be challenging due to the...
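For readers who want the baseline such kernels accelerate, a minimal NumPy sketch of standard multi-head attention follows. It shows the computation only; SparkAttention's Volta-specific optimizations are not reproduced, and the function names below are illustrative.

```python
# Textbook multi-head attention in NumPy (illustrative only; SparkAttention's
# Volta-specific kernel optimizations are not reproduced here).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    # x: (seq_len, d_model); all weight matrices: (d_model, d_model)
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    out = softmax(scores) @ v                              # (heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo
```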
AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost.
arxiv.org
Recent advances in deep learning are driven by the growing scale of computation, data, and models. However, efficiently training large-scale models on distributed systems requires an intricate...
Near-Optimal Sparse Allreduce for Distributed Deep Learning.
arxiv.org
Communication overhead is one of the major obstacles to train large deep learning models at scale. Gradient sparsification is a promising technique to reduce the communication volume. However, it...
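As background, a minimal sketch of top-k gradient sparsification, the compression step that sparse allreduce schemes build on; the paper's near-optimal allreduce algorithm itself is not shown, and the function names are illustrative.

```python
# Top-k gradient sparsification before communication (simplified sketch; the
# paper's near-optimal sparse allreduce protocol is not reproduced here).
import numpy as np

def sparsify_topk(grad, density=0.01):
    """Keep only the largest-magnitude entries of a flat gradient vector."""
    k = max(1, int(density * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of top-k entries
    return idx, grad[idx]                          # (indices, values) to send

def desparsify(idx, vals, size):
    """Rebuild a dense gradient from the received sparse representation."""
    dense = np.zeros(size, dtype=vals.dtype)
    dense[idx] = vals
    return dense
```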
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.
arxiv.org
Training large deep learning models at scale is very challenging. This paper proposes Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for efficiently training...
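For context, a toy timetable for a plain unidirectional (GPipe-style) forward pipeline, showing where the idle bubbles that bidirectional schemes like Chimera target come from; this is not Chimera's bidirectional schedule, and the function name is made up.

```python
# A toy timetable for a GPipe-style forward pipeline: stage s processes
# micro-batch m at step s + m; None marks an idle bubble.
def gpipe_forward_schedule(num_stages, num_microbatches):
    steps = num_stages + num_microbatches - 1
    table = []
    for t in range(steps):
        row = []
        for s in range(num_stages):
            m = t - s
            row.append(m if 0 <= m < num_microbatches else None)
        table.append(row)
    return table

# Example: 4 stages, 4 micro-batches; each row lists the micro-batch index
# each stage works on at that step.
for row in gpipe_forward_schedule(4, 4):
    print(row)
```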
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.
arxiv.org
Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information...
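A simplified, synchronous version of group averaging, for intuition only; the wait-avoiding, non-blocking collectives that give the paper its name are not modeled here, and the function name is illustrative.

```python
# Simplified group-averaging step: workers are split into disjoint groups and
# each group averages its model parameters (a synchronous stand-in for the
# paper's wait-avoiding group allreduce).
import numpy as np

def group_average(models, group_size):
    """models: (num_workers, num_params) array of per-worker parameters."""
    models = np.asarray(models)
    averaged = models.copy()
    for start in range(0, len(models), group_size):
        group = slice(start, start + group_size)
        averaged[group] = models[group].mean(axis=0)   # allreduce within group
    return averaged
```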
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations.
arxiv.org
Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous...
Optimizing Compilation for Distributed Quantum Computing via Clustering and Annealing.
arxiv.org
Efficiently mapping quantum programs onto distributed quantum computing (DQC) systems is challenging, particularly when considering heterogeneous quantum processing units (QPUs) with different...
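As a rough illustration of the annealing half of such a pipeline, a generic simulated-annealing sketch that assigns program qubits to QPUs so as to reduce cross-QPU two-qubit gates; this is not the paper's clustering-and-annealing formulation, and all names and parameters are made up.

```python
# Generic simulated annealing for qubit-to-QPU assignment (illustrative only).
import math, random

def anneal_assignment(num_qubits, qpu_capacities, gates, steps=5000, t0=2.0):
    """gates: list of (q1, q2) two-qubit gate pairs; qpu_capacities must sum
    to at least num_qubits. Returns assign[q] = QPU index and the final cost."""
    slots = [q for q, cap in enumerate(qpu_capacities) for _ in range(cap)]
    random.shuffle(slots)
    assign = slots[:num_qubits]            # initial assignment respects capacities

    def cost(a):                           # number of gates crossing QPU boundaries
        return sum(1 for q1, q2 in gates if a[q1] != a[q2])

    cur = cost(assign)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9
        i, j = random.sample(range(num_qubits), 2)
        assign[i], assign[j] = assign[j], assign[i]      # propose a swap
        new = cost(assign)
        if new > cur and random.random() >= math.exp((cur - new) / temp):
            assign[i], assign[j] = assign[j], assign[i]  # reject: undo swap
        else:
            cur = new
    return assign, cur
```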
Reliable Multi-view 3D Reconstruction for `Just-in-time' Edge Environments.
arxiv.org
Multi-view 3D reconstruction applications are revolutionizing critical use cases that require rapid situational awareness, such as emergency response, tactical scenarios, and public safety. In...
TOAST: Fast and scalable auto-partitioning based on principled static analysis.
arxiv.org
Partitioning large machine learning models across distributed accelerator systems is a complex process, requiring a series of interdependent decisions that are further complicated by internal...
CausalMesh: A Formally Verified Causal Cache for Stateful Serverless Computing.
arxiv.org
Stateful serverless workflows consist of multiple serverless functions that access state on a remote database. Developers sometimes add a cache layer between the serverless runtime and the...
Efficient Mixed-Precision Large Language Model Inference with TurboMind.
arxiv.org
Mixed-precision inference techniques reduce the memory and computational demands of Large Language Models (LLMs) by applying hybrid precision formats to model weights, activations, and KV caches....
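For background on the weight side of mixed precision, a minimal per-channel int8 quantization sketch; TurboMind's actual kernels, activation formats, and KV-cache handling are not represented, and the function names are illustrative.

```python
# Minimal per-output-channel symmetric int8 weight quantization (illustrative).
import numpy as np

def quantize_int8(w):
    """Quantize a (out_features, in_features) weight matrix, one scale per row."""
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def int8_linear(x, q, scale):
    """y = x @ W^T with int8 weights dequantized on the fly (fp32 activations)."""
    return (x @ q.T.astype(np.float32)) * scale.T
```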
Lower Bounds for $k$-Set Agreement in Fault-Prone Networks.
arxiv.org
We develop a new lower bound for $k$-set agreement in synchronous message-passing systems connected by an arbitrary directed communication network, where up to $t$ processes may crash. Our result thus...
Universal Dancing by Luminous Robots under Sequential Schedulers.
arxiv.org
The Dancing problem requires a swarm of $n$ autonomous mobile robots to form a sequence of patterns, aka perform a choreography. Existing work has proven that some crucial restrictions on...
Databelt: A Continuous Data Path for Serverless Workflows in the 3D Compute Continuum.
arxiv.org
Typically, serverless functions rely on remote storage services for managing state, which can result in increased latency and network communication overhead. In a dynamic environment such as the...
Declarative Data Pipeline for Large Scale ML Services.
arxiv.org
Modern distributed data processing systems face significant challenges in balancing system performance with code maintainability and developer productivity, particularly when integrating machine...
On the $h$-majority dynamics with many opinions.
arxiv.org
We present the first upper bound on the convergence time to consensus of the well-known $h$-majority dynamics with $k$ opinions, in the synchronous setting, for $h$ and $k$ that are both...
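A plain simulation of the dynamics on the complete graph, useful for building intuition; the paper's contribution is an analytical convergence-time bound, not code, and the model details below (sampling with replacement, uniform tie-breaking) are assumptions of this sketch.

```python
# Synchronous h-majority dynamics with k opinions on the complete graph.
import numpy as np

def h_majority_step(opinions, h, rng):
    """Each node samples h nodes uniformly at random and adopts a most
    frequent opinion among the sample, breaking ties uniformly at random."""
    n = opinions.size
    samples = opinions[rng.integers(0, n, size=(n, h))]   # h samples per node
    new = np.empty(n, dtype=opinions.dtype)
    for i in range(n):
        vals, counts = np.unique(samples[i], return_counts=True)
        best = vals[counts == counts.max()]
        new[i] = rng.choice(best)                          # random tie-break
    return new

def rounds_to_consensus(n=1000, k=5, h=3, seed=0, max_rounds=10**4):
    rng = np.random.default_rng(seed)
    opinions = rng.integers(0, k, size=n)
    for t in range(max_rounds):
        if np.all(opinions == opinions[0]):
            return t
        opinions = h_majority_step(opinions, h, rng)
    return None
```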
Action Engine: Automatic Workflow Generation in FaaS.
arxiv.org
Function as a Service (FaaS) is poised to become the foundation of the next generation of cloud systems due to its inherent advantages in scalability, cost-efficiency, and ease of use. However,...
Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data.
arxiv.org
Federated distillation has emerged as a promising collaborative machine learning approach, offering enhanced privacy protection and reduced communication compared to traditional federated learning...
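A bare-bones sketch of the server side of federated distillation, where clients exchange soft predictions on a shared public dataset instead of model weights; the paper's client-side filtering for non-IID data is not included, and the function names are illustrative.

```python
# Federated distillation round, server side (illustrative sketch only).
import numpy as np

def server_aggregate_soft_labels(client_logits):
    """Average clients' logits on a shared public dataset.
    client_logits: list of arrays of shape (num_public_examples, num_classes)."""
    return np.mean(np.stack(client_logits), axis=0)

def distillation_targets(avg_logits, temperature=2.0):
    """Soften the aggregated logits into targets that clients distill from."""
    z = avg_logits / temperature
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```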
Cooperative SGD with Dynamic Mixing Matrices.
arxiv.org
One of the most common methods to train machine learning models today is stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge...
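For intuition, one round of decentralized (cooperative) SGD with a fixed mixing matrix; how the paper chooses and updates the mixing matrix dynamically is not modeled here, and the function name is illustrative.

```python
# One round of decentralized SGD: local gradient step, then gossip averaging
# weighted by a doubly stochastic mixing matrix W.
import numpy as np

def cooperative_sgd_round(params, grads, W, lr=0.1):
    """params, grads: (num_workers, num_params); W: (num_workers, num_workers),
    doubly stochastic. Worker i ends up with sum_j W[i, j] * x_j."""
    local = params - lr * grads          # local SGD step on each worker
    return W @ local                     # mixing/averaging step
```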
FedEve: On Bridging the Client Drift and Period Drift for Cross-device Federated Learning.
arxiv.org
Federated learning (FL) is a machine learning paradigm that allows multiple clients to collaboratively train a shared model without exposing their private data. Data heterogeneity is a fundamental...