Dria
@driaforall
15K Followers · 2K Following · 170 Media · 500 Statuses

Dria is an inference lab dedicated to accessibility, coding, and agentic behavior.

Joined January 2024
@driaforall
Dria
3 months
LLMs are stateless. We built Dria Mem Agent to change that, making memory a first-class feature: a 4B agent with local, interoperable memory across Claude, ChatGPT, and LM Studio. It turns LLMs from stateless chat into stateful agents with persistent, human-readable memory.
52
129
1K
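A minimal sketch of the idea behind persistent, human-readable memory. This is illustrative, not Mem Agent's actual implementation; the directory layout and function names are assumptions:

```python
# Hypothetical sketch: human-readable, file-backed memory an agent can
# read before answering and update after each exchange. Paths and file
# format here are assumptions, not Mem Agent's real layout.
from pathlib import Path
from datetime import datetime, timezone

MEMORY_DIR = Path.home() / ".agent-memory"  # shared across chat clients

def remember(topic: str, fact: str) -> None:
    """Append a timestamped fact to a per-topic Markdown file."""
    MEMORY_DIR.mkdir(exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    with open(MEMORY_DIR / f"{topic}.md", "a", encoding="utf-8") as f:
        f.write(f"- ({stamp}) {fact}\n")

def recall(topic: str) -> str:
    """Return everything recorded about a topic, for prompt injection."""
    path = MEMORY_DIR / f"{topic}.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""

remember("user-preferences", "Prefers concise answers with code examples.")
print(recall("user-preferences"))
```

Because the store in this sketch is plain Markdown on disk, any client pointed at the same directory can read and edit it, which mirrors the "interoperable" and "human-readable" claims above.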
@driaforall
Dria
23 days
Today we are shipping dnet, a distributed inference framework that lets Apple Silicon clusters run models that exceed their physical memory. We fuse pipelined-ring parallelism, disk streaming and UMA-aware scheduling so “out of memory” stops being the limit.
6
31
151
@driaforall
Dria
23 days
We’re building dnet because we think local clusters will be the backbone for agentic workloads. If you want to run frontier-scale models from your desk, start with the alpha: 🔗
dria.co
RUN BIG MODELS | RUN LONG CONTEXT | MAXIMIZE UTILIZATION
0
1
14
@driaforall
Dria
23 days
What’s next:
→ 128K context on home clusters
→ higher throughput with faster comms and RDMA
→ a unified backend where Apple Silicon, NVIDIA, and AMD share a single cluster and scheduling layer.
Local AI without datacenter assumptions.
1
1
10
@driaforall
Dria
23 days
dnet is designed as a plugin architecture:
→ Solver
→ API Adapter
→ Topology Adapter
To add a new strategy (for example, tensor parallel), you implement a solver in distilp and an adapter in dnet. Runtime, KV cache, and API stay the same.
1
0
8
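A hedged sketch of what such a plugin seam could look like in Python. Interface and class names here are assumptions for illustration, not dnet's actual API:

```python
# Hypothetical sketch of a strategy plugin seam: a new parallelism
# strategy supplies a solver (placement) and an adapter (wiring), while
# runtime, KV cache, and the serving API stay untouched.
from abc import ABC, abstractmethod

class Solver(ABC):
    """Decides which layers run on which device (would live in distilp)."""
    @abstractmethod
    def assign(self, model_profile: dict, device_profiles: list[dict]) -> dict[str, list[int]]:
        ...

class TopologyAdapter(ABC):
    """Wires devices together for the chosen strategy (would live in dnet)."""
    @abstractmethod
    def connect(self, assignment: dict[str, list[int]]) -> None:
        ...

class TensorParallelSolver(Solver):
    def assign(self, model_profile, device_profiles):
        # Toy placement: replicate every layer across all devices.
        n_layers = model_profile["n_layers"]
        return {d["name"]: list(range(n_layers)) for d in device_profiles}

class TensorParallelAdapter(TopologyAdapter):
    def connect(self, assignment):
        print(f"all-reduce group over {sorted(assignment)}")
```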
@driaforall
Dria
23 days
Apple Silicon’s unified memory is a blessing and a trap. CPU and GPU share one pool; naïve mmap will overcommit and start swapping. dnet is UMA-aware: memory pressure buffers, Apple-specific loaders, and repacked weights keep the ring moving instead of thrashing.
1
0
8
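One way to picture UMA awareness: check unified-memory headroom before mapping a weight shard, and fall back to streaming instead of letting mmap overcommit the shared pool. A simplified sketch, assuming psutil for the pressure check; the threshold and function names are illustrative:

```python
# Hypothetical sketch: gate resident loading on available unified memory
# so the shared CPU/GPU pool never overcommits and starts swapping.
import psutil

HEADROOM_BYTES = 4 * 1024**3  # keep ~4 GiB free to avoid swap thrash

def can_map_resident(shard_bytes: int) -> bool:
    """True if the shard fits in unified memory with headroom to spare."""
    return psutil.virtual_memory().available - shard_bytes > HEADROOM_BYTES

def load_shard(path: str, shard_bytes: int) -> str:
    if can_map_resident(shard_bytes):
        return f"mmap {path} resident"       # fast path: fits in UMA
    return f"stream {path} layer-by-layer"   # slow path: disk streaming
```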
@driaforall
Dria
23 days
In alpha, we ship a pipelined-ring strategy inspired by PRIMA.CPP. dnet’s solver (distilp) extends it so devices can punch above their memory: layers stream from disk mid-round and overlap with compute, so total model size can exceed total cluster RAM. https://t.co/QvThJLpgpH
github.com
A Python library for MINLP-based layer/expert assignment for distributed inference across heterogeneous devices - firstbatchxyz/distilp
1
0
8
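A minimal sketch of the overlap idea: prefetch the next layer from disk on a background thread while the current layer computes. This shows the concept only; the file layout and stand-in compute are assumptions, not distilp's solver:

```python
# Hypothetical sketch: overlap disk I/O with compute so total model size
# can exceed RAM. While layer i computes, layer i+1 streams from disk.
from concurrent.futures import ThreadPoolExecutor

def load_layer(i: int) -> bytes:
    with open(f"layer_{i}.bin", "rb") as f:  # illustrative file layout
        return f.read()

def compute(layer: bytes, x: float) -> float:
    return x + len(layer) * 0.0              # stand-in for real math

def run(n_layers: int, x: float) -> float:
    with ThreadPoolExecutor(max_workers=1) as io:
        nxt = io.submit(load_layer, 0)       # kick off first prefetch
        for i in range(n_layers):
            layer = nxt.result()             # wait for the prefetch
            if i + 1 < n_layers:
                nxt = io.submit(load_layer, i + 1)  # prefetch next layer
            x = compute(layer, x)            # overlaps with the I/O
    return x
```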
@driaforall
Dria
23 days
Under the hood, every run goes through Strategy → Profiling → Scheduling:
→ Strategy: choose how to distribute work (pipelined ring, tensor-parallel, long-context)
→ Profiling: measure FLOPs, memory, KV cache, latency, and disk to know each device's limits
→ Scheduling: solve the layer-to-device assignment from those profiles
1
0
9
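Read as a pipeline, the three stages might compose like this. A sketch under assumed names and stand-in numbers; the real interfaces live in dnet and distilp:

```python
# Hypothetical sketch of the Strategy -> Profiling -> Scheduling pipeline.
# All names and numbers are illustrative, not dnet's actual interfaces.
from dataclasses import dataclass

@dataclass
class DeviceProfile:
    name: str
    flops: float      # sustained TFLOPs
    mem_bytes: int    # unified memory available
    disk_mbps: float  # streaming bandwidth

def choose_strategy(goal: str) -> str:
    return {"throughput": "pipelined-ring", "context": "long-context"}.get(goal, "pipelined-ring")

def profile_devices(names: list[str]) -> list[DeviceProfile]:
    # Stand-in numbers; real profiling would benchmark each device.
    return [DeviceProfile(n, flops=10.0, mem_bytes=32 * 1024**3, disk_mbps=3000.0) for n in names]

def schedule(n_layers: int, devices: list[DeviceProfile]) -> dict[str, range]:
    # Toy even split; distilp solves this as a MINLP over the profiles.
    per = n_layers // len(devices)
    return {d.name: range(i * per, (i + 1) * per) for i, d in enumerate(devices)}

plan = schedule(32, profile_devices(["mac-studio", "mac-mini"]))
print(choose_strategy("throughput"), {k: list(v) for k, v in plan.items()})
```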
@driaforall
Dria
23 days
Built on @Apple MLX, dnet focuses on:
→ Distributed execution strategies
→ Automatic device/model profiling
→ A heterogeneity-aware solver
→ A drop-in OpenAI-style API
1
0
10
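Because the API is OpenAI-style, a standard client pointed at the local endpoint should work. The port and model name below are assumptions; only the client pattern is standard:

```python
# Hypothetical usage of a drop-in OpenAI-style endpoint. The base_url,
# port, and model name are assumptions, not dnet's documented defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Hello from my Mac cluster"}],
)
print(resp.choices[0].message.content)
```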
@driaforall
Dria
23 days
Local AI today is great for short generations, but breaks down when you want big models, long context, high throughput, and heterogeneous devices. dnet addresses that gap head-on. https://t.co/OJ20zwWaOD https://t.co/ojvS1CBFRO
1
1
13
@driaforall
Dria
30 days
If you’re tuning prompts for agents, evaluators, or any system that lives or dies by reliability, GEPA makes a massive difference. And now you can use it without building the whole loop yourself. Try it and tell us where you’d like us to push next: https://t.co/OcTkg1HPvz
docs.dria.co
0
1
7
@driaforall
Dria
30 days
Right now, Dria’s GEPA service runs on GPT models. Soon, it will run on every model supported by our Batch Inference API, without rate limits, and powered by a global distributed LLM network. Cheaper test-time compute, more parallelism, and much faster GEPA cycles.
1
1
6
@driaforall
Dria
30 days
We’ve loved GEPA from the start and have used it in our products for months. We believe it deserves to be a stand-alone tool. In September we invited @LakshyAAAgrawal, the creator of GEPA, to our SF meetup, and since then it has seen incredible adoption across many use cases.
1
1
9
@driaforall
Dria
30 days
Today we’re releasing something we’ve used internally for a long time: GEPA Prompt Optimization as a service. A fully automated GEPA optimizer. No orchestration, no retries, no evaluation scripts, no parallel test-time compute hacks. You send a task, it handles the rest.
4
6
38
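From the client side, the "send a task, it handles the rest" flow might look like the following. This is a hypothetical sketch: the endpoint URL, payload fields, and auth header are assumptions, not the documented API (see docs.dria.co for the real one):

```python
# Hypothetical sketch of calling a hosted GEPA prompt-optimization
# service: submit a seed prompt plus scored examples, get back an
# optimized prompt. Endpoint and payload shape are assumptions.
import requests

payload = {
    "seed_prompt": "Classify the support ticket by urgency.",
    "dataset": [
        {"input": "Site is down for all users", "expected": "high"},
        {"input": "Typo on the pricing page", "expected": "low"},
    ],
    "metric": "exact_match",
}

resp = requests.post(
    "https://api.dria.co/gepa/optimize",  # illustrative URL
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=600,
)
print(resp.json().get("optimized_prompt"))
```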
@driaforall
Dria
2 months
Join the waitlist for early access: https://t.co/Y2EnesNs2X
2
1
13
@driaforall
Dria
2 months
Last night we covered scaling inference, RL, and coding agents in the real world with 200+ attendees. Huge thanks to the speakers from @Meta, @anyscalecompute, @sgl_project, and @LaudeInstitute for sharing incredible insights. We also introduced Kai, our evolutionary coding agent.
3
6
24
@driaforall
Dria
2 months
While Inference Max pushes the boundaries of scale, Inference Arena explores the boundaries of accessibility. Our latest post reflects this mindset: what personal computing really means in the current state of the art, and how far a single GPU can actually go: 👉
1
0
3
@driaforall
Dria
2 months
The challenge starts with discovery. If you’ve ever spent hours trying to figure out what model runs best on your GPU, you know how fragmented things are. That’s why we built Agent Mode. You just ask your question, and our agent will search, analyze, and explain inference performance for your hardware.
1
0
4
@driaforall
Dria
2 months
We have benchmarked over 100 combinations across 5 engines, 14 hardware setups, and 3 platforms, capturing the full range of what is possible on consumer-grade hardware. This is part of our ongoing effort to make personal inference measurable, transparent, and accessible.
1
0
3
@driaforall
Dria
2 months
We launched Inference Arena to benchmark inference at personal computing scale. @SemiAnalysis_ released Inference Max to benchmark inference at data center scale. Inference Arena is built for developers, researchers, and curious builders running models on their own devices.
3
0
15