Dria
@driaforall
15K Followers · 2K Following · 170 Media · 500 Statuses

Dria is an inference lab dedicated to accessibility, coding, and agentic behavior.

Joined January 2024
@driaforall
Dria
3 months
LLMs are stateless. We built Dria Mem Agent to change that, making memory a first-class feature: a 4B agent with local, interoperable memory across Claude, ChatGPT, and LM Studio. It turns LLMs from stateless chat into stateful agents with persistent, human-readable memory.
52
129
1K
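A minimal sketch of the idea behind persistent, human-readable memory. This is illustrative, not Mem Agent's actual implementation; the directory layout and function names are assumptions:

```python
# Hypothetical sketch: human-readable, file-backed memory an agent can
# read before answering and update after each exchange. Paths and file
# format here are assumptions, not Mem Agent's real layout.
from pathlib import Path
from datetime import datetime, timezone

MEMORY_DIR = Path.home() / ".agent-memory"  # shared across chat clients

def remember(topic: str, fact: str) -> None:
    """Append a timestamped fact to a per-topic Markdown file."""
    MEMORY_DIR.mkdir(exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    with open(MEMORY_DIR / f"{topic}.md", "a", encoding="utf-8") as f:
        f.write(f"- ({stamp}) {fact}\n")

def recall(topic: str) -> str:
    """Return everything recorded about a topic, for prompt injection."""
    path = MEMORY_DIR / f"{topic}.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""

remember("user-preferences", "Prefers concise answers with code examples.")
print(recall("user-preferences"))
```

Because the store in this sketch is plain Markdown on disk, any client pointed at the same directory can read and edit it, which mirrors the "interoperable" and "human-readable" claims above.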
@driaforall
Dria
23 days
Today we are shipping dnet, a distributed inference framework that lets Apple Silicon clusters run models that exceed their physical memory. We fuse pipelined-ring parallelism, disk streaming and UMA-aware scheduling so “out of memory” stops being the limit.
6
31
151
@driaforall
Dria
23 days
We’re building dnet because we think local clusters will be the backbone for agentic workloads. If you want to run frontier-scale models from your desk, start with the alpha: 🔗
dria.co
RUN BIG MODELS | RUN LONG CONTEXT | MAXIMIZE UTILIZATION
0
1
14
@driaforall
Dria
23 days
What’s next:
→ 128K context on home clusters
→ higher throughput with faster comms and RDMA
→ a unified backend where Apple Silicon, NVIDIA, and AMD share a single cluster and scheduling layer.
Local AI without datacenter assumptions.
1
1
10
@driaforall
Dria
23 days
dnet is designed as a plugin architecture:
→ Solver
→ API Adapter
→ Topology Adapter
To add a new strategy (for example, tensor parallel), you implement a solver in distilp and an adapter in dnet. Runtime, KV cache, and API stay the same.
1
0
8
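A hedged sketch of what such a plugin seam could look like in Python. Interface and class names here are assumptions for illustration, not dnet's actual API:

```python
# Hypothetical sketch of a strategy plugin seam: a new parallelism
# strategy supplies a solver (placement) and an adapter (wiring), while
# runtime, KV cache, and the serving API stay untouched.
from abc import ABC, abstractmethod

class Solver(ABC):
    """Decides which layers run on which device (would live in distilp)."""
    @abstractmethod
    def assign(self, model_profile: dict, device_profiles: list[dict]) -> dict[str, list[int]]:
        ...

class TopologyAdapter(ABC):
    """Wires devices together for the chosen strategy (would live in dnet)."""
    @abstractmethod
    def connect(self, assignment: dict[str, list[int]]) -> None:
        ...

class TensorParallelSolver(Solver):
    def assign(self, model_profile, device_profiles):
        # Toy placement: replicate every layer across all devices.
        n_layers = model_profile["n_layers"]
        return {d["name"]: list(range(n_layers)) for d in device_profiles}

class TensorParallelAdapter(TopologyAdapter):
    def connect(self, assignment):
        print(f"all-reduce group over {sorted(assignment)}")
```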
@driaforall
Dria
23 days
Apple Silicon’s unified memory is a blessing and a trap. CPU and GPU share one pool; naïve mmap will overcommit and start swapping. dnet is UMA-aware: memory pressure buffers, Apple-specific loaders, and repacked weights keep the ring moving instead of thrashing.
1
0
8
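One way to picture UMA awareness: check unified-memory headroom before mapping a weight shard, and fall back to streaming instead of letting mmap overcommit the shared pool. A simplified sketch, assuming psutil for the pressure check; the threshold and function names are illustrative:

```python
# Hypothetical sketch: gate resident loading on available unified memory
# so the shared CPU/GPU pool never overcommits and starts swapping.
import psutil

HEADROOM_BYTES = 4 * 1024**3  # keep ~4 GiB free to avoid swap thrash

def can_map_resident(shard_bytes: int) -> bool:
    """True if the shard fits in unified memory with headroom to spare."""
    return psutil.virtual_memory().available - shard_bytes > HEADROOM_BYTES

def load_shard(path: str, shard_bytes: int) -> str:
    if can_map_resident(shard_bytes):
        return f"mmap {path} resident"       # fast path: fits in UMA
    return f"stream {path} layer-by-layer"   # slow path: disk streaming
```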
@driaforall
Dria
23 days
In alpha, we ship a pipelined-ring strategy inspired by PRIMA.CPP. dnet’s solver (distilp) extends it so devices can punch above their memory: layers stream from disk mid-round and overlap with compute, so total model size can exceed total cluster RAM. https://t.co/QvThJLpgpH
github.com
A Python library for MINLP-based layer/expert assignment for distributed inference across heterogeneous devices - firstbatchxyz/distilp
1
0
8
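A minimal sketch of the overlap idea: prefetch the next layer from disk on a background thread while the current layer computes. This shows the concept only; the file layout and stand-in compute are assumptions, not distilp's solver:

```python
# Hypothetical sketch: overlap disk I/O with compute so total model size
# can exceed RAM. While layer i computes, layer i+1 streams from disk.
from concurrent.futures import ThreadPoolExecutor

def load_layer(i: int) -> bytes:
    with open(f"layer_{i}.bin", "rb") as f:  # illustrative file layout
        return f.read()

def compute(layer: bytes, x: float) -> float:
    return x + len(layer) * 0.0              # stand-in for real math

def run(n_layers: int, x: float) -> float:
    with ThreadPoolExecutor(max_workers=1) as io:
        nxt = io.submit(load_layer, 0)       # kick off first prefetch
        for i in range(n_layers):
            layer = nxt.result()             # wait for the prefetch
            if i + 1 < n_layers:
                nxt = io.submit(load_layer, i + 1)  # prefetch next layer
            x = compute(layer, x)            # overlaps with the I/O
    return x
```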
@driaforall
Dria
23 days
Under the hood, every run goes through Strategy → Profiling → Scheduling:
→ Strategy: choose how to distribute work (pipelined ring, tensor-parallel, long-context)
→ Profiling: measure FLOPs, memory, KV cache, latency, and disk to know each device's limits
→ Scheduling: solve the layer-to-device assignment from those profiles
1
0
9
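Read as a pipeline, the three stages might compose like this. A sketch under assumed names and stand-in numbers; the real interfaces live in dnet and distilp:

```python
# Hypothetical sketch of the Strategy -> Profiling -> Scheduling pipeline.
# All names and numbers are illustrative, not dnet's actual interfaces.
from dataclasses import dataclass

@dataclass
class DeviceProfile:
    name: str
    flops: float      # sustained TFLOPs
    mem_bytes: int    # unified memory available
    disk_mbps: float  # streaming bandwidth

def choose_strategy(goal: str) -> str:
    return {"throughput": "pipelined-ring", "context": "long-context"}.get(goal, "pipelined-ring")

def profile_devices(names: list[str]) -> list[DeviceProfile]:
    # Stand-in numbers; real profiling would benchmark each device.
    return [DeviceProfile(n, flops=10.0, mem_bytes=32 * 1024**3, disk_mbps=3000.0) for n in names]

def schedule(n_layers: int, devices: list[DeviceProfile]) -> dict[str, range]:
    # Toy even split; distilp solves this as a MINLP over the profiles.
    per = n_layers // len(devices)
    return {d.name: range(i * per, (i + 1) * per) for i, d in enumerate(devices)}

plan = schedule(32, profile_devices(["mac-studio", "mac-mini"]))
print(choose_strategy("throughput"), {k: list(v) for k, v in plan.items()})
```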
@driaforall
Dria
23 days
Built on @Apple MLX, dnet focuses on:
→ Distributed execution strategies
→ Automatic device/model profiling
→ A heterogeneity-aware solver
→ A drop-in OpenAI-style API
1
0
10
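Because the API is OpenAI-style, a standard client pointed at the local endpoint should work. The port and model name below are assumptions; only the client pattern is standard:

```python
# Hypothetical usage of a drop-in OpenAI-style endpoint. The base_url,
# port, and model name are assumptions, not dnet's documented defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Hello from my Mac cluster"}],
)
print(resp.choices[0].message.content)
```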
@driaforall
Dria
23 days
Local AI today is great for short generations, but breaks down when you want big models, long context, high throughput, and heterogeneous devices. dnet addresses that gap head-on. https://t.co/OJ20zwWaOD https://t.co/ojvS1CBFRO
1
1
13
@driaforall
Dria
30 days
If you’re tuning prompts for agents, evaluators, or any system that lives or dies by reliability, GEPA makes a massive difference. And now you can use it without building the whole loop yourself. Try it and tell us where you’d like us to push next: https://t.co/OcTkg1HPvz
docs.dria.co
0
1
7
@driaforall
Dria
30 days
Right now, Dria’s GEPA service runs on GPT models. Soon, it will run on every model supported by our Batch Inference API, without rate limits, and powered by a global distributed LLM network. Cheaper test-time compute, more parallelism, and much faster GEPA cycles.
1
1
6
@driaforall
Dria
30 days
We’ve loved GEPA from the start and have used it in our products for months. We believe it deserves to be a stand-alone tool. In September we invited @LakshyAAAgrawal, the creator of GEPA, to our SF meetup, and since then it has seen incredible adoption across many use cases.
1
1
9
@driaforall
Dria
30 days
Today we’re releasing something we’ve used internally for a long time: GEPA Prompt Optimization as a service. A fully automated GEPA optimizer. No orchestration, no retries, no evaluation scripts, no parallel test-time compute hacks. You send a task, it handles the rest.
4
6
38
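From the client side, the "send a task, it handles the rest" flow might look like the following. This is a hypothetical sketch: the endpoint URL, payload fields, and auth header are assumptions, not the documented API (see docs.dria.co for the real one):

```python
# Hypothetical sketch of calling a hosted GEPA prompt-optimization
# service: submit a seed prompt plus scored examples, get back an
# optimized prompt. Endpoint and payload shape are assumptions.
import requests

payload = {
    "seed_prompt": "Classify the support ticket by urgency.",
    "dataset": [
        {"input": "Site is down for all users", "expected": "high"},
        {"input": "Typo on the pricing page", "expected": "low"},
    ],
    "metric": "exact_match",
}

resp = requests.post(
    "https://api.dria.co/gepa/optimize",  # illustrative URL
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=600,
)
print(resp.json().get("optimized_prompt"))
```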
@driaforall
Dria
2 months
Join the waitlist for early access: https://t.co/Y2EnesNs2X
2
1
13
@driaforall
Dria
2 months
Last night we covered scaling inference, RL, and coding agents in the real world with 200+ attendees. Huge thanks to the speakers from @Meta, @anyscalecompute, @sgl_project, and @LaudeInstitute for sharing incredible insights. We also introduced Kai, our evolutionary coding agent.
3
6
24
@driaforall
Dria
2 months
While Inference Max pushes the boundaries of scale, Inference Arena explores the boundaries of accessibility. Our latest post reflects this mindset: what personal computing really means in the current state of the art, and how far a single GPU can actually go: 👉
1
0
3
@driaforall
Dria
2 months
The challenge starts with discovery. If you’ve ever spent hours trying to figure out what model runs best on your GPU, you know how fragmented things are. That’s why we built Agent Mode. You just ask your question, and our agent will search, analyze, and explain inference performance for your hardware.
1
0
4
@driaforall
Dria
2 months
We have benchmarked over 100 combinations across 5 engines, 14 hardware setups, and 3 platforms, capturing the full range of what is possible on consumer-grade hardware. This is part of our ongoing effort to make personal inference measurable, transparent, and accessible.
1
0
3
@driaforall
Dria
2 months
We launched Inference Arena to benchmark inference at personal computing scale. @SemiAnalysis_ released Inference Max to benchmark inference at data center scale. Inference Arena is built for developers, researchers, and curious builders running models on their own devices.
3
0
15