
W&B Weave
@weave_wb
Followers
1K
Following
884
Media
152
Statuses
498
A lightweight toolkit for tracking and evaluating LLM applications, built by @weights_biases for AI developers!
Land of GPUs
Joined October 2024
Your RL run just spiked at step 89! But do you know why? We’re fixing that. Today we’re launching W&B Weave Traces to give you a step-by-step look into your agent’s decisions. This is the first drop from our fresh new integration with @OpenPipeAI. More RL magic is incoming.
2
32
356
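For a sense of what a trace looks like in code, here's a minimal Weave tracing sketch (not the OpenPipe integration itself): functions decorated with @weave.op are recorded as spans, and the project name and toy policy below are made up for illustration.

```python
import weave

weave.init("rl-agent-traces")  # hypothetical project name

@weave.op
def choose_action(observation: list[float]) -> int:
    # Stand-in policy; Weave records the inputs and the returned action.
    return 0 if sum(observation) < 0 else 1

@weave.op
def rollout_step(step: int, observation: list[float]) -> dict:
    # Nested @weave.op calls appear as child spans, giving the
    # step-by-step view of each decision.
    action = choose_action(observation)
    return {"step": step, "action": action}

rollout_step(89, [0.1, -0.4, 0.2])
```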
Stop juggling tabs to test your prompts! 🥵 The W&B Weave Playground is your new home for iterating on and comparing LLMs. And did you know... you can now generate images right in the Playground? Just search "image" in the model dropdown!
1
2
3
Best use of @weave_wb: Popstar, by @ax_xiong73047, @sidk_94827, @drdannenhauer & Zohreh Dannenhauer. They created a "survival of the fittest" environment for learning strategies. An LLM proposes new reward functions & PPO tweaks, and an algorithm ensures only the best adaptations
2
2
7
We asked builders at WeaveHacks 2 to push the limits of self-improving AI agents, and they delivered. With 175+ builders & 66 teams, this was our hardest hackathon to judge EVER. Now, meet the winners who took home over $20K in cash and prizes. 🧵
5
7
105
What's special about @karpathy's nanochat is that it has the entire LLM lifecycle in one repo. A full-stack recipe for your own ChatGPT clone for ~$100. So cool for us to see wandb included for metric logging across the pre-, mid-, and RL training stages.
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,
1
9
69
Watch out for the @OpenPipeAI announcement today on the @thursdai_pod around 1 hour in! 👀
POD UP: Covering my 3rd @OpenAIDevs Day in a row, and this one includes a few questions from me to @sama and @gdb from an exclusive fireside chat + a full breakdown of what they ⛴️, interview with @pvncher, Samsung's 7M TRM that beats the giants & building agents with AgentKit,
0
0
1
Seats are very limited. Register now to save your spot:
wandb.ai
In this session, we'll explore the fastest and most reliable ways to make LLMs solve real-world business problems. We'll dive deep into LLM orchestration, agent frameworks, and other similar systems...
0
0
0
RL X-mas came early. 🎄 For too long, building powerful AI agents with Reinforcement Learning has been blocked by GPU scarcity and complex infrastructure. That ends today. Introducing Serverless RL from wandb, powered by @CoreWeave! We're making RL accessible to all.
9
17
153
Want to experiment with the top open-source models? We're giving away $50 in inference credits! To get them, just comment "RAG" below our first tweet. See all of our available models here:
docs.wandb.ai
Browse the foundation models available through W&B Inference
We all know RAG is powerful, but how do retrieval depth and model choice really interact? Does retrieving more documents always improve accuracy, or does it just introduce noise and inflate costs? We ran the experiments on @weave_wb to find the precise trade-offs. 🧪
0
0
1
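If you want to try those models from code, W&B Inference speaks the OpenAI-compatible chat API. Here's a minimal sketch; the base URL, project name, and model ID are illustrative, so verify them against docs.wandb.ai before use.

```python
from openai import OpenAI

# W&B Inference exposes an OpenAI-compatible endpoint; the base URL,
# project, and model ID below are illustrative placeholders.
client = OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-wandb-api-key>",
    project="my-team/my-project",
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",  # placeholder model ID
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(resp.choices[0].message.content)
```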
See our full case study with details here:
wandb.ai
Publish your model insights with interactive plots for performance metrics, predictions, and hyperparameters. Made by Brett Young using Weights & Biases
1
0
2
The key takeaway: Optimizing a RAG pipeline is a balancing act. You have to co-design your retrieval strategy and generation model. Using W&B Weave is crucial for visualizing these trade-offs and finding the most efficient configuration for your use case.
1
0
2
This is where @weave_wb was critical. It was our complete evaluation toolkit. It gave us a unified dashboard to compare experiments, let us drill down into individual predictions to debug errors, and made the complex trade-offs between cost, latency, and accuracy clear.
1
0
2
The results were fascinating. The DeepSeek model achieved its highest correctness (~77%) with just 5 retrieved passages. The GLM-4.5 model required 10 passages to reach that same score. → This shows optimal context size is model-specific; more isn't always better.
1
0
2
For generation, we systematically tested popular open-source models using our W&B Inference service. A separate judge model evaluated correctness, while W&B Weave tracked accuracy, cost, and latency for every single run.
1
0
3
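As a rough sketch of that setup with Weave's Evaluation API: the dataset, model stub, and scorer below are invented for illustration, and the LLM judge is replaced by an exact-match check to keep the example self-contained and offline.

```python
import asyncio
import weave

weave.init("rag-eval-demo")  # hypothetical project name

# Tiny stand-in dataset; the real runs covered far more examples.
dataset = [
    {"question": "What does RAG stand for?",
     "expected": "Retrieval-Augmented Generation"},
]

@weave.op
def rag_model(question: str) -> str:
    # Stand-in for retrieval + generation; Weave logs each call.
    return "Retrieval-Augmented Generation"

@weave.op
def correctness(expected: str, output: str) -> dict:
    # The thread used a separate judge model here; exact match keeps
    # this sketch runnable without an API key.
    return {"correct": output.strip().lower() == expected.strip().lower()}

evaluation = weave.Evaluation(dataset=dataset, scorers=[correctness])
asyncio.run(evaluation.evaluate(rag_model))
```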
For a quick refresher: RAG (Retrieval-Augmented Generation) fights LLM limits like outdated knowledge & hallucinations. It first retrieves relevant info from a knowledge base, then uses that context to generate a grounded, accurate, and cost-effective answer.
1
1
3
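That two-step flow can be sketched in a few lines. This toy version uses keyword-overlap retrieval and a stubbed generator; everything here is illustrative rather than a production pipeline.

```python
# Minimal RAG sketch (toy retrieval, stubbed generator) -- illustrative only.
KNOWLEDGE_BASE = [
    "W&B Weave traces and evaluates LLM applications.",
    "Retrieval-augmented generation grounds answers in retrieved context.",
    "PPO is a common policy-gradient algorithm in RL.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Toy lexical scoring: rank passages by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda p: -len(q & set(p.lower().split())))
    return ranked[:top_k]

def generate(query: str, passages: list[str]) -> str:
    # In a real pipeline this would be an LLM call with the passages
    # injected into the prompt; here we just assemble that prompt.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer '{query}' using only:\n{context}"

print(generate("What does Weave do?", retrieve("What does Weave do?")))
```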
We all know RAG is powerful, but how do retrieval depth and model choice really interact? Does retrieving more documents always improve accuracy, or does it just introduce noise and inflate costs? We ran the experiments on @weave_wb to find the precise trade-offs. 🧪
8
39
540