W&B Weave

@weave_wb

Followers
1K
Following
884
Media
152
Statuses
498

A lightweight toolkit for tracking and evaluating LLM applications, built by @weights_biases for AI developers!

Land of GPUs
Joined October 2024
@weave_wb
W&B Weave
1 month
Your RL run just spiked at step 89! But do you know why? We’re fixing that. Today we’re launching W&B Weave Traces to give you a step-by-step look into your agent’s decisions. This is the first drop from our fresh new integration with @OpenPipeAI. More RL magic is incoming.
2
32
356
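The idea behind step-by-step tracing can be sketched with a toy decorator that records each call's inputs and outputs. This is an illustration only, not the Weave SDK (in the real library you would decorate functions with `weave.op` and view runs in the UI); the agent functions below are made up:

```python
import functools

TRACE = []  # global log of recorded steps

def traced(fn):
    """Record each call's name, inputs, and output so a run can be replayed step by step."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        TRACE.append({"step": fn.__name__, "inputs": args, "output": result})
        return result
    return wrapper

@traced
def choose_action(observation: str) -> str:
    # Stand-in for an agent's policy step.
    return "explore" if "unknown" in observation else "exploit"

@traced
def compute_reward(action: str) -> float:
    # Stand-in for the environment's reward signal.
    return 1.0 if action == "exploit" else 0.1

action = choose_action("state: unknown region")
reward = compute_reward(action)
print(len(TRACE), TRACE[0]["step"], action, reward)
```

With every step logged, a spike at step 89 is no longer a mystery: you can inspect exactly which inputs produced it.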
@TRJ_0751
Tarun Jain
21 hours
Over the weekend, I contributed FastEmbed embedding support for custom config in @mem0ai. (PR: #3552) Here’s an article showing how to implement long-term memory in Large Language Models using a custom setup with FastEmbed for embeddings, @Google Gemini as the LLM, and
0
2
2
@weave_wb
W&B Weave
3 days
Stop juggling tabs to test your prompts! 🥵 The W&B Weave Playground is your new home for iterating on and comparing LLMs. And did you know... you can now generate images right in the Playground? Just search "image" in the model dropdown!
1
2
3
@weights_biases
Weights & Biases
6 days
Best use of @weave_wb: Popstar (@ax_xiong73047, @sidk_94827, @drdannenhauer & Zohreh Dannenhauer). They created a "survival of the fittest" environment for learning strategies. An LLM proposes new reward functions & PPO tweaks, and an algorithm ensures only the best adaptations
2
2
7
@weights_biases
Weights & Biases
6 days
We asked builders at WeaveHacks 2 to push the limits of self-improving AI agents, and they delivered. With 175+ builders & 66 teams, the innovation made this our hardest hackathon to judge EVER. Now, meet the winners who won over $20K in cash and prizes. 🧵
5
7
105
@weights_biases
Weights & Biases
7 days
What's special about @karpathy's nanochat is that it has the entire LLM lifecycle in one repo. A full-stack recipe for your own ChatGPT clone for ~$100. So cool for us to see wandb included for metric logging in the pre-, mid-, and RL-training stages.
@karpathy
Andrej Karpathy
7 days
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,
1
9
69
@weave_wb
W&B Weave
10 days
Watch out for the @OpenPipeAI announcement today on the @thursdai_pod around 1 hour in! 👀
@altryne
Alex Volkov (Thursd/AI)
10 days
POD UP: Covering my 3rd @OpenAIDevs Day in a row, and this one includes a few questions from me to @sama and @gdb from an exclusive fireside chat + a full breakdown of what they ⛴️, interview with @pvncher, Samsungs 7M TRM that beats the giants & building agents with AgentKit,
0
0
1
@weave_wb
W&B Weave
11 days
NY/NJ Builders! 🚀 We have an in-person workshop on Architecting & Orchestrating AI Agents. Led by our AI Engineer @ash0ts on Oct 15, it dives deep into agent frameworks, evaluation techniques, and multi-agent collaboration. RSVP below!
1
0
2
@weights_biases
Weights & Biases
12 days
RL X-mas came early. 🎄 For too long, building powerful AI agents with Reinforcement Learning has been blocked by GPU scarcity and complex infrastructure. That ends today. Introducing Serverless RL from wandb, powered by @CoreWeave! We're making RL accessible to all.
9
17
153
@weave_wb
W&B Weave
13 days
Want to experiment with the top open-source models? We're giving away $50 in inference credits! To get them, just comment "RAG" below our first tweet. See all of our available models here:
docs.wandb.ai
Browse the foundation models available through W&B Inference
@weave_wb
W&B Weave
13 days
We all know RAG is powerful, but how do retrieval depth and model choice really interact? Does retrieving more documents always improve accuracy, or does it just introduce noise and inflate costs? We ran the experiments on @weave_wb to find the precise trade-offs. 🧪
0
0
1
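The retrieval-depth experiment described above can be mimicked with a small sweep over k. The corpus, query, and "grounded" check below are toy stand-ins, not the team's actual MS MARCO setup; the point is only the shape of the sweep:

```python
def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Rank passages by naive keyword overlap with the query and return the top k."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return ranked[:k]

def answer_is_grounded(passages: list[str], gold: str) -> bool:
    """Toy 'correctness' check: did retrieval surface the passage containing the answer?"""
    return any(gold in p for p in passages)

corpus = [
    "The Eiffel Tower is a famous wrought-iron tower.",   # lexically similar distractor
    "Many tourists visit the tower every year.",          # weaker distractor
    "It stands on the Champ de Mars in Paris.",           # the answer, low lexical overlap
]
query = "Where is the Eiffel Tower located?"

# Sweep retrieval depth, as in the retrieval-depth experiment.
results = {k: answer_is_grounded(retrieve(query, corpus, k), "Champ de Mars") for k in (1, 2, 3)}
print(results)  # {1: False, 2: False, 3: True}
```

Here the answer passage only surfaces at k=3, while in other corpora a larger k would mostly add noise and cost, which is exactly the trade-off the experiments measured.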
@weave_wb
W&B Weave
13 days
The key takeaway: Optimizing a RAG pipeline is a balancing act. You have to co-design your retrieval strategy and generation model. Using W&B Weave is crucial for visualizing these trade-offs and finding the most efficient configuration for your use case.
1
0
2
@weave_wb
W&B Weave
13 days
This is where @weave_wb was critical. It was our complete evaluation toolkit. It gave us a unified dashboard to compare experiments, let us drill down into individual predictions to debug errors, and made the complex trade-offs between cost, latency, and accuracy clear.
1
0
2
@weave_wb
W&B Weave
13 days
The results were fascinating. The DeepSeek model achieved its highest correctness (~77%) with just 5 retrieved passages. The GLM-4.5 model required 10 passages to reach that same score. Takeaway: optimal context size is model-specific; more isn't always better.
1
0
2
@weave_wb
W&B Weave
13 days
For generation, we systematically tested popular open-source models using our W&B Inference service. A separate judge model evaluated correctness, while W&B Weave tracked accuracy, cost, and latency for every single run.
1
0
3
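The judge-model evaluation loop can be sketched as below. The "judge" here is a trivial string check standing in for a real LLM judge, the toy model and dataset are made up, and latency is measured locally; a real run would route model and judge calls through W&B Inference and log everything to Weave:

```python
import time

def judge(prediction: str, reference: str) -> float:
    """Stand-in for an LLM judge: 1.0 if the reference answer appears in the prediction."""
    return 1.0 if reference.lower() in prediction.lower() else 0.0

def evaluate(model_fn, dataset):
    """Score every example and track accuracy plus per-call latency, like a tracker would."""
    records = []
    for ex in dataset:
        start = time.perf_counter()
        pred = model_fn(ex["question"])
        records.append({
            "question": ex["question"],
            "correct": judge(pred, ex["answer"]),
            "latency_s": time.perf_counter() - start,
        })
    accuracy = sum(r["correct"] for r in records) / len(records)
    return accuracy, records

# A deliberately imperfect toy "model".
def toy_model(question: str) -> str:
    return "The capital of France is Paris." if "France" in question else "I don't know."

dataset = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is the capital of Japan?", "answer": "Tokyo"},
]
accuracy, records = evaluate(toy_model, dataset)
print(accuracy)  # 0.5
```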
@weave_wb
W&B Weave
13 days
Our setup used a subset of the MS MARCO dataset, indexed in @pinecone with a powerful multilingual embedding model. This gave us a high-quality retrieval system to see how different LLMs would handle the provided context.
1
0
2
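A retrieval index like the one described can be sketched in a few lines. The bag-of-words "embedding" below is a toy stand-in for the multilingual embedding model, and the in-memory class only mimics the upsert/query shape of a hosted index like Pinecone; the passages are made up:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real setup would call an embedding model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorIndex:
    """Minimal in-memory vector index: upsert passages, query by cosine similarity."""
    def __init__(self):
        self.items = []  # (passage, vector) pairs

    def upsert(self, passage: str):
        self.items.append((passage, embed(passage)))

    def query(self, text: str, top_k: int = 3):
        q = embed(text)
        scored = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [p for p, _ in scored[:top_k]]

index = VectorIndex()
for p in ["paris is the capital of france", "gpus train neural networks", "the louvre is in paris"]:
    index.upsert(p)
print(index.query("what is the capital of france", top_k=1))
```

Swapping the toy embedding for a learned one and the list for a hosted index is what turns this sketch into a high-quality retrieval system.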
@weave_wb
W&B Weave
13 days
For a quick refresher: RAG (Retrieval-Augmented Generation) fights LLM limits like outdated knowledge & hallucinations. It first retrieves relevant info from a knowledge base, then uses that context to generate a grounded, accurate, and cost-effective answer.
1
1
3
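The retrieve-then-generate loop the refresher describes can be sketched minimally. Retrieval here is naive keyword overlap over a made-up knowledge base, and the "generation" step only assembles the grounded prompt an LLM would receive; the final answer would come from an actual model call:

```python
import re

KNOWLEDGE_BASE = [
    "W&B Weave is a toolkit for tracking and evaluating LLM applications.",
    "RAG retrieves relevant passages before generating an answer.",
    "Grounding answers in retrieved context reduces hallucinations.",
]

def tokens(text: str) -> set[str]:
    """Lowercase, punctuation-free word set for overlap scoring."""
    return set(re.findall(r"[a-z&]+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank knowledge-base passages by keyword overlap with the query."""
    q = tokens(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(query: str) -> str:
    """Assemble the prompt an LLM would receive: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_grounded_prompt("How does RAG reduce hallucinations?")
print(prompt)
```

Because the model is told to answer only from the retrieved context, its output stays grounded in the knowledge base rather than in stale or hallucinated parametric memory.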
@weave_wb
W&B Weave
13 days
We all know RAG is powerful, but how do retrieval depth and model choice really interact? Does retrieving more documents always improve accuracy, or does it just introduce noise and inflate costs? We ran the experiments on @weave_wb to find the precise trade-offs. 🧪
8
39
540