Shreyas Pimpalgaonkar
@sayshrey
Followers: 123
Following: 312
Media: 1
Statuses: 103
founding eng @bespokelabsai prev @EmergentAGI @citadel @nyu @goldmansachs @iitbombay
San Francisco
Joined February 2024
Fully open source code 💻, model 🤖, and dataset 📊 Amazing work by the team! :) https://t.co/642X5186Yw
Introducing Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky’s Sky-T1 recipe. The model outperforms Sky-T1 and o1-preview in reasoning (Math and Code) benchmarks and almost reaches the performance of DeepSeek-R1-Distill-Qwen-32B while
0
1
4
Only Meta knows what we really want. I compared my Amazon homepage recommendations with my flatmate's and found that they were almost identical, even though we have such different interests.
>amazon homepage. black friday
>scroll entire page
>years of data, every item ever clicked on and bought
>not a single interesting item suggested to me
why
0
0
0
Anyone’s “you’re ngmi” doesn’t hold any value if they haven’t made it yet. I’m seeing many people use it during minor disagreements on random things that don’t even correlate with success - like using vscode or cursor.
0
0
0
No phones in office
Competitive gamers: that phone on your desk while gaming is tanking your performance. In a study of 500+ participants, performance declined 10-15% on memory & intelligence tests with phones visible. Your brain constantly suppresses the urge to check, stealing mental bandwidth.
0
0
0
We don’t need a biped for most homes; we just need a Roomba with hands. One with adjustable height and variable arm length is optimal.
The table-to-dishwasher task is the classic nightmare scenario for roboticists: Long-horizon, highly dexterous, precise, whole-body manipulation combined with delicate, transparent, reflective, and deformable objects. Yet Memo handles it so naturally and elegantly.
0
0
0
It was never “build what’s fundable” and it isn’t now. It’s always been: make something people want. This comes from users and customers and the needs of others.
197
77
1K
i hate small talk. i wanna talk atoms. death. aliens. sex. the meaning of life. teleoperation. being monogamish. what makes a banger. ur childhood. what keeps u up at night. i like people with depth. i dont want to know "whats up"
847
1K
10K
Even today I can’t summon my car from the same parking lot if it’s charging. Maybe this will need at least 5 more years, as the cost-benefit math doesn’t add up yet for the significant charging infrastructure overhaul needed. I will definitely not use it to summon the car cross
In ~2 years, summon should work anywhere connected by land & not blocked by borders, eg you're in LA and the car is in NY
0
0
0
To push self-driving into situations wilder than reality, we built a neural network world simulator that can create entirely synthetic worlds for the Tesla to drive in. Video below is fully generated & not a real video
476
2K
11K
What are RL environments? Are they just evals? There is significant confusion in the community, so here is my opinion: My answer is inspired by Terminal-bench, an elegant framework for creating RL environments, evaluating agents and even training agents. First, an RL
6
30
340
Day 5: Humanity's Last Exam! Insights/Fun facts: 1. Created by Scale and Center for AI Safety (which is co-founded by Dan Hendrycks of MMLU-fame). 2. Supposed to be incredibly hard for AI models (hence the name), but we all know how this is going to go. 3. There are 193
1
5
19
Day 2 of drilling down into popular benchmarks for models/agents. Benchmark #2: GSM8K (Grade School Math 8K), a dataset of 8,500 high-quality, linguistically diverse grade school math word problems. Our viewer for the GSM8K dataset reveals some pretty interesting insights:
Understanding what’s in the data is a high leverage activity when it comes to training/evaluating models and agents. This week we will drill down into a few popular benchmarks and share some custom viewers that will help pop up various insights. Our viewer for GPQA (Google
1
4
13
Check out my work at @bespokelabsai! We release Bespoke-MiniChart-7B, a new SOTA in chart understanding for its size. Chart understanding is really fun and challenging, and it requires reasoning skills beyond math reasoning. It's a great starting point for open chart model development!
Announcing Bespoke-MiniChart-7B, a new SOTA in chart understanding for models of comparable size on seven benchmarks, on par with Gemini-1.5-Pro and Claude-3.5! 🚀 Beyond its real-world applications, chart understanding is a good challenging problem for VLMs, since it requires
0
12
31
Thanks to all who contributed to the work! @LiyanTang4, @kartiks26387917, @sayshrey, @madiator, @gregd_nlp and the @bespokelabsai team. Shoutout to @LambdaAPI for compute credits. 8/8
1
2
8
OpenAI’s o4 just showed that multi-turn tool use is a huge deal for AI agents. Today, we show how to do the same with your own agents, using RL and open-source models. We used GRPO on only 100 high quality questions from the BFCL benchmark, and post-trained a 7B Qwen model to
21
52
379
Announcing the Reasoning Datasets Competition 📢 in collaboration with @huggingface and @togethercompute. Since the launch of DeepSeek-R1 this January, we’ve seen an explosion of reasoning-focused datasets: OpenThoughts-114k, OpenCodeReasoning, codeforces-cot, and more.
3
46
114
@soldni Haven't personally used it but I saw @AlexGDimakis announce Curator. Has a lot of features one would want to scale as well. https://t.co/I4bZn2iGrL
github.com
Synthetic data curation for post-training and structured data extraction - bespokelabsai/curator
1
2
12