Pengming Wang
@PengmingWang
Followers: 314 · Following: 745 · Media: 4 · Statuses: 318
Founding team @poolsideai | prev @DeepMind, PhD @Cambridge_Uni, FunSearch co-author
London, England
Joined July 2021
Would love to connect with anyone who's been impacted and is looking to join a small but well-resourced team pushing to the frontier and beyond. We have one of the highest ratios of GPU resources per researcher. No politics or silos.
We believe that to compete at the frontier, you have to own the full stack: from dirt to intelligence. Today we’re announcing two major unlocks for our mission to AGI: 1. We're partnering with @CoreWeave and have 40,000+ NVIDIA GB300s secured. First capacity comes online
poolside.ai
When people ask what it takes to build frontier AI, the focus is usually on the model—the architecture, the training runs, the research breakthroughs. But that’s only half the story.
We're hiring across many roles, including evaluations:
poolside.ai
Join us at poolside and work at the forefront of applied research and engineering at scale.
If you want to learn how we orchestrate evaluations at poolside, it's part of our model factory blog series:
poolside.ai
Running inference and evaluations inside the Model Factory
In the limit, evaluations are ~the only thing that matters. When models are self-improving and every metric can be hill-climbed, picking the metric becomes the most important decision. Evals will shift from being "unit tests" for research to being the *main thing*
We've not been very public about our progress on model building, but I fully believe poolside will be the next lab to join the frontier. We're now sharing a bit more about how we're doing this, starting with the systems-first approach we're taking with our model factory.
If this is something that resonates with you, come join us!
We've spent quite some time at poolside thinking about this, and recently put down some words on our approach: https://t.co/DA11kWPw6Y
poolside.ai
When we founded poolside in San Francisco in April 2023, the narrative in the industry was that all we needed to reach AGI was to scale up language modelling.
Fundamentally, I believe it comes down to learning more general representations of reasoning, beyond the relatively narrow domain of mental tactics required for math or coding.
Test-time compute is powerful, but in its current form there is a lack of "harmony" with pre-training. Models feel split-brained: they're either deeply overthinking, with no trust in their own "common sense", or they latch onto the nearest neighbour of meaning without deliberation
At this point, we’re mainly building puzzles for smart researchers and engineers at frontier labs to flex their hill climbing skills. Such puzzles are not really for the models, they’re for the people building the models.
On the heels of Humanity's Last Exam, @scale_AI & @ai_risks have released a new, very hard reasoning eval, EnigmaEval: 1,184 multimodal puzzles so hard they take groups of humans many hours to days to solve. All top models score 0 on the Hard set, and <10% on the Normal set 🧵
The real distillation is taking all the engineering and research you did on a large compute budget and then doing it with much less
From my own experience (a lot of failed research), you cannot cheat efficiency. If quantization fails, then sparsification also fails, and so do other efficiency mechanisms. If this is true, we are close to optimal now. With this, there are only three ways forward that I see...
Things that feel long overdue for innovation in the LLM stack:
- tokenization
- sampling
- loss functions
The feeling when you've built something that is quite nice, and now you can _really_ get started https://t.co/u36fHtX3q1