Labelbox @labelbox X Profile

Labelbox

@labelbox

Followers

3K

Following

589

Media

28

Statuses

240

High-quality frontier data for leading AI teams.

https://t.co/h7PVvc5ZPb

San Francisco, CA

Joined January 2018


Labelbox

@labelbox

16 days

Essential weekend reading. The Scaling Era: an oral history of AI by @dwarkesh_sp and thank you @stripepress!

4

18

217

jack

@jack

25 days

this is great

Dwarkesh Patel

@dwarkesh_sp

26 days

The @karpathy interview
0:00:00 – AGI is still a decade away
0:30:33 – LLM cognitive deficits
0:40:53 – RL is terrible
0:50:26 – How do humans learn?
1:07:13 – AGI will blend into 2% GDP growth
1:18:24 – ASI
1:33:38 – Evolution of intelligence & culture
1:43:43 – Why self

101

300

5K

Labelbox

@labelbox

26 days

Link to the full episode:

1

6

Labelbox

@labelbox

26 days

Highly recommend tuning into @dwarkesh_sp's episode today with @karpathy. They dive deep into why RL is so information-sparse and what that means for realizing the decade of agents. A few highlights that stood out: - “RL is terrible; it’s just that everything else is much

25

167

2K

Labelbox

@labelbox

2 months

Get in touch to learn more about our latest work in RL here:

labelbox.com

Discover how we partner with researchers to fuel the next wave of AI advancements, powered by experts in post-training and model evaluation.

1

4

Labelbox

@labelbox

2 months

Thrilled to be featured in Dwarkesh’s latest episode with Richard Sutton, widely regarded as the father of reinforcement learning and 2024 Turing Award winner. As Richard explains, we’re entering the Era of Experience, where training AI means creating environments that capture

13

90

739

Labelbox

@labelbox

2 months

See more of Dwarkesh’s visit and get in touch to learn how Labelbox delivers large-scale, high-fidelity data collection to advance next-gen robotics.

labelbox.com

Discover how we partner with researchers to fuel the next wave of AI advancements, powered by experts in post-training and model evaluation.

0

5

Labelbox

@labelbox

2 months

As his latest guest, @svlevine (co-founder of @physical_int) predicts, robots could be running households entirely autonomously by 2030.

1

8

Labelbox

@labelbox

2 months

We recently invited @dwarkesh_sp to stop by our SF robotics lab. World-class podcaster, rookie robotics intern.

19

180

2K

Dwarkesh Patel

@dwarkesh_sp

2 months

.@svlevine is one of the world's leading robotics researchers (and co-founder of @physical_int). He thinks fully autonomous robots are much closer than people realize - when I pushed him on a prediction, he said 5 years to robots that can autonomously run your household. The

21

121

970

Labelbox

@labelbox

2 months

If you’re a Dwarkesh fan, check out the landing page and follow along. This is just the beginning of something special.

0

4

Labelbox

@labelbox

2 months

We’ve always admired how @dwarkesh_sp sparks conversations with top thinkers in AI, academia, and tech. Now we’re teaming up to connect with a community that shares our mission of pushing the limits of what’s possible in AI. The first episode together with one of his most

32

127

1K

Labelbox

@labelbox

3 months

We’ll continue evaluating frontier models on more constraint domains and reporting results as the gap between leading AI capabilities closes. Check out our blog post for more info!

labelbox.com

0

3

Labelbox

@labelbox

3 months

Lessons learned: Constraint interactions, not just rules, limit performance, and success on synthetic tasks doesn’t always transfer to real-world cases. We observe that high constraint densities tend to also expose weaknesses, and analyzing failures helps guide targeted

1

3

Labelbox

@labelbox

3 months

Our initial findings show that no current model maintains consistent feasibility under real-world, high-complexity scenarios. On synthetic stress tests, o3 demonstrates the highest feasibility, closely followed by GPT-5. In a domain-grounded data center migration benchmark, GPT-5

1

0

2

Labelbox

@labelbox

3 months

We tested whether leading models could generate schedules on RCPSP (resource-constrained project scheduling problems) that meet all constraints and remain consistent as complexity increases. To do this, we varied task difficulty across hundreds of levels and applied realistic

1

0

2
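For readers new to RCPSP, here is a minimal sketch of what "meets all constraints" can mean for a proposed schedule: every task starts only after its predecessors finish, and total resource demand never exceeds capacity. This is an illustration only, not ConstraintBench's actual evaluator. The names `Task` and `is_feasible` and the sample data are hypothetical, and renewable resources with integer time units are assumed.

```python
from dataclasses import dataclass


@dataclass
class Task:
    duration: int              # time units the task runs (assumed integer)
    demand: dict               # resource name -> units needed while running
    predecessors: tuple = ()   # names of tasks that must finish first


def is_feasible(tasks, capacity, start):
    """Return True if the start times respect precedence and resource limits."""
    # Every task needs exactly one non-negative start time.
    if set(start) != set(tasks) or any(s < 0 for s in start.values()):
        return False

    # Precedence: a task may not start before each predecessor has finished.
    for name, task in tasks.items():
        for pred in task.predecessors:
            if start[name] < start[pred] + tasks[pred].duration:
                return False

    # Resources: usage only rises when a task starts, so it is enough
    # to check the load at each distinct start time.
    for t in set(start.values()):
        usage = {}
        for name, task in tasks.items():
            if start[name] <= t < start[name] + task.duration:
                for res, units in task.demand.items():
                    usage[res] = usage.get(res, 0) + units
        if any(units > capacity.get(res, 0) for res, units in usage.items()):
            return False
    return True


# Hypothetical three-task instance with a single resource of capacity 2.
tasks = {
    "A": Task(duration=2, demand={"crew": 1}),
    "B": Task(duration=3, demand={"crew": 2}, predecessors=("A",)),
    "C": Task(duration=2, demand={"crew": 2}),
}
capacity = {"crew": 2}

sequential = {"A": 0, "C": 2, "B": 4}   # tasks never overlap
overlapping = {"A": 0, "B": 2, "C": 3}  # B and C together need 4 crew
too_early = {"A": 0, "B": 1, "C": 4}    # B starts before A finishes

print(is_feasible(tasks, capacity, sequential))
print(is_feasible(tasks, capacity, overlapping))
print(is_feasible(tasks, capacity, too_early))
```

A check like this only decides feasibility. It says nothing about how close a schedule is to the shortest possible makespan.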

Labelbox

@labelbox

3 months

Introducing ConstraintBench: a new benchmark for evaluating LLM reasoning on realistic resource-constrained project scheduling problems (RCPSP), a well-known NP-complete challenge. It tackles some of the toughest planning challenges (such as project management, construction,

10

62

523

Labelbox

@labelbox

4 months

As AI advances, so do the human skills required to shape and align it. Full report: https://t.co/pmiGSriNU6

0

4

Labelbox

@labelbox

4 months

Grok-4 just landed on our Complex Reasoning leaderboard, and it’s impressive 💥
- Math: 81.8%
- Pure Math: 84.8%
- Applied Math: 79.9%
- CS: 75.4%
- Reasoning: 77.8%
- Aggregate: 80.7%
See how it stacks up:

labelbox.com

The Labelbox complex reasoning leaderboard rigorously assesses top AI models against some of the most demanding tasks available today.

11

2

12

Labelbox

@labelbox

4 months

obligatory AI company SF billboard

3

1

13