racheldias @rachelds__ X Profile

racheldias

@rachelds__

Followers

36

Following

57

Media

0

Statuses

44

Joined May 2024


Ethan Mollick

@emollick

2 months

After reading it, this does seem like a big deal. Industry experts outlined important, real-world, hard tasks for AI to do. Other experts were asked to do the tasks themselves & yet others graded human & AI output. Models approached parity with humans & AI is getting better fast.

36

138

1K

Patrick Chao

@patrickrchao

2 months

This is one of the craziest graphs I've ever seen! AI models went from dragging humans down (gpt-4o) → to breaking past the human baseline. gpt-5 delivers ~1.6× efficiency in both speed and cost 📈

OpenAI

@OpenAI

2 months

Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. https://t.co/uKPPDldVNS

0

1

7

Sam Altman

@sama

2 months

very important work on a new eval

Tejal Patwardhan

@tejalpatwardhan

2 months

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

295

219

2K

Tejal Patwardhan

@tejalpatwardhan

2 months

@AchyutaBot wake up chat new eval just dropped

1

4

Grace Kim

@gracejkim9

2 months

it’s wild how incredible olivia is tbh 👀

Olivia Grace Watkins

@OliviaGWatkins2

2 months

It’s wild how much people’s AI progress forecasts differ even a few years out. We need hard, realistic evals to bridge the gap with concrete evidence and measurable trends. Excited to share GDPval, an eval measuring performance on real, economically valuable white-collar tasks!

3

2

15

Tejal Patwardhan

@tejalpatwardhan

2 months

@simonpfish when he accidentally downloaded 1K GDPval tasks onto his local machine last night. he built an entire eval site for you! evals [dot] openai [dot] com

5

4

120

Olivia Grace Watkins

@OliviaGWatkins2

2 months

It’s wild how much people’s AI progress forecasts differ even a few years out. We need hard, realistic evals to bridge the gap with concrete evidence and measurable trends. Excited to share GDPval, an eval measuring performance on real, economically valuable white-collar tasks!

Tejal Patwardhan

@tejalpatwardhan

2 months

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

3

4

22

Tejal Patwardhan

@tejalpatwardhan

2 months

this plot is wild right??

Tejal Patwardhan

@tejalpatwardhan

2 months

We also find that, when paired with human oversight, models have the potential to complete work tasks much faster and cheaper than humans alone.

16

14

253

Kevin Liu

@kliu128

2 months

GDPval tests models on 1,320 well-specified tasks from 44 real knowledge work occupations, written by experts with an average of 14 years of experience in their field.

Tejal Patwardhan

@tejalpatwardhan

2 months

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

1

2

12

Phoebe Thacker

@phoebethacker

2 months

Super excited to share what the team has been cooking... Introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. https://t.co/8MTK5bLJVX

openai.com

We’re introducing GDPval, a new evaluation that measures model performance on economically valuable, real-world tasks across 44 occupations.

1

22

Michele Wang

@michelelwang

2 months

@gracejkim9 DREAM TEAMMMMM!!! ❣️❣️❣️

0

1

2

Grace Kim

@gracejkim9

2 months

@michelelwang DREAM FREAKING TEAM ‼️‼️‼️

1

4

racheldias

@rachelds__

2 months

Dare I quote @gracejkim9 and say that this is in fact HUUUGGGEEE for the program! Dream team doing important work!

Tejal Patwardhan

@tejalpatwardhan

2 months

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

2

1

11

Michele Wang

@michelelwang

2 months

so excited for GDPval 🚀 our team's first eval measuring frontier models not just on raw intelligence, but on their ability to deliver real professional work across 44 jobs: covering Excel spreadsheets, docs, PDFs, audio and video files, CAD, and more!

Tejal Patwardhan

@tejalpatwardhan

2 months

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

4

8

47

Grace Kim

@gracejkim9

2 months

our team’s home on the internet!!

Simón

@simonpfish

2 months

Go to https://t.co/ggydl5W0C1 and see all the awesome work our team has been getting out there!

0

1

22

Michele Wang

@michelelwang

2 months

@gracejkim9 DREAM TEAM!!!!!

0

1

5

Grace Kim

@gracejkim9

2 months

so excited to release gdpval today with the most incredible team!!

Tejal Patwardhan

@tejalpatwardhan

2 months

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

3

2

19

Simón

@simonpfish

2 months

Go to https://t.co/ggydl5W0C1 and see all the awesome work our team has been getting out there!

7

11

136

Kevin Weil 🇺🇸

@kevinweil

2 months

@tejalpatwardhan See @tejalpatwardhan's post for a great in-depth look at GDPval:

Tejal Patwardhan

@tejalpatwardhan

2 months

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

0

1

23

Kevin Weil 🇺🇸

@kevinweil

2 months

💥 Announcing GDPval, a new eval that measures model performance on economically valuable, real-world tasks across 44 occupations.

9

18

366