racheldias
@rachelds__
Followers
36
Following
57
Media
0
Statuses
44
Joined May 2024
After reading it, this does seem like a big deal. Industry experts outlined important, real-world, hard tasks for AI to do. Other experts were asked to do the tasks themselves & yet others graded human & AI output. Models approached parity with humans & AI is getting better fast.
36
138
1K
This is one of the craziest graphs I've ever seen! AI models went from dragging humans down (gpt-4o) → to breaking past the human baseline. gpt-5 delivers ~1.6× efficiency in both speed and cost 📈
Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. https://t.co/uKPPDldVNS
0
1
7
it’s wild how incredible olivia is tbh 👀
It’s wild how much people’s AI progress forecasts differ even a few years out. We need hard, realistic evals to bridge the gap with concrete evidence and measurable trends. Excited to share GDPval, an eval measuring performance on real, economically valuable white-collar tasks!
3
2
15
@simonpfish when he accidentally downloaded 1K GDPval tasks onto his local machine last night. he built an entire eval site for you! evals [dot] openai [dot] com
5
4
120
It’s wild how much people’s AI progress forecasts differ even a few years out. We need hard, realistic evals to bridge the gap with concrete evidence and measurable trends. Excited to share GDPval, an eval measuring performance on real, economically valuable white-collar tasks!
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
3
4
22
GDPval tests models on 1,320 well-specified tasks from 44 real knowledge work occupations, written by experts with an average of 14 years of experience in their field.
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
1
2
12
Super excited to share what the team has been cooking... Introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. https://t.co/8MTK5bLJVX
openai.com
We’re introducing GDPval, a new evaluation that measures model performance on economically valuable, real-world tasks across 44 occupations.
1
1
22
Dare I quote @gracejkim9 and say that this is in fact HUUUGGGEEE for the program! Dream team doing important work!
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
2
1
11
so excited for GDPval 🚀 our team's first eval measuring frontier models not just on raw intelligence, but on their ability to deliver real professional work across 44 jobs: covering Excel spreadsheets, docs, PDFs, audio and video files, CAD, and more!
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
4
8
47
our team’s home on the internet!!
Go to https://t.co/ggydl5W0C1 and see all the awesome work our team has been getting out there!
0
1
22
Go to https://t.co/ggydl5W0C1 and see all the awesome work our team has been getting out there!
7
11
136
See @tejalpatwardhan's post for a great in-depth look at GDPval:
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
0
1
23
💥 Announcing GDPval, a new eval that measures model performance on economically valuable, real-world tasks across 44 occupations.
9
18
366