Kevin Liu
@kliu128
Followers
10K
Following
6K
Media
53
Statuses
557
Interested in ai, systems, progress, living a good life! Preparedness at @openai, previously @stanford '24
cot token #42,443
Joined August 2016
our progress in automating software engineering is truly exciting to behold
0
0
5
A cat ran in front of a car and was run over. This happens 26 MILLION times per year in the US. Now @rachelswan and @JackieFielder_ want to ban vehicles. This is how moronic @Hearst @sfchronicle reporters & @sfbos are. It’s like they speak in tongues in a tent in Mississippi.
20
162
4K
Rhythm, Linden, and Yash are awesome people solving a good problem. Excited for what they’ll achieve.
Generalists are useful, but it’s not enough to be smart. Advances come from specialists, whether human or machine. To have an edge, agents need specific expertise, within specific companies, built on models trained on specific data. We call this Specific Intelligence. It's
0
0
4
Coming December 9 at 8:00 AM PT, our last Microsoft Research Forum episode of the year. Register now:
19
37
522
after a long break, i’m back at openai to start a new team with @troyluhman and @eric_luhman1 focused on an incredibly high-risk bet that has a small, but significant, chance of leading to ASI. we’re keeping the team tight but we’re open to high-slope researchers & engineers
98
38
1K
Air traffic control at Berlin's Tempelhof Central Airport, 1987
16
227
2K
Might be the best work SemiAnalysis has done so far
Zareen is one of the go-to places for many SF Bay Area AI researchers to get a quick bite. Most of the food is very good and was even on the Michelin guide in 2020. AI researchers not experienced with Indian cuisine will commonly order their chicken tikka masala with garlic
2
0
7
When nyc built the 7 train, like 75% of it was literally just through cornfields!
65
220
5K
A research team at @OpenAI, where I am proud to be a board member, released an important new paper today. This paper looks at what might be thought of as task specific Turing Tests and shows that AI systems, even with limited guidance, perform many tasks -- such as planning
Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. https://t.co/uKPPDldVNS
90
236
2K
It's worth reading. https://t.co/W9sGUcHbB0 It's been amazing to see the entire team work day and night on it, and I think it'll contribute significantly to our understanding of how LLMs affect work.
openai.com
We’re introducing GDPval, a new evaluation that measures model performance on economically valuable, real-world tasks across 44 occupations.
0
0
4
GDPval tests models on 1,320 well-specified tasks from 44 real knowledge work occupations, written by experts with an average of 14 years of experience in their field.
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
1
2
12
Marinade mania
There are some truly wild reasoning traces in @apolloaievals & OpenAI's recent paper. The models appear to have developed specific uses for the words "marinade," "overshadow," "illusions," "vantage," and others. This seems likely to be the result of RL training
0
0
3
— ANNOUNCING CARMAGEDDON: ROGUE SHIFT — Combat racing is back & more brutal than ever. Do you have what it takes to get in the driver's seat?
273
751
5K
coding agents are like the sf central subway: unclear productivity (ridership) gains (yet) but makes people feel warm and fuzzy inside when using it which is valuable in and of itself
0
0
2
waiting to clean the kitchen until generalist home robotics arrive
2
0
15
have we considered that humans also struggle with long horizon tasks?
2
0
43
First edition swe bench plot shirts
ok, if you want a shirt: $17 and all funds raised are donated to @cleanaircatf
https://t.co/NsLZIrYf2s
1
0
8