Kevin Liu @kliu128 X Profile

Kevin Liu

@kliu128

Followers

10K

Following

6K

Media

53

Statuses

557

Interested in ai, systems, progress, living a good life! Preparedness at @openai, previously @stanford '24

https://t.co/Rb6unK1weC

cot token #42,443

Joined August 2016

Kevin Liu

@kliu128

11 days

our progress in automating software engineering is truly exciting to behold

0

5

Matt Popovich

@mpopv

1 month

https://t.co/kGt2lEstOr

MissionLoco

@MissionLoco

1 month

A cat ran in front of a car and was run over. This happens 26 MILLION times per year in the US. Now @rachelswan and @JackieFielder_ want to ban vehicles. This is how moronic @Hearst @sfchronicle reporters & @sfbos are. It’s like they speak in tongues in a tent in Mississippi.

20

162

4K

Kevin Liu

@kliu128

1 month

Rhythm, Linden, and Yash are awesome people solving a good problem. Excited for what they’ll achieve.

Applied Compute

@appliedcompute

1 month

Generalists are useful, but it’s not enough to be smart. Advances come from specialists, whether human or machine. To have an edge, agents need specific expertise, within specific companies, built on models trained on specific data. We call this Specific Intelligence. It's

0

4

will depue

@willdepue

1 month

after a long break, i’m back at openai to start a new team with @troyluhman and @eric_luhman1 focused on an incredibly high-risk bet that has a small, but significant, chance of leading to ASI. we’re keeping the team tight but we’re open to high-slope researchers & engineers

98

38

1K

Federico Italiano

@FedeItaliano76

9 months

Air traffic control at Berlin's Tempelhof Central Airport, 1987

16

227

2K

Kevin Liu

@kliu128

1 month

personal favorite is the samosa burger w fries

0

3

Kevin Liu

@kliu128

1 month

Might be the best work SemiAnalysis has done so far

SemiAnalysis

@SemiAnalysis_

1 month

Zareen is one of the go to places for many SF Bay Area AI researchers to get a quick bite. Most of the food is very good and was even on the Michelin guide in 2020. AI researchers not experienced with the Indian cuisine will commonly order their chicken tikka masala with garlic

2

0

7

Kevin Liu

@kliu128

2 months

tbh the only app subscription worth paying for is flighty

1

0

9

Basil🧡

@LinkofSunshine

2 months

When nyc built the 7 train, like 75% of it was literally just through cornfields!

65

220

5K

Lawrence H. Summers

@LHSummers

2 months

A research team at @OpenAI, where I am proud to be a board member, released an important new paper today. This paper looks at what might be thought of as task specific Turing Tests and shows that AI systems, even with limited guidance, perform many tasks -- such as planning

OpenAI

@OpenAI

2 months

Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. https://t.co/uKPPDldVNS

90

236

2K

Tejal Patwardhan

@tejalpatwardhan

2 months

this plot is wild right??

Tejal Patwardhan

@tejalpatwardhan

2 months

We also find that, when paired with human oversight, models have the potential to complete work tasks much faster and cheaper than humans alone.

16

14

253

Kevin Liu

@kliu128

2 months

It's worth reading. https://t.co/W9sGUcHbB0 It's been amazing to see the entire team work day and night on it, and I think it'll contribute significantly to our understanding of how LLMs affect work.

openai.com

We’re introducing GDPval, a new evaluation that measures model performance on economically valuable, real-world tasks across 44 occupations.

0

4

Kevin Liu

@kliu128

2 months

GDPval tests models on 1,320 well specified tasks from 44 real knowledge work occupations, written by experts with an average of 14 years of experience in their field.

Tejal Patwardhan

@tejalpatwardhan

2 months

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

1

2

12

Kevin Liu

@kliu128

2 months

Marinade mania

Jeffrey Ladish

@JeffLadish

2 months

There are some truly wild reasoning traces in @apolloaievals & OpenAI's recent paper. The models appear to have developed specific uses for the words "marinade", "overshadow", "illusions", "vantage", and others. This seems likely to be the result of RL training

0

3

Kevin Liu

@kliu128

3 months

coding agents are like the sf central subway: unclear productivity (ridership) gains (yet) but makes people feel warm and fuzzy inside when using it which is valuable in and of itself

0

2

Kevin Liu

@kliu128

3 months

waiting to clean the kitchen until generalist home robotics arrive

2

0

15

Kevin Liu

@kliu128

3 months

emergent learning of multi agent workflows, colorized, 2004

0

4

Kevin Liu

@kliu128

3 months

have we considered that humans also struggle with long horizon tasks?

2

0

43

Kevin Liu

@kliu128

3 months

First edition swe bench plot shirts

Tejal Patwardhan

@tejalpatwardhan

3 months

ok, if you want a shirt: $17 and all funds raised are donated to @cleanaircatf https://t.co/NsLZIrYf2s

1

0

8