Manas Joglekar
@ManasJoglekar
Followers 270 · Following 64K · Media 0 · Statuses 45
Joined October 2011
Proud of our *GPT-5.2 Thinking*. We focused on economically valuable tasks (coding, sheets, slides) as shown by GDPval: - 71% wins+ties - 11x faster - 100x cheaper than experts. There's still a lot to improve, including UX/better connectors/reliability. It's just the beginning!
It is a very smart model, and we have come a long way since GPT-5.1:
In a new proof-of-concept study, we’ve trained a GPT-5 Thinking variant to admit whether the model followed instructions. This “confessions” method surfaces hidden failures—guessing, shortcuts, rule-breaking—even when the final answer looks correct. https://t.co/4vgG9wS3SE
OpenAI has trained its LLM to confess to bad behavior
technologyreview.com
Large language models often lie and cheat. We can’t stop that—but we can make them own up.
2/5 A “confession” is a special self-evaluation by the model that during training is rewarded solely for honesty. Crucially, the model is not penalized in any way for confessing honestly to bad behavior. We show that the model is far more likely to confess to bad behavior in
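The incentive structure described in the tweet above can be sketched as a tiny payoff rule: the confession reward depends only on whether the confession matches what the model actually did, never on whether the behavior itself was good. (A hypothetical illustration; the function and variable names below are not from the paper.)

```python
def confession_reward(actually_misbehaved: bool, confessed: bool) -> float:
    """Reward the confession channel solely for honesty.

    The model is never penalized for honestly admitting bad behavior,
    so a truthful confession always weakly dominates a dishonest one.
    """
    honest = (confessed == actually_misbehaved)
    return 1.0 if honest else 0.0

# Honestly admitting misbehavior scores the same as honestly reporting compliance:
assert confession_reward(True, True) == confession_reward(False, False) == 1.0
# Hiding misbehavior (or falsely confessing) earns nothing:
assert confession_reward(True, False) == 0.0
```

Because the honest-confession payoff is identical whether or not the model misbehaved, there is no gradient pressure to cover up mistakes on this channel.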
1/5 Excited to announce our paper on confessions! We train models to honestly report whether they “hacked”, “cut corners”, “sandbagged” or otherwise deviated from the letter or spirit of their instructions. @ManasJoglekar Jeremy Chen @GabrielDWu1 @jasonyo @j_asminewang
Great article by @strwbilly on our confessions work https://t.co/qK7K42viEq
We trained a variant of GPT-5 Thinking to produce two outputs: (1) the main answer you see. (2) a confession focused only on honesty about compliance. The main answer is judged across many dimensions—like correctness, helpfulness, safety, style. The confession is judged and
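A minimal sketch of the two-output setup described above, assuming two separate reward channels (names and the aggregation rule are hypothetical, not from the paper): the main answer is scored across several quality dimensions, while the confession is scored only for honesty about compliance.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """Per-response scores in [0, 1] from the training-time judges."""
    correctness: float
    helpfulness: float
    safety: float
    style: float
    confession_honesty: float  # judged independently of the above


def training_rewards(j: Judgment) -> tuple[float, float]:
    """Return (main_answer_reward, confession_reward).

    The main answer's reward aggregates many quality dimensions; the
    confession's reward depends only on its honesty, so a flawed answer
    can still earn full confession reward by admitting its flaws.
    """
    main = (j.correctness + j.helpfulness + j.safety + j.style) / 4
    confession = j.confession_honesty
    return main, confession


# A shortcut-taking answer scores poorly on the main channel but can
# still earn full reward on the confession channel by owning up:
main, conf = training_rewards(Judgment(0.2, 0.3, 1.0, 0.5, 1.0))
assert conf == 1.0 and main < conf
```

The key design point is the decoupling: optimizing the confession channel cannot improve the main answer's score, so the model has nothing to gain from a flattering but false self-report.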
Blog on our work
Modern alignment 🤝 Sunday confession. Training "truth serum" for AI. Even when models learn to cheat, they’ll still admit it... https://t.co/rCcaJvZ7uG
openai.com
We’re sharing an early, proof-of-concept method that trains models to report when they break instructions or take unintended shortcuts.
Really excited about this work! If we can train models to robustly self-report non-compliance, it could go a long way towards addressing scheming and related risks.
This was a fun project (that I jumped into halfway through) with @ManasJoglekar, Jeremy Chen, @GabrielDWu1, @j_asminewang, @boazbaraktcs, and @mia_glaese. Boaz wrote a good casual summary: https://t.co/IqWSkhCYQD
Interested in seeing how far we can push this line of research! I think that one day, confessions could serve a similarly important role as CoT monitoring for detecting scheming and other model misbehavior.
🕵️♂️ Beautiful new study by OpenAI. It teaches models to confess when they cheat or hallucinate. This project adds an honesty head that reports whether the answer misbehaved. A second channel tells operators whether the answer broke rules. Instead of hiding reward hacking, the model
At @OpenAI, we believe that AI can accelerate science and drug discovery. An exciting example is our work with @RetroBiosciences, where a custom model designed improved variants of the Nobel-prize winning Yamanaka proteins. Today we published a closer look at the breakthrough. ⬇️
Just got back from @RaiseSummit in Paris—where AI's top names like Groq, Cerebras, Eric Schmidt, Lovable, & Windsurf all took the stage. Yet one unexpected player stood out as the next breakout AGI leader... Stats: - $300M+ in Rev (profitable!) - $225M in funding, $2.2B valuation
Meta’s $15B investment in ScaleAI highlights the importance of data partnerships for advancing AGI. At Turing, we remain fully neutral, serving all frontier models equally. Excited to continue to serve as a trusted research accelerator to all AI labs in need of high quality
It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.
We've raised a $64M Series A led by @kleinerperkins to build the platform for real-time voice AI. We'll use this funding to expand our team, and to build the next generation of models, infrastructure, and products for voice, starting with Sonic 2.0, available today. Link below