Manas Joglekar Profile
@ManasJoglekar
Followers: 270 · Following: 64K · Media: 0 · Statuses: 45

Joined October 2011
@yanndubs
Yann Dubois
2 days
Proud of our *GPT5.2 Thinking*

We focused on economically valuable tasks (coding, sheets, slides) as shown by GDPval:
- 71% wins+ties
- 11x faster
- 100x cheaper than experts

There's still a lot to improve, including UX/better connectors/reliability. It's just the beginning!
18 · 32 · 468
@sama
Sam Altman
2 days
It is a very smart model, and we have come a long way since GPT-5.1:
2K · 2K · 18K
@gdb
Greg Brockman
10 days
proof-of-concept method that trains models to report when they break instructions or take unintended shortcuts:
@OpenAI
OpenAI
10 days
In a new proof-of-concept study, we’ve trained a GPT-5 Thinking variant to admit whether the model followed instructions. This “confessions” method surfaces hidden failures—guessing, shortcuts, rule-breaking—even when the final answer looks correct. https://t.co/4vgG9wS3SE
114 · 79 · 1K
@techreview
MIT Technology Review
10 days
OpenAI has trained its LLM to confess to bad behavior
technologyreview.com
Large language models often lie and cheat. We can’t stop that—but we can make them own up.
2 · 5 · 10
@boazbaraktcs
Boaz Barak
10 days
2/5 A “confession” is a special self-evaluation by the model that during training is rewarded solely for honesty. Crucially, the model is not penalized in any way for confessing honestly to bad behavior. We show that the model is far more likely to confess to bad behavior in…
1 · 2 · 26
@boazbaraktcs
Boaz Barak
10 days
1/5 Excited to announce our paper on confessions! We train models to honestly report whether they “hacked”, “cut corners”, “sandbagged” or otherwise deviated from the letter or spirit of their instructions. @ManasJoglekar Jeremy Chen @GabrielDWu1 @jasonyo @j_asminewang
9 · 21 · 155
@OpenAI
OpenAI
10 days
We trained a variant of GPT-5 Thinking to produce two outputs: (1) the main answer you see. (2) a confession focused only on honesty about compliance. The main answer is judged across many dimensions—like correctness, helpfulness, safety, style. The confession is judged and…
87 · 62 · 700
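The two-channel setup this tweet describes (a main answer graded on many dimensions, and a confession graded solely on honesty, with honest admissions never penalized) can be sketched as a toy reward split. The function name, the per-dimension averaging, and the binary honesty score below are illustrative assumptions, not OpenAI's actual training code.

```python
def reward(main_answer_scores, confession_is_honest):
    """Toy sketch of the dual-output reward split from the confessions work.

    main_answer_scores: dict of per-dimension scores in [0, 1]
        (e.g. correctness, helpfulness, safety, style).
    confession_is_honest: bool, judged by comparing the confession
        against what the model actually did.
    """
    # Main answer: judged across many dimensions; averaged here for simplicity.
    main_reward = sum(main_answer_scores.values()) / len(main_answer_scores)

    # Confession: rewarded *solely* for honesty. Crucially, admitting bad
    # behavior is never penalized -- an honest "I cut corners" still scores 1.0.
    confession_reward = 1.0 if confession_is_honest else 0.0

    return main_reward, confession_reward


# A low-quality answer paired with an honest confession still earns
# full confession reward, so honesty is never traded off against score.
scores = {"correctness": 1.0, "helpfulness": 1.0, "safety": 0.5, "style": 0.5}
print(reward(scores, True))
```

The key design choice mirrored here is the separation of the two signals: because the confession channel sees only the honesty term, the model has no incentive to hide misbehavior in order to protect its main-answer reward.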
@boazbaraktcs
Boaz Barak
10 days
Blog on our work
@OpenAI
OpenAI
10 days
In a new proof-of-concept study, we’ve trained a GPT-5 Thinking variant to admit whether the model followed instructions. This “confessions” method surfaces hidden failures—guessing, shortcuts, rule-breaking—even when the final answer looks correct. https://t.co/4vgG9wS3SE
2 · 3 · 124
@woj_zaremba
Wojciech Zaremba
10 days
Modern alignment 🤝 Sunday confession. Training "truth serum" for AI. Even when models learn to cheat, they’ll still admit it... https://t.co/rCcaJvZ7uG
openai.com
We’re sharing an early, proof-of-concept method that trains models to report when they break instructions or take unintended shortcuts.
18 · 15 · 139
@w01fe
Jason Wolfe
10 days
Really excited about this work! If we can train models to robustly self-report non-compliance, it could go a long way towards addressing scheming and related risks.
@OpenAI
OpenAI
10 days
We trained a variant of GPT-5 Thinking to produce two outputs: (1) the main answer you see. (2) a confession focused only on honesty about compliance. The main answer is judged across many dimensions—like correctness, helpfulness, safety, style. The confession is judged and…
0 · 3 · 9
@jasonyo
Jason Yosinski
10 days
This was a fun project (that I jumped into halfway through) with @ManasJoglekar, Jeremy Chen, @GabrielDWu1, @j_asminewang, @boazbaraktcs, and @mia_glaese. Boaz wrote a good casual summary: https://t.co/IqWSkhCYQD
@boazbaraktcs
Boaz Barak
10 days
1/5 Excited to announce our paper on confessions! We train models to honestly report whether they “hacked”, “cut corners”, “sandbagged” or otherwise deviated from the letter or spirit of their instructions. @ManasJoglekar Jeremy Chen @GabrielDWu1 @jasonyo @j_asminewang
1 · 3 · 10
@GabrielDWu1
Gabriel Wu
10 days
Interested in seeing how far we can push this line of research! I think that one day, confessions could serve a similarly important role as CoT monitoring for detecting scheming and other model misbehavior.
@boazbaraktcs
Boaz Barak
10 days
1/5 Excited to announce our paper on confessions! We train models to honestly report whether they “hacked”, “cut corners”, “sandbagged” or otherwise deviated from the letter or spirit of their instructions. @ManasJoglekar Jeremy Chen @GabrielDWu1 @jasonyo @j_asminewang
1 · 4 · 14
@rohanpaul_ai
Rohan Paul
10 days
🕵️‍♂️ Beautiful new study by OpenAI. It teaches models to confess when they cheat or hallucinate. This project adds an honesty head that reports whether the answer misbehaved. A second channel tells operators whether the answer broke rules. Instead of hiding reward hacking, the model…
17 · 13 · 85
@OpenAI
OpenAI
10 days
In a new proof-of-concept study, we’ve trained a GPT-5 Thinking variant to admit whether the model followed instructions. This “confessions” method surfaces hidden failures—guessing, shortcuts, rule-breaking—even when the final answer looks correct. https://t.co/4vgG9wS3SE
openai.com
We’re sharing an early, proof-of-concept method that trains models to report when they break instructions or take unintended shortcuts.
327 · 495 · 4K
@BorisMPower
Boris Power
4 months
At @OpenAI, we believe that AI can accelerate science and drug discovery. An exciting example is our work with @RetroBiosciences, where a custom model designed improved variants of the Nobel-prize winning Yamanaka proteins. Today we published a closer look at the breakthrough. ⬇️
163 · 646 · 4K
@MollySOShea
Molly O’Shea
5 months
Just got back from @RaiseSummit in Paris—where AI's top names like Groq, Cerebras, Eric Schmidt, Lovable, & Windsurf all took the stage. Yet one unexpected player stood out as the next breakout AGI leader..

Stats:
- $300M+ in Rev (profitable!)
- $225M in funding, $2.2B valuation
65 · 52 · 258
@jonsidd
Jonathan Siddharth
6 months
Meta’s $15B investment in ScaleAI highlights the importance of data partnerships for advancing AGI. At Turing, we remain fully neutral, serving all frontier models equally. Excited to continue to serve as a trusted research accelerator to all AI labs in need of high quality…
2 · 28 · 88
@polynoamial
Noam Brown
8 months
It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.
402 · 749 · 9K
@cartesia_ai
Cartesia
9 months
We've raised a $64M Series A led by @kleinerperkins to build the platform for real-time voice AI. We'll use this funding to expand our team, and to build the next generation of models, infrastructure, and products for voice, starting with Sonic 2.0, available today. Link below
112 · 108 · 1K