Imran Khan
@EhThing
671 Followers · 5K Following · 593 Media · 5K Statuses
AI Research Engineer at YC Startup. I discovered that better prompts can make smarter AI worse. Fascinated by emergent behaviors in LLMs.
Joined February 2018
Is complex prompt engineering a trap for advanced AI models? 🤯 My new paper finds that the "best" prompts for a model like GPT-4o actually HARM the performance of GPT-5. I call this the "Prompting Inversion" effect. Here's what I discovered 👇
Is Dan Brown a time traveller? - theft at the Louvre - worldwide virus - AI eating the world
They made Paul Rudd in makeup play a paleontologist and thought we wouldn't notice.
You can now chat with your browser console for debugging 🤯 This will make debugging so much easier!
And that's a wrap! If you found this thread on the "Prompting Inversion" useful, I'd be grateful if you'd retweet the first post to share the insight with others. Thanks for reading! 🙏 https://t.co/hodb01lzy0
By the way, I'm an independent AI researcher and a full-time software engineer. This research is a passion project I do in my spare time! 🙏 If you enjoyed this deep dive, give me a follow for more explorations into the weird & wonderful quirks of Gen AI.
The main takeaway: Optimal prompting isn't universal; it co-evolves with model capability. As models get smarter, our prompts should get SIMPLER. The era of elaborate prompt engineering may be transitional. A "good prompt" for GPT-4o is a "bad prompt" for GPT-5.
E.g.: "Ben's iPhone is two times older than Suzy's (1 yr old)." 🔹 GPT-5 w/ the simple prompt correctly understood the idiom: Ben's phone is 2 years old. 🔹 GPT-5 w/ the strict prompt interpreted it literally: 1 + (2*1) = 3 years old. WRONG. The guardrails made the smarter model act dumber.
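For clarity, the gap between the two readings comes down to this arithmetic (a trivial sketch; the variable names are mine, not from the paper):

```python
suzy_age = 1  # Suzy's iPhone is 1 year old

# Idiomatic reading ("two times older" ~ "twice as old"):
idiomatic = 2 * suzy_age            # 2 years -- GPT-5's answer with the simple prompt

# Hyper-literal reading the strict prompt forces ("older BY two times her phone's age"):
literal = suzy_age + 2 * suzy_age   # 3 years -- the "no common sense" answer

print(idiomatic, literal)  # 2 3
```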
Why? The constraints forced GPT-5 to be hyper-literal, overriding its superior language understanding.
When I ran the same test on GPT-5, the trend completely inverted. The strict "Sculpting" prompt became "Handcuffs." GPT-5's accuracy TANKED from 96.4% with the simple prompt to 94% with the strict one. The constraints that helped the mid-tier model crippled the frontier model.
On GPT-4o, the strict "Sculpting" prompt worked like a charm 🤖 It acted as a "Guardrail," preventing the model from making common-sense mistakes. It boosted accuracy from 93% (simple CoT) to a stellar 97%. I thought we had a winner. More rules = better reasoning. I was wrong!
I tested 3 prompt styles on math problems (GSM8K) across GPT-4o-mini, GPT-4o, and GPT-5:
> Zero-Shot: Just the question
> Scaffolding: Simple Chain-of-Thought ("Let's think step-by-step")
> Sculpting: A highly constrained method ("use NO common sense")
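No code is linked in the thread, but a minimal sketch of this kind of eval could look like the following. The prompt wordings, sample size, and answer-extraction heuristic are my assumptions, not the author's exact setup; it uses the OpenAI chat API and the Hugging Face `datasets` copy of GSM8K.

```python
import re
from openai import OpenAI
from datasets import load_dataset

client = OpenAI()

# Guessed templates for the three styles named in the thread.
STYLES = {
    "zero_shot": "{question}",
    "scaffolding": "{question}\n\nLet's think step-by-step.",
    "sculpting": (
        "{question}\n\n"
        "Solve using ONLY the numbers stated. Use NO common sense, "
        "NO outside knowledge, and NO idiomatic interpretation. "
        "Show each arithmetic step, then give the final number."
    ),
}

def final_number(text: str):
    """Take the last number in the reply as the model's answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

def accuracy(model: str, style: str, n: int = 100) -> float:
    data = load_dataset("gsm8k", "main", split=f"test[:{n}]")
    correct = 0
    for row in data:
        prompt = STYLES[style].format(question=row["question"])
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        # GSM8K gold answers end with "#### <number>".
        gold = row["answer"].split("####")[-1].strip().replace(",", "")
        correct += final_number(reply) == gold
    return correct / len(data)

# e.g. compare accuracy("gpt-4o", "sculpting") vs accuracy("gpt-4o", "scaffolding")
```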