Imran Khan
@EhThing
671 Followers · 5K Following · 593 Media · 5K Statuses
AI Research Engineer at YC Startup. I discovered that better prompts can make smarter AI worse. Fascinated by emergent behaviors in LLMs.
Joined February 2018
Is complex prompt engineering a trap for advanced AI models? 🤯 My new paper finds that the "best" prompts for a model like GPT-4o actually HARM the performance of GPT-5. I call this the "Prompting Inversion" effect. Here's what I discovered 👇
Is Dan Brown a time traveller? - theft at the Louvre - worldwide virus - AI eating the world
They made Paul Rudd in makeup play a paleontologist and thought we wouldn't notice.
You can now chat with your browser console for debugging 🤯 This will make debugging so much easier!
And that's a wrap! If you found this thread on the "Prompting Inversion" useful, I'd be grateful if you'd retweet the first post to share the insight with others. Thanks for reading! 🙏 https://t.co/hodb01lzy0
By the way, I'm an independent AI researcher and a full-time software engineer. This research is a passion project I do in my spare time! 🙏 If you enjoyed this deep dive, give me a follow for more explorations into the weird & wonderful quirks of Gen AI.
The main takeaway: Optimal prompting isn't universal; it co-evolves with model capability. As models get smarter, our prompts should get SIMPLER. The era of elaborate prompt engineering may be transitional. A "good prompt" for GPT-4o is a "bad prompt" for GPT-5.
E.g.: "Ben's iPhone is two times older than Suzy's (1 yr old)." 🔹 GPT-5 w/ the simple prompt correctly understood the idiom: Ben's phone is 2 years old. 🔹 GPT-5 w/ the strict prompt interpreted it literally: 1 + (2*1) = 3 years old. WRONG. The guardrails made the smarter model act dumber.
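For clarity, the gap between the two readings comes down to this arithmetic (a trivial sketch; the variable names are mine, not from the paper):

```python
suzy_age = 1  # Suzy's iPhone is 1 year old

# Idiomatic reading ("two times older" ~ "twice as old"):
idiomatic = 2 * suzy_age            # 2 years -- GPT-5's answer with the simple prompt

# Hyper-literal reading the strict prompt forces ("older BY two times her phone's age"):
literal = suzy_age + 2 * suzy_age   # 3 years -- the "no common sense" answer

print(idiomatic, literal)  # 2 3
```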
Why? The constraints forced GPT-5 to be hyper-literal, overriding its superior language understanding.
When I ran the same test on GPT-5, the trend completely inverted. The strict "Sculpting" prompt became "Handcuffs." GPT-5's accuracy TANKED from 96.4% with the simple prompt to 94% with the strict one. The constraints that helped the mid-tier model crippled the frontier model.
On GPT-4o, the strict "Sculpting" prompt worked like a charm 🤖 It acted as a "Guardrail," preventing the model from making common-sense mistakes. It boosted accuracy from 93% (simple CoT) to a stellar 97%. I thought we had a winner. More rules = better reasoning. I was wrong!
I tested 3 prompt styles on math problems (GSM8K) across GPT-4o-mini, GPT-4o, and GPT-5:
> Zero-Shot: Just the question
> Scaffolding: Simple Chain-of-Thought ("Let's think step-by-step")
> Sculpting: A highly constrained method ("use NO common sense")
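No code is linked in the thread, but a minimal sketch of this kind of eval could look like the following. The prompt wordings, sample size, and answer-extraction heuristic are my assumptions, not the author's exact setup; it uses the OpenAI chat API and the Hugging Face `datasets` copy of GSM8K.

```python
import re
from openai import OpenAI
from datasets import load_dataset

client = OpenAI()

# Guessed templates for the three styles named in the thread.
STYLES = {
    "zero_shot": "{question}",
    "scaffolding": "{question}\n\nLet's think step-by-step.",
    "sculpting": (
        "{question}\n\n"
        "Solve using ONLY the numbers stated. Use NO common sense, "
        "NO outside knowledge, and NO idiomatic interpretation. "
        "Show each arithmetic step, then give the final number."
    ),
}

def final_number(text: str):
    """Take the last number in the reply as the model's answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

def accuracy(model: str, style: str, n: int = 100) -> float:
    data = load_dataset("gsm8k", "main", split=f"test[:{n}]")
    correct = 0
    for row in data:
        prompt = STYLES[style].format(question=row["question"])
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        # GSM8K gold answers end with "#### <number>".
        gold = row["answer"].split("####")[-1].strip().replace(",", "")
        correct += final_number(reply) == gold
    return correct / len(data)

# e.g. compare accuracy("gpt-4o", "sculpting") vs accuracy("gpt-4o", "scaffolding")
```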