Instead of finding the perfect prompt for an LLM (e.g., "let's think step by step"), you can ask LLMs to critique their outputs and immediately fix their own mistakes. Here's a fun example:
h/t @avisingh599 for pointing me to the Reflexion paper, which gets at this idea. It's absolutely wild to me that LLMs are general enough that they can critique their own outputs in a sensible way
Interestingly enough, GPT-3.5 is incapable of such self-critique, at least for this assignment. It seems to be an emergent capability only present in GPT-4
The implications are substantial. Instead of clever "prefix prompt engineering", we can now consider "postfix prompt engineering", which encourages LLMs to find corrections and inconsistencies within their previously generated solutions.
After any generated output, just append "did the generated output do what the user asked?" and the LLM becomes a "minimal policy improvement operator" for itself
Maybe it is possible to apply a critique to the critique recursively, i.e. append "is the critique logically consistent with the original request?" GPT-3.5 seems to be rather biased towards self-congratulatory optimism
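To make the "postfix" idea concrete, here is a minimal sketch of the generate → critique → critique-the-critique → revise loop, assuming the OpenAI Python chat client. The model name, critique questions, and example task are illustrative stand-ins, not the exact setup used in the thread.

```python
# Minimal sketch of "postfix prompt engineering": generate, self-critique, revise.
# Assumes the OpenAI Python client; model name, prompts, and the fixed critique
# questions below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4"  # the thread suggests self-critique only shows up reliably at GPT-4 level


def chat(messages):
    """One chat completion call, returning the assistant's text."""
    response = client.chat.completions.create(model=MODEL, messages=messages)
    return response.choices[0].message.content


def generate_with_self_critique(task: str) -> str:
    # 1. Initial attempt at the task.
    messages = [{"role": "user", "content": task}]
    draft = chat(messages)
    messages.append({"role": "assistant", "content": draft})

    # 2. Postfix critique: did the output actually do what the user asked?
    messages.append({"role": "user",
                     "content": "Did the generated output do what the user asked? "
                                "List any mistakes or inconsistencies."})
    critique = chat(messages)
    messages.append({"role": "assistant", "content": critique})

    # 3. Optional second-order critique of the critique itself.
    messages.append({"role": "user",
                     "content": "Is the critique logically consistent with the original request?"})
    meta_critique = chat(messages)
    messages.append({"role": "assistant", "content": meta_critique})

    # 4. Revise the original answer in light of both critiques.
    messages.append({"role": "user",
                     "content": "Rewrite your original answer, fixing every issue raised above."})
    return chat(messages)


if __name__ == "__main__":
    print(generate_with_self_critique("Sort these words alphabetically: pear, apple, banana"))
```

The critique steps cost extra model calls, but the loop needs no extra training or tooling: the "policy improvement" comes entirely from appending questions after the generation.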
Wrote a quick blog post about it here.
Would be curious to see which examples it can verify & correct, which it can only verify, and which it fails to verify at all
A reply to @ericjang11 suggested a complementary approach: ask the model what prompt it would create to get itself to do what you want (the replier found GPT-3 believes that only GPT-2 exists, so they use GPT-2 when doing this). Then show it the results, share your critique, and ask it how to change the prompt to improve the output.
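Below is a rough sketch of that prompt-refinement loop, under the same assumptions as above (OpenAI Python client, illustrative model name and wording). The `critique_fn` hook is hypothetical: it stands in for wherever the human feedback on each output comes from.

```python
# Sketch of the reply's suggestion: have the model propose a prompt, run it,
# then feed back the output plus a critique and ask for a better prompt.
# Assumes the OpenAI Python client; wording and model name are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4"


def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}])
    return response.choices[0].message.content


def refine_prompt(goal: str, critique_fn, rounds: int = 3) -> str:
    """critique_fn(output) returns feedback on an output, e.g. from a human reviewer."""
    # Ask the model to write a prompt that would make a model achieve the goal.
    prompt = ask("Write a prompt that would get a language model to do the following:\n"
                 f"{goal}\nReturn only the prompt.")
    for _ in range(rounds):
        output = ask(prompt)            # run the candidate prompt
        feedback = critique_fn(output)  # share your critique of the result
        # Show the model its prompt, the output, and the critique; ask for a revision.
        prompt = ask("Here is a prompt, the output it produced, and a critique of that output.\n\n"
                     f"PROMPT:\n{prompt}\n\nOUTPUT:\n{output}\n\nCRITIQUE:\n{feedback}\n\n"
                     "Rewrite the prompt so the output better addresses the critique. "
                     "Return only the revised prompt.")
    return prompt
```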