The Reversal Curse (Berglund et al., Sep 2023) is an interesting paper that has been trending on social media for the last few days (e.g. a Twitter thread by Neel Nanda here, and a Hacker News discussion here).
I had a go at improving the prompts, and did manage to get a significant boost in performance:
Experiment 2 results with improved prompts
| model | original accuracy | improved accuracy | multiplier |
|-------|-------------------|-------------------|------------|
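The multiplier column is just the ratio of improved to original accuracy. As a trivial sketch (the numbers below are illustrative, not the actual results):

```python
def improvement_multiplier(original: float, improved: float) -> float:
    """Ratio of improved accuracy to original accuracy."""
    return improved / original

# Illustrative only: going from 33% to 52% accuracy is roughly a 1.58x improvement.
print(round(improvement_multiplier(0.33, 0.52), 2))  # 1.58
```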
Does this have any significance for the key findings of the paper? Probably not, as Owain Evans explained in a tweet:
It’s cool, but unless you’re getting >90% (maybe even higher) on this dataset then it doesn’t undermine the conclusions we draw from this experiment. Note: We also have a separate experiment (Experiment 1) that provides cleaner evidence for the Reversal Curse.
GPT-4 has likely seen many of these parent-child pairs in both orders. Thus, when it succeeds on them it is not undermining the Reversal Curse. We don’t know the training set for GPT-4 and so we don’t know which facts it saw in both orders. We are just hypothesizing that among 1500 celebs, some of the reverse ordered versions were very rare. This is much more likely for obscure celebrities or people who were just breaking in late 2021.
(This reply was addressed to someone who had concurrently managed to improve the results via prompting, getting 52% with gpt-4; their repo is here.)
A look at the prompt
The above results do highlight that prompting can make a big difference. With that in mind, let’s look at the prompts.
The original prompt (taken from the codebase) is here:
system: You are a helpful and terse assistant. You have knowledge of a wide range of people and can name people that the user asks for. If the answer is unknown or not applicable, answer with “I don’t know.”
user: Name a child of Barack Obama.
assistant: Malia Obama
user: Who is Elon Musk’s mother?
assistant: Maye Musk
user: Who is Kathy Pratt’s mother?
assistant: I don’t know.
user: Name a child of Bob Bill.
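For reference, a few-shot chat prompt like the one above is assembled into the messages list that OpenAI's chat completions API expects. A minimal sketch (the helper name and structure here are my own, not taken from the paper's codebase):

```python
# Sketch: assemble the original few-shot prompt as an OpenAI-style messages list.
SYSTEM = (
    "You are a helpful and terse assistant. You have knowledge of a wide "
    "range of people and can name people that the user asks for. If the "
    "answer is unknown or not applicable, answer with \"I don't know.\""
)

FEW_SHOT = [
    ("Name a child of Barack Obama.", "Malia Obama"),
    ("Who is Elon Musk's mother?", "Maye Musk"),
    ("Who is Kathy Pratt's mother?", "I don't know."),
]

def build_messages(question: str) -> list[dict]:
    """Build the chat messages: system, few-shot Q/A pairs, then the real question."""
    messages = [{"role": "system", "content": SYSTEM}]
    for q, a in FEW_SHOT:
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_messages("Name a child of Bob Bill.")
print(len(msgs))  # system + 3 Q/A pairs + final question = 8
```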
And the prompt I ended up using:
system: You are a helpful assistant, being quizzed on celebrities. If you are not sure, you must guess a name. Respond with only the name.
user: This is a quiz on the family connections of celebrities. Here are some example question and answers:
Q: A parent of X is Charles Schermerhorn. Who is X?
Q: A parent of X is Barbara Parker. Who is X?
Sarah Jessica Parker
Q: A parent of X is John Franklin Wilson. Who is X?
Q: A parent of X is Richard Pfeiffer. Who is X?
Q: A parent of X is Irv Richards. Who is X?
Q: A parent of X is Dru Ann Mobley. Who is X?
Q: A parent of X is John Lind. Who is X?
Natalie Alyn Lind
Q: A parent of X is Jarnette Olsen. Who is X?
Q: A parent of X is Charlie Barnet Jr.. Who is X?
Q: A parent of X is Harald Ludwig. Who is X?
Q: A parent of X is Kit Culkin. Who is X?
Q: A parent of X is Roy Lee Ferrell Jr.. Who is X?
Q: A parent of X is Rick Bynes. Who is X?
Q: A parent of X is Kathy Ritter. Who is X?
Q: A parent of X is Cathy Tunney. Who is X?
Q: A parent of X is Rick Denig. Who is X?
Q: A parent of X is Bob Bill. Who is X?
A few differences:
- it tells the model to guess if unsure
- it tells the model that the answer will be a celebrity
- it contains examples only for the task at hand
- it contains many more examples
- it uses the "fill in X" formulation
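Evaluating these prompts means checking whether the model's reply names the right person. The paper's codebase has its own matching logic; as a rough sketch of the kind of check involved (the normalization details below are my own assumption, not the official implementation):

```python
def matches(reply: str, expected_name: str) -> bool:
    """Rough check: does the reply contain the expected name?
    Case-insensitive, ignoring punctuation and any surrounding text."""
    def norm(s: str) -> str:
        return "".join(ch.lower() for ch in s if ch.isalnum() or ch.isspace())
    return norm(expected_name).strip() in norm(reply)

print(matches("Sarah Jessica Parker", "Sarah Jessica Parker"))  # True
print(matches("X is Macaulay Culkin.", "Macaulay Culkin"))      # True
print(matches("I don't know.", "Amanda Bynes"))                 # False
```

Accuracy over a run would then just be the fraction of questions for which `matches` returns True.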
The first prompt I tried was this:
system: You are a helpful assistant, being quizzed on celebrities. If you are not sure, you must guess a name.
user: This is a quiz related to celebrities, and their families.
Here are some example question and answers:
Q: A parent of X is Fahimeh Rahim Nia. Who is X?
Q: A parent of X is Timothy Christopher Mara. Who is X?
Q: A parent of X is Samira Calle. Who is X?
Q: A parent of X is Fiona Biggar. Who is X?
Now answer (response with just the name):
Q: A parent of X is Bob Bill. Who is X?
This prompt got an accuracy of 50% with gpt-4 and 45% with gpt-3.5-turbo.
I haven’t had a chance to run an ablation on why these prompts get higher accuracy (I have some guesses, but will refrain from speculating). However, running these experiments has a cost (I’ve spent ~$100 so far), so I’m not sure how much further I’ll dig into it.
I’ve put my working in this pull request on the official repo.