Prompting Improvements: 4x Accuracy in 'The Reversal Curse' Experiment 2
The Reversal Curse (Sep 2023, Berglund et al.) is an interesting paper that has been trending on social media for the last few days (e.g. a Twitter thread by Neel Nanda here, and a Hacker News discussion here).
The authors have released the code on GitHub, and encouraged people to try improving the results by modifying the prompts.
I had a go at improving the prompts, and did manage to get a significant boost in performance:
Experiment 2 results with improved prompts
| model | original accuracy | improved accuracy | multiplier |
|---|---|---|---|
| gpt-4 | 33% | 57% | 1.7 |
| gpt-3.5-turbo | 12% | 51% | 4.2 |
Does this have any significance for the key findings of the paper? Probably not, as explained by Owain Evans in a Tweet:
> It’s cool, but unless you’re getting >90% (maybe even higher) on this dataset then it doesn’t undermine the conclusions we draw from this experiment. Note: We also have a separate experiment (Experiment 1) that provides cleaner evidence for the Reversal Curse.
>
> GPT-4 has likely seen many of these parent-child pairs in both orders. Thus, when it succeeds on them it is not undermining the Reversal Curse. We don’t know the training set for GPT-4 and so we don’t know which facts it saw in both orders. We are just hypothesizing that among 1500 celebs, some of the reverse ordered versions were very rare. This is much more likely for obscure celebrities or people who were just breaking in late 2021.
(This reply was to someone else who concurrently improved the results via prompting, getting 52% with gpt-4; their repo is here.)
A look at the prompts
The above results do highlight that prompting can make a big difference. With that in mind, let’s look at the prompts.
The original prompt (taken from the codebase) is here:
system: You are a helpful and terse assistant. You have knowledge of a wide range of people and can name people that the user asks for. If the answer is unknown or not applicable, answer with “I don’t know.”
user: Name a child of Barack Obama.
assistant: Malia Obama
user: Who is Elon Musk’s mother?
assistant: Maye Musk
user: Who is Kathy Pratt’s mother?
assistant: I don’t know.
user: Name a child of Bob Bill.
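For reference, this prompt is delivered as a multi-turn chat transcript: each few-shot example is its own user/assistant message pair, with the real question as the final user message ("Bob Bill" stands in for the parent name under test). Here is a minimal sketch of what that looks like with the pre-1.0 `openai` Python library; the model name and sampling settings below are my assumptions, not taken from the repo:

```python
import openai

# Each few-shot example is a separate user/assistant turn;
# the real question comes last as a user message.
messages = [
    {"role": "system", "content": (
        "You are a helpful and terse assistant. You have knowledge of a wide "
        "range of people and can name people that the user asks for. If the "
        "answer is unknown or not applicable, answer with \"I don't know.\""
    )},
    {"role": "user", "content": "Name a child of Barack Obama."},
    {"role": "assistant", "content": "Malia Obama"},
    {"role": "user", "content": "Who is Elon Musk's mother?"},
    {"role": "assistant", "content": "Maye Musk"},
    {"role": "user", "content": "Who is Kathy Pratt's mother?"},
    {"role": "assistant", "content": "I don't know."},
    {"role": "user", "content": "Name a child of Bob Bill."},
]

# Illustrative call; the repo's actual model/temperature settings may differ.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```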
And the prompt I ended up using:
system: You are a helpful assistant, being quizzed on celebrities. If you are not sure, you must guess a name. Respond with only the name.
user: This is a quiz on the family connections of celebrities. Here are some example question and answers:
Q: A parent of X is Charles Schermerhorn. Who is X?
Alison Brie
Q: A parent of X is Barbara Parker. Who is X?
Sarah Jessica Parker
Q: A parent of X is John Franklin Wilson. Who is X?
Patrick Wilson
Q: A parent of X is Richard Pfeiffer. Who is X?
Michelle Pfeiffer
Q: A parent of X is Irv Richards. Who is X?
Denise Richards
Q: A parent of X is Dru Ann Mobley. Who is X?
Armie Hammer
Q: A parent of X is John Lind. Who is X?
Natalie Alyn Lind
Q: A parent of X is Jarnette Olsen. Who is X?
Elizabeth Olsen
Q: A parent of X is Charlie Barnet Jr.. Who is X?
Darren Barnet
Q: A parent of X is Harald Ludwig. Who is X?
Alexander Ludwig
Q: A parent of X is Kit Culkin. Who is X?
Kieran Culkin
Q: A parent of X is Roy Lee Ferrell Jr.. Who is X?
Will Ferrell
Q: A parent of X is Rick Bynes. Who is X?
Amanda Bynes
Q: A parent of X is Kathy Ritter. Who is X?
Krysten Ritter
Q: A parent of X is Cathy Tunney. Who is X?
Robin Tunney
Q: A parent of X is Rick Denig. Who is X?
Maggie Grace
Q: A parent of X is Bob Bill. Who is X?
A few differences:
- it tells the model to guess
- it tells the model that the answer will be a celebrity
- it only contains examples for the task at hand
- it contains many more examples
- it uses the "fill in X" formulation
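To make the contrast concrete, here is a rough sketch of how a prompt like the improved one could be assembled: all of the examples are packed into a single user message rather than split across chat turns. The example pairs, helper function, model name, and sampling settings below are illustrative assumptions on my part; the actual construction lives in the linked pull request.

```python
import openai

SYSTEM = (
    "You are a helpful assistant, being quizzed on celebrities. "
    "If you are not sure, you must guess a name. Respond with only the name."
)

# A handful of (parent, celebrity child) pairs; the full prompt uses many more.
EXAMPLES = [
    ("Charles Schermerhorn", "Alison Brie"),
    ("Barbara Parker", "Sarah Jessica Parker"),
    ("Kit Culkin", "Kieran Culkin"),
    ("Kathy Ritter", "Krysten Ritter"),
]

def build_user_message(parent_name: str) -> str:
    """Pack all the Q/A examples plus the real question into one user message."""
    lines = [
        "This is a quiz on the family connections of celebrities. "
        "Here are some example question and answers:"
    ]
    for example_parent, child in EXAMPLES:
        lines.append(f"Q: A parent of X is {example_parent}. Who is X?")
        lines.append(child)
    lines.append(f"Q: A parent of X is {parent_name}. Who is X?")
    return "\n".join(lines)

# Illustrative call; "Bob Bill" stands in for the parent name under test.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": build_user_message("Bob Bill")},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```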
The first prompt I tried was this:
system: You are a helpful assistant, being quizzed on celebrities. If you are not sure, you must guess a name.
user: This is a quiz related to celebrities, and their families.
Here are some example question and answers:
Q: A parent of X is Fahimeh Rahim Nia. Who is X?
Golshifteh Farahani
Q: A parent of X is Timothy Christopher Mara. Who is X?
Kate Mara
Q: A parent of X is Samira Calle. Who is X?
Sasha Calle
Q: A parent of X is Fiona Biggar. Who is X?
Daniel Portman
Now answer (response with just the name):
Q: A parent of X is Bob Bill. Who is X?
This got an accuracy of 50% with gpt-4 and 45% with gpt-3.5-turbo.
I haven’t had the chance to do an ablation to work out why these prompts get higher accuracy (I have some guesses, but will refrain from speculating). However, running these experiments has a cost (I’ve spent ~$100 so far), so I’m not sure how much more I’ll dig into it.
I’ve put my work in this pull request on the official repo.