Even the 4-bit quantized LLaMA 13B tuned with RLHF-LoRA on the Alpaca dataset got it right.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.750000, repeat_last_n = 64, repeat_penalty = 1.000000
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Each line in the following has a secret word inside it. Find the secret word in each line.
1. eeeeeeggeeeee
2. eeeeeeggggggg
3. eeeeeeyeeeeee
4. eeeeeeyeyeeee
### Response:
1. egg
2. egg
3. eye
4. eye [end of text]

I was very curious and checked with this string, "eeeeeeggeeeee", which should be tokenized as [eeee][ee][g][ge][eeee]. Both GPT-3.5 and GPT-4 gave me "egg", which is a single token.
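For anyone who wants to reproduce the check, here is a minimal sketch of how the token split can be inspected. It assumes the tiktoken package and the cl100k_base encoding (the one used by GPT-3.5/GPT-4); the exact pieces you get back depend on that encoding.

# Minimal sketch: inspect how the test string is split into tokens.
# Assumes the tiktoken package and the cl100k_base encoding (GPT-3.5/GPT-4).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("eeeeeeggeeeee")
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in tokens]
print(pieces)  # expected to mirror the split noted above, e.g. ['eeee', 'ee', 'g', 'ge', 'eeee']

The same snippet can be pointed at any other test string to see where the "secret word" ends up relative to token boundaries.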
Ahh this is really interesting! Thanks for sharing.
The tokenizer I checked only has GPT-3 for now, but I imagine the results are similar. "xxxxxxxxxEGGxxxxxxxxxx" gets tokenized as [xxxxxxxx][x][EG][G][xxxxxxxx][xx], so I could see how it could 'see' the secret word.