
Yesterday, my Twitter feed had a bunch of stuff about LoRA (I think prompted by the WWDC announcements about how Apple is using adapters).

One of the posts was about this company Openpipe that makes it easy to finetune models (using LoRA under the hood), so you can reduce inference cost vs. using the GPT-4 API.

Something I read said prompting will be replaced by adapters.

That made me curious about how finetuned models are prompted when put into production. For example, let's assume I'm currently using GPT-4 with a long prompt that includes a dozen examples of correct inputs+outputs (n-shot learning), and then finally the current input.

Should the fixed part of this long prompt be included in the dataset used for fine-tuning? Or just the variable part?

Once the finetuned model is in production, do I still need a system prompt or similar to describe my task on each call, or do I just send the input data?
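
To make the question concrete, here's roughly what such a call looks like today. The ticket-labeling task, the example pairs, and the inputs are all made up for illustration; the point is just which parts are fixed and which vary per call.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder few-shot examples; the real prompt has about a dozen of these.
FEW_SHOT_EXAMPLES = [
    ("Ticket: my invoice is wrong", "billing"),
    ("Ticket: the app crashes on launch", "bug"),
]

# Fixed part: task description plus the worked examples, identical on every call.
fixed_messages = [
    {"role": "system", "content": "Label each support ticket with a category."}
]
for example_input, example_output in FEW_SHOT_EXAMPLES:
    fixed_messages.append({"role": "user", "content": example_input})
    fixed_messages.append({"role": "assistant", "content": example_output})

# Variable part: only this final user message changes between calls.
current_input = "Ticket: I can't reset my password"
response = client.chat.completions.create(
    model="gpt-4",
    messages=fixed_messages + [{"role": "user", "content": current_input}],
)
print(response.choices[0].message.content)
```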

I came across this article when I was searching for an answer to these questions. It doesn't cover the specifics of prompting the finetuned model, but it's a nice write-up that provides enough detail to replicate what the author did, and it also gives some insight into why he made those choices.


In my experience, mostly using LoRA finetunes with OpenAI, it is important to include the prompt in your fine-tuning dataset. It is _also_ important to include that prompt at inference time. I have not seen any evidence of being able to save on input tokens by using a LoRA, at least not with the chat models.
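
Concretely, with OpenAI's chat fine-tuning format that means something like the sketch below. The ticket-labeling task and the fine-tuned model id are placeholders; the thing to notice is that the same system prompt appears in every training line and again at inference.

```python
import json
from openai import OpenAI

SYSTEM_PROMPT = "Label each support ticket with a category."

# One line of the fine-tuning JSONL: the prompt is baked into every example.
training_line = {
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Ticket: my invoice is wrong"},
        {"role": "assistant", "content": "billing"},
    ]
}
with open("train.jsonl", "a") as f:
    f.write(json.dumps(training_line) + "\n")

# At inference time, send the same system prompt to the fine-tuned model.
client = OpenAI()
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0125:my-org::abc123",  # placeholder fine-tuned model id
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Ticket: I can't reset my password"},
    ],
)
print(response.choices[0].message.content)
```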

This was a bit less true of older models like davinci - you could basically just send them data (I'm not sure how they were fine-tuned under the hood). However, those models were less powerful in general.
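
For reference, those models used the legacy completion-style fine-tuning format, which was just prompt/completion pairs with no system message. The data below is made up and the separator/stop-token conventions are only illustrative:

```python
import json

# Legacy completion-style fine-tuning line (davinci era): no system prompt,
# just the raw input, a separator, and the desired output with a stop sequence.
legacy_line = {
    "prompt": "Ticket: my invoice is wrong\n\n###\n\n",
    "completion": " billing END",
}
print(json.dumps(legacy_line))
```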

For `gpt-3.5-turbo`, which definitely uses LoRA or something very similar, I have not witnessed that behaviour.

Another way to look at this is that when a LoRA is added to a sufficiently large foundation model like GPT-3, it doesn't really lose its "GPT-ness".
