In my experience, mostly with LoRA finetunes through OpenAI, it is important to include the prompt in your fine-tuning dataset. It is _also_ important to include that same prompt at inference time. I have not seen any evidence that you can save on input tokens by using a LoRA, at least not with the chat models.
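To make that concrete, here's a rough sketch of what I mean. The JSONL structure follows OpenAI's chat fine-tuning format, but the system prompt, file name, and fine-tuned model ID are placeholders I've made up for illustration:

```python
import json
from openai import OpenAI

# The same system prompt goes into every training example...
SYSTEM_PROMPT = "You are a support bot for AcmeCo. Answer in one sentence."

example = {
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Use the 'Forgot password' link on the login page."},
    ]
}
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")

# ...and gets sent again at inference time with the fine-tuned model,
# so you still pay for those prompt tokens on every request.
client = OpenAI()
resp = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:my-org::abc123",  # placeholder fine-tune ID
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How do I reset my password?"},
    ],
)
print(resp.choices[0].message.content)
```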
This was a bit less true of older models like davinci - you really could more or less just send them data (I'm not sure how they were fine-tuned under the hood). However, those models were less powerful in general.
For `gpt-3.5-turbo`, which definitely uses LoRA or something very similar, I have not witnessed this behaviour.
Another way to look at this is that when a LoRA is added to a sufficiently large foundation model like GPT-3, it doesn't really lose its "GPT-ness".