The same thing is going to happen with all of the human-language artifacts in the agentic coding universe. Role definitions, skills, agentic loop prompts... the specific language, choice of words, sequence, etc. really matter and will keep evolving very rapidly. And there will be benchmarkers, I am sure of it, because quite a lot of orgs will consider their prompt artifacts to be IP.
I have personally found that a very high-precision prompt can make a smaller model on personal hardware outperform a foundation model given a lazy prompt. These word calculators are very, very (very) sensitive. There will be gradations of quality among those who drive them.
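If you want to test that claim yourself, here is a rough sketch of the kind of harness I mean: run the same task set through a small local model with a carefully engineered prompt and through a hosted foundation model with a lazy one-liner, and score both with the same check. Everything here is a placeholder, not a real benchmark: the model names, endpoints, prompts, and the crude pass/fail rule are all hypothetical, and it assumes both servers expose an OpenAI-compatible chat completions API (as local llama.cpp or Ollama servers typically do).

```python
# Hypothetical prompt-sensitivity harness. Tasks, prompts, model names,
# endpoints, and the scoring rule are placeholders to illustrate the idea.
from openai import OpenAI

TASKS = [
    {"input": "Refactor this function to remove the global state: ...",
     "must_contain": "def "},
    # ...more tasks, each with whatever pass/fail signal you trust...
]

PRECISE_PROMPT = (
    "You are a senior engineer. Respond with only a fenced Python code block. "
    "Preserve the public function signature, remove module-level globals, "
    "thread state through parameters, and keep behaviour identical."
)
LAZY_PROMPT = "Please clean up this code."

def run(client: OpenAI, model: str, system_prompt: str) -> float:
    """Return the fraction of tasks whose output passes the crude check."""
    passed = 0
    for task in TASKS:
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": task["input"]},
            ],
        )
        text = resp.choices[0].message.content or ""
        passed += task["must_contain"] in text
    return passed / len(TASKS)

# Small local model behind an OpenAI-compatible server (placeholder URL/model).
local = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
# Hosted foundation model (placeholder model name, reads OPENAI_API_KEY).
hosted = OpenAI()

print("small model + precise prompt:", run(local, "qwen2.5-coder-7b", PRECISE_PROMPT))
print("big model   + lazy prompt   :", run(hosted, "gpt-4o", LAZY_PROMPT))
```

The interesting part is less the scores than how much they move when you reword PRECISE_PROMPT slightly, which is exactly the sensitivity I am pointing at.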
The best law firms are the best because they hire the people who are best with (legal) language and are able to retain the reputation and pricing of the best. That is the moat. The same will be the case here.
You might get an 80% “good enough” prompt easily, but then all the differentiation (the moat) is in that last 20%, and that 20% is tied to the model's idiosyncrasies, which makes the moat fragile and volatile.