I forget where I saw this comparison so I can’t link to it, but the last few years in AI are like waking up and finding that dogs can talk: while some complain they’re not the world’s greatest orators, I find it amazing that they can string a few genuinely coherent sentences together and maintain a contextual thread over multiple responses even half the time.
I agree with the sentiment, but to continue your analogy, if OpenAI is using people to improve the answers to specific questions, it is a bit like learning that Cicero, Lincoln and Churchill were merely reading the work of speechwriters.
There is an argument that it does not matter how GPT-3 gets to its answers - after all, for a long time the main approach to AI was for people to write a lot of bespoke rules in an attempt to endow a computer with common sense and knowledge, so GPT-3 + InstructGPT might be described as a hybrid of machine learning and that older approach.
If OpenAI wishes to pursue that path, that is fine by me (as if my opinion matters!), but because the perception of GPT-3 depends so strongly on how its output looks to human readers, it is plainly misleading if some of the most impressive replies were largely the result of specific human intervention. The issue is transparency: I would simply like to know, when I read a reply, whether this was the case, and it would not help OpenAI to ignore this article’s call for it to be clear on the point.
There is another argument that says that, given how GPT-3 works, it is unreasonable to expect it to give good answers in these cases - but that’s the point! It looks really impressive when GPT-3 apparently does so, but not if those answers were effectively hard-coded.