An open-weight model addresses the second part of this, but not the first. Even an open-weight model with all of its training data available doesn't fix the first problem: if you somehow got access to enough hardware to train your own GPT-5 from the published data, you still couldn't meaningfully fix an issue you have with it, not even if you hired Ilya Sutskever and Yann LeCun to do it for you. These models are black boxes that no one can actually understand at the level of a program or device.
I have also seen people train "jailbroken" variants of popular open-weight LLMs (e.g. Google Gemma) that remove the condescending ethical guardrails and just let you talk to the thing normally.
So all in all I am skeptical of the claim that there would be no value in having access to the training data. Clearly there is some ability to steer the output these models produce.
https://www.anthropic.com/news/golden-gate-claude
https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-a...
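The Hugging Face link above points at an "abliterated" model. A toy numpy sketch of the idea believed to underlie such variants, under simplified assumptions (a single "refusal direction" in the residual stream, illustrative names and shapes, not the real recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Pretend these are mean residual-stream activations collected on
# prompts the model refused vs. prompts it answered; their difference
# estimates a "refusal direction".
acts_refused = rng.normal(size=d_model) + 3.0
acts_answered = rng.normal(size=d_model)
r = acts_refused - acts_answered
r_hat = r / np.linalg.norm(r)  # unit refusal direction

# A weight matrix whose output lands in the residual stream (e.g. an
# attention output projection). Orthogonalize its output against the
# refusal direction: W' = (I - r_hat r_hat^T) W.
W = rng.normal(size=(d_model, d_model))
W_abliterated = W - np.outer(r_hat, r_hat) @ W

# After the edit, no input can move the output along r_hat.
x = rng.normal(size=d_model)
print(abs(r_hat @ (W_abliterated @ x)))  # ~0 up to floating point
```

Doing this to every matrix that writes into the residual stream removes the model's ability to represent that one direction, which is why the edit can be applied to published weights without any retraining.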
They probably can't give you the training set, as releasing it would amount to publishing infringing content. And where would you store it, and what would you do with it anyway?
It's fine for models to have open weights and closed data, but IMHO that only barely fits the open-source model.