_vere
It's also notable that these companies often don't respect the license terms of FOSS software at all.
Anyone worth their salt can tell you that training your LLM on GPLv3 code would make it a derivative work, as it is able to reproduce large parts of that code. These LLMs are currently earning Google, Facebook, OpenAI, etc. billions, while those companies obviously don't make "their" products available under GPLv3.
I don't mind them training on GPL code, but I wish they had to at least publish their model weights (and maybe also training and inference code, etc.) - same for the other issues re. using copyrighted media in training.