Using MSAs might be a local optimum. ESM performed well on some protein problems without MSAs. MSAs offer a nice inductive bias and better average performance, but the cost is poor performance on proteins where MSAs are not accurate. These include B and T cell receptors, which are clinically very relevant.
Isomorphic Labs, Oxford, MRC, and others have started the OpenBind Consortium (https://openbind.uk) to generate large-scale structure and affinity data. I believe that once more data is available, MSAs will be less relevant as model inputs. MSAs are "too linear".