Evaluating Large Language Models in Scientific Discovery (arxiv.org) | 4 points | mpweiher | Dec 19, 2025 | 0 comments