"Robustness of Model-Graded Evaluations and Automated Interpretability" (2023) https://www.lesswrong.com/posts/ZbjyCuqpwCMMND4fv/robustness... :
> The results inspire future work and should caution against unqualified trust in evaluations and automated interpretability.
From https://www.hackerneue.com/item?id=37451534 : additional benchmarks: TheoremQA, LegalBench