From https://www.hackerneue.com/item?id=40859434 :
> E.g. Promptfoo and ChainForge have multi-LLM workflows.
> Promptfoo has a YAML configuration for prompts, providers, etc.: https://www.promptfoo.dev/docs/configuration/guide/
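A minimal sketch of such a config, following the promptfooconfig.yaml layout in the linked guide; the prompt text, model IDs, and test values here are illustrative:

```yaml
# promptfooconfig.yaml — illustrative sketch; see the linked guide for the full schema
prompts:
  - "Summarize the following text in one sentence: {{text}}"
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20240620
tests:
  - vars:
      text: "Large language models are neural networks trained on large text corpora."
    assert:
      - type: contains
        value: "neural"
```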
openai/evals/docs/build-eval.md: https://github.com/openai/evals/blob/main/docs/build-eval.md
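build-eval.md walks through registering an eval over JSONL samples; for a basic match-style eval, one sample line looks roughly like this (contents illustrative):

```json
{"input": [{"role": "system", "content": "Answer concisely."}, {"role": "user", "content": "What is 2 + 2?"}], "ideal": "4"}
```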
From https://www.hackerneue.com/item?id=45267271 :
> API facades like OpenLLM and model routers like OpenRouter have standard interfaces for many or most LLM inputs and outputs. Tools like Promptfoo, ChainForge, and LocalAI all have abstractions over many models.
> What are the open standards for representing LLM inputs and outputs?
> W3C PROV has prov:Entity, prov:Activity, and prov:Agent for modeling AI provenance: who or what did what when.
> LLM evals could be represented in W3C EARL (Evaluation and Reporting Language)
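On the "standard interfaces" point: routers like OpenRouter expose an OpenAI-compatible chat-completions endpoint, so the de-facto standard input is the OpenAI messages schema. A minimal sketch in Python, assuming OPENROUTER_API_KEY is set; the model ID and prompt are illustrative:

```python
import os
import requests

# OpenRouter speaks the OpenAI chat-completions wire format,
# so the same request body works across many underlying models.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",  # illustrative model ID
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Name one open standard for provenance."},
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```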
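A sketch of the PROV modeling idea with rdflib (which bundles a PROV namespace): an inference call as a prov:Activity, its output as a prov:Entity, and the model as a prov:Agent. The example.org URIs are made up for illustration:

```python
from rdflib import Graph, Namespace, RDF
from rdflib.namespace import PROV

EX = Namespace("http://example.org/")  # illustrative namespace

g = Graph()
g.bind("prov", PROV)

run = EX["run-42"]            # one inference call
output = EX["completion-42"]  # the generated text
model = EX["some-llm"]        # the model that produced it

g.add((run, RDF.type, PROV.Activity))
g.add((output, RDF.type, PROV.Entity))
g.add((model, RDF.type, PROV.Agent))
g.add((output, PROV.wasGeneratedBy, run))    # what was done, and when
g.add((run, PROV.wasAssociatedWith, model))  # who/what did it

print(g.serialize(format="turtle"))
```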
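And a sketch of one eval result in EARL terms, again with rdflib. EARL has no bundled rdflib namespace, so it is declared by hand; the assertor, subject, and test URIs are illustrative:

```python
from rdflib import BNode, Graph, Namespace, RDF

EARL = Namespace("http://www.w3.org/ns/earl#")  # W3C EARL vocabulary
EX = Namespace("http://example.org/")           # illustrative namespace

g = Graph()
g.bind("earl", EARL)

assertion = BNode()
result = BNode()

g.add((assertion, RDF.type, EARL.Assertion))
g.add((assertion, EARL.assertedBy, EX["eval-harness"]))  # who ran the eval
g.add((assertion, EARL.subject, EX["some-llm"]))         # the model under test
g.add((assertion, EARL.test, EX["arithmetic-suite"]))    # the test criterion
g.add((assertion, EARL.result, result))
g.add((result, RDF.type, EARL.TestResult))
g.add((result, EARL.outcome, EARL.passed))               # earl:passed / earl:failed

print(g.serialize(format="turtle"))
```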
"Can Large Language Models Emulate Judicial Decision-Making? [Paper]" https://www.hackerneue.com/item?id=42927611
"California governor signs AI transparency bill into law" (2025) https://www.hackerneue.com/item?id=45418428 :
Is this the first of its sort?:
> CalCompute
An organization like Artificial Analysis would be a better fit for that kind of investigation: https://artificialanalysis.ai/