surfingdino
> structured tool output

Yeah, let's pretend it works. So far structured output from an LLM is an exercise in programmers' ability to code defensively against responses that may or may not be valid JSON, may not conform to the schema, or may just be null. There's a new cottage industry of modules that automate dealing with this crap.
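
A minimal sketch of the defensive parsing being described, using pydantic for the schema check; the `Invoice` schema and field names here are made up:

```python
import json
from pydantic import BaseModel, ValidationError

# Illustrative schema -- the fields are placeholders.
class Invoice(BaseModel):
    vendor: str
    total_cents: int

def parse_llm_reply(raw: str | None) -> Invoice | None:
    """Defensively turn an LLM reply into a validated object, or None."""
    if not raw:                              # reply may simply be null/empty
        return None
    try:
        data = json.loads(raw)               # reply may not be valid JSON at all
    except json.JSONDecodeError:
        return None
    try:
        return Invoice.model_validate(data)  # reply may not conform to the schema
    except ValidationError:
        return None
```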


sanxiyn
No? With structured outputs you get valid JSON 100% of the time. This is a non-problem now. (If you understand how it works, it really can't be otherwise.)

https://openai.com/index/introducing-structured-outputs-in-t...

https://platform.openai.com/docs/guides/structured-outputs
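
For context, the request shape the second link documents looks roughly like this in the OpenAI Python SDK; the model name and schema below are illustrative, and `"strict": True` is what opts into the constrained decoding the first link announces:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # placeholder model name
    messages=[{"role": "user", "content": "Extract the vendor and total."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "strict": True,  # enforce schema conformance during decoding
            "schema": {
                "type": "object",
                "properties": {
                    "vendor": {"type": "string"},
                    "total_cents": {"type": "integer"},
                },
                "required": ["vendor", "total_cents"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)  # a JSON string matching the schema
```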

OtherShrezzing
From the 2nd link:

> Structured Outputs can still contain mistakes.

The guarantee promised in link 1 is not supported by the documentation in link 2. Structured Output does a _very good_ job, but still sometimes messes up. When you’re trying to parse hundreds of thousands of documents per day, you need a lot of 9s of reliability before you can earnestly say “100% guarantee” of accuracy.

hobofan
Whether it's a non-problem very much depends on whether the LLM API providers actually bother to enforce this server-side.

Anecdotally, just last week I saw Azure OpenAI services hallucinate tools when I provided an empty array of tools rather than omitting the tools key entirely (silly me!). Up until that point I had assumed there were server-side safeguards against that, but now I have to consider spending time on client-side checks for all kinds of bugs in that area.
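
A client-side guard of the kind described might look like this; `build_chat_kwargs` is a hypothetical helper whose only job is to omit the `tools` key instead of sending an empty array:

```python
from typing import Any

def build_chat_kwargs(messages: list[dict[str, Any]],
                      tools: list[dict[str, Any]] | None) -> dict[str, Any]:
    """Hypothetical helper: include the `tools` key only when there is at
    least one tool, rather than passing an empty array through."""
    kwargs: dict[str, Any] = {"messages": messages}
    if tools:  # omits the key for both None and []
        kwargs["tools"] = tools
    return kwargs

# e.g. client.chat.completions.create(model="...", **build_chat_kwargs(msgs, tools))
```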

surfingdino OP
Meanwhile https://dev.to/rishabdugar/crafting-structured-json-response... and https://www.boundaryml.com/blog/structured-output-from-llms

You are confusing API response payloads with structured JSON that we expect to conform to a given schema. It's carnage that requires defensive coding. Neither OpenAI nor Google is interested in fixing this, because some developers simply retry until they get valid structured output, which means they spend 3x-5x as much on API calls.
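
The retry-until-valid pattern being described is roughly the following sketch, where `call_llm` and `validate` stand in for whatever client and schema check are in use; every failed attempt is another billed call, which is where the 3x-5x multiplier comes from:

```python
def get_structured_output(call_llm, validate, max_attempts: int = 5):
    """Retry until the reply parses and validates, or give up."""
    for attempt in range(1, max_attempts + 1):
        raw = call_llm()        # each attempt is a separate, billed API call
        parsed = validate(raw)  # returns None on invalid JSON or schema mismatch
        if parsed is not None:
            return parsed
    raise RuntimeError(f"no valid structured output after {max_attempts} attempts")
```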

fooster
Have you actually used this stuff at scale? The replies are often not valid.

sanxiyn
Yes I have.

evalstate
Structured Output in this case refers to the output from the MCP Server Tool Call, not the LLM itself.
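
For reference, the MCP spec lets a tool declare an `outputSchema` and return a `structuredContent` object alongside the usual text content; a tool-call result carrying structured output looks roughly like this (values are illustrative):

```python
# Rough shape of an MCP CallToolResult with structured output (illustrative values).
call_tool_result = {
    "content": [  # unstructured fallback for clients that ignore structuredContent
        {"type": "text", "text": '{"vendor": "Acme", "total_cents": 1299}'}
    ],
    "structuredContent": {  # expected to conform to the tool's declared outputSchema
        "vendor": "Acme",
        "total_cents": 1299,
    },
    "isError": False,
}
```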
