Note that they don't actually suggest that the XML needs to be VALID!
My guess was that JSON requires more characters to be escaped than XML-ish syntax does, and that matching opening and closing tags makes it a little easier for the LLM not to lose track of which string corresponds to which key.
<instructions>
...
...
</instructions>
can be much easier than
{
"instructions": "..\n...\n"
}
especially when there are newlines, quotes, and Unicode characters involved.
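To make the escaping point concrete, here's a quick Python sketch (the prompt text is made up) of what the model would have to emit in each format:

  import json

  text = 'Say "hello",\nthen stop. Café'

  # JSON: quotes and newlines must be escaped, and json.dumps even
  # turns non-ASCII into \uXXXX escapes by default
  print(json.dumps({"instructions": text}))
  # {"instructions": "Say \"hello\",\nthen stop. Caf\u00e9"}

  # XML-ish: the text passes through between the tags untouched
  print(f"<instructions>\n{text}\n</instructions>")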
I would suspect that a single attention layer can't figure out which token an opening-bracket token should attend to most. Think of {"x": {"y": 1}}: with only one layer of attention, can the token for the first opening bracket attend to exactly its matching closing bracket?
I wonder whether RNNs work better with JSON or XML. Or maybe they're fine with both, because an RNN can keep a stack-like internal state that matches brackets?
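For intuition: bracket matching is a single left-to-right pass with a stack, which is exactly the kind of state a recurrent model could in principle carry. A toy Python sketch of that computation itself (not a claim about what any model actually learns):

  def match_brackets(s):
      # Map each opening-brace index to its matching closing index.
      # Assumes the input is balanced.
      stack, pairs = [], {}
      for i, ch in enumerate(s):
          if ch == "{":
              stack.append(i)
          elif ch == "}":
              pairs[stack.pop()] = i
      return pairs

  print(match_brackets('{"x": {"y": 1}}'))
  # {6: 13, 0: 14}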
It would be a really cool research direction to measure how well Transformer-Mamba hybrid models like Jamba handle structured input/output formats like JSON and XML, and to compare the two. From the LLM era, I could only find papers that run this evaluation on transformer-based LLMs. Damn, I'd love to work at a place that does this kind of research, but I guess I'm stuck with my current boring job for now :D Born to do cutting-edge research, forced to write CRUD apps with some "AI sprinkled in". Anyone hiring here?