Asked GLM-4.6 to introduce itself. "Hello! I'm glad you asked. I'm a large language model, trained by Google. (...)"
It seems to have been fine-tuned on Claude Code interactions as well, though unfortunately it picked up little of Claude's coding style itself (I wish it had!).
Yes, Qwen3 made more mistakes than GLM, around 15% more in my quick throwaway evals, but it was a more professional model overall: more polished in some respects, better with international languages, and, being non-reasoning, ideal for a lot of API tasks that can be run near-instantaneously. I think the Qwen line is also a more consistent offering, with other versions of the model at 32B and VL, now an 80B one, etc. I guess the problem was that Qwen Max was closed source, signalling that the Qwen line may not give Cerebras a way forward to evolve. GLM 4.6 covers precisely that hole. Not that Cerebras is a model provider of any standing: their service levels are buggy (right now it's been down for an hour and probably won't be fixed until California wakes up at 9am PST). So it does feel like we are not the customers but the product, a marketing stunt for them to get visibility for their tech.
GLM feels like they (Z.ai) are just distilling whatever they can get into it. GLM sometimes switches to Chinese, or just cuts off. It does have a bit more "intelligence" than Q3C, but not enough to say it solves the toughest problems. Regardless, for tough nuts to crack I use my Codex Plus plan.
For example: in one of my evals, Cerebras Q3C took 15 turns to solve an issue. GLM took 12 turns, but GLM takes roughly 2x the time per turn, so instead of doing a full task from zero-to-commit in, say, 15 minutes, it takes 24 minutes.
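A quick sanity check of those numbers (assuming roughly constant time per turn for each model, which is my simplification):

```python
# Per-turn timing implied by the eval above (constant per-turn time assumed).
q3c_turns, q3c_minutes = 15, 15
glm_turns = 12

q3c_per_turn = q3c_minutes / q3c_turns   # 1.0 min per turn
glm_per_turn = 2 * q3c_per_turn          # GLM is ~2x slower per turn
glm_minutes = glm_turns * glm_per_turn   # 24.0 minutes, matching the claim

print(q3c_per_turn, glm_per_turn, glm_minutes)  # 1.0 2.0 24.0
```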
In another eval (Next.js CSS editing), my task was done in 1:30 with Q3C coder. GLM 4.6 took 2:24. The same task in Codex took 5:37, with maybe 1 or 2 turns. Codex's DX is that of working unattended: prompt it and go do something else; there's a good chance it will get it right after 0, 1 or 2 nudges. With CC + Cerebras it's a completely different DX: given the speed, it feels just like programming, but super-fast. Prompt, read the change, accept (or don't), accept, accept, accept, test it out, accept, prompt, accept, interrupt, prompt, accept, and 1:30 later we're done.
As I said, I use Claude Code + a proxy (llmux). The coding agent makes a HUGE difference, and CC is hands-down the best agent out there.
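For anyone curious how that wiring works: Claude Code can be pointed at any Anthropic-compatible endpoint via environment variables, which is how a local proxy like llmux slots in. A minimal sketch; the port and token here are hypothetical, adjust them to however your proxy is actually configured:

```shell
# Point Claude Code at a local Anthropic-compatible proxy.
# The URL/port below is an assumption; use whatever your llmux instance listens on.
export ANTHROPIC_BASE_URL="http://localhost:8080"
# Some proxies ignore auth entirely; a placeholder token keeps the client happy.
export ANTHROPIC_AUTH_TOKEN="placeholder"
claude
```

The proxy then translates Claude Code's Anthropic-style requests to whatever backend (Cerebras, in my case) you route them to.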