- The new large model uses the DeepseekV2 architecture. Zero mention of that anywhere on the page lol.
It's a good thing that open source models use the best arch available. K2 does the same but at least mentions "Kimi K2 was designed to further scale up Moonlight, which employs an architecture similar to DeepSeek-V3".
---
vllm/model_executor/models/mistral_large_3.py
```python
from vllm.model_executor.models.deepseek_v2 import DeepseekV3ForCausalLM


class MistralLarge3ForCausalLM(DeepseekV3ForCausalLM):
    ...  # rest of the class body elided; the point is it subclasses the DeepSeek-V3 implementation
```
"Science has always thrived on openness and shared discovery." btw
Okay, I'll stop being snarky now and try the 14B model at home. Vision is a nice additional capability on Large.
- https://saucenao.blogspot.com/2021/04/recent-events.html
Mildly related incident where a Canadian child-protection agency uploaded CSAM to a reverse image search engine and then reported the site for the temporarily stored images.
- > I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.
For imagegen, agreed. But for textgen, Kimi K2 thinking is by far the best chat model at the moment from my experience so far. Not even "one of the best", the best.
It has frontier level capability and the model was made very tastefully: it's significantly less sycophantic and more willing to disagree in a productive, reasonable way rather than immediately shutting you out. It's also way more funny at shitposting.
I'll keep using Claude a lot for multimodality and artifacts, but much of my usage has shifted to K2. Claude's sycophancy in particular is tiresome. I don't use ChatGPT/Gemini because they hide the raw thinking tokens, which is really cringe.
- Groq does quantise. Look at this benchmark from Moonshot AI for K2, where they compare their official implementation against third-party providers.
https://github.com/MoonshotAI/K2-Vendor-Verifier
Groq is one of the lowest-rated providers on that table.
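For anyone curious what that comparison looks like mechanically, here's a rough sketch of the idea (this is not the repo's actual harness; the endpoints, model names and the weather tool below are placeholders): send the same tool-call prompt to each OpenAI-compatible endpoint and count how often the returned arguments even parse as valid JSON.
```python
# Rough sketch of a K2-Vendor-Verifier-style check. Endpoints, model names
# and the tool schema are placeholders, not the repo's actual configuration.
import json
from openai import OpenAI

PROVIDERS = {
    "official": ("https://api.moonshot.ai/v1", "kimi-k2"),            # placeholder
    "some-provider": ("https://example-provider.com/v1", "kimi-k2"),  # placeholder
}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def valid_tool_call_rate(base_url: str, model: str, n: int = 20) -> float:
    """Fraction of runs where the model emits tool calls with parseable JSON args."""
    client = OpenAI(base_url=base_url, api_key="...")
    ok = 0
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "What's the weather in Paris?"}],
            tools=TOOLS,
        )
        calls = resp.choices[0].message.tool_calls or []
        try:
            if calls:
                for call in calls:
                    json.loads(call.function.arguments)
                ok += 1
        except json.JSONDecodeError:
            pass
    return ok / n

for name, (url, model) in PROVIDERS.items():
    print(name, valid_tool_call_rate(url, model))
```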
- The French comic pirate scene has an interesting rule where they keep a ~6 month time lag on what they release. The scene is small enough that the rule generally works.
It's a really good trade-off. I would never have gotten into these comics without piracy but now if something catches my eye, I don't mind buying on release (and stripping the DRM for personal use).
Most of my downloading is closer to collecting/hoarding/cataloguing behaviour but if I fully read something I enjoy, I'll support the author in some way.
- Merge comments? https://www.hackerneue.com/item?id=44331150
I'm really getting bored of Anthropic's whole song and dance with 'alignment'. Krackers in the other thread explains it in better words.
- Agree completely. When I read the Gemma 3 paper (https://arxiv.org/html/2503.19786v1) and saw an entire section dedicated to measuring and reducing the memorization rate I was annoyed. How does this benefit end users at all?
I want the language model I'm using to have knowledge of cultural artifacts. Gemma 3 27B was useless at a question about grouping Berserk characters by potential Baldur's Gate 3 classes; Claude did fine. The methods used to reduce memorisation rate probably also degrade performance in other ways that don't show up on benchmarks.
- > Finally, we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full. Users requiring raw chains of thought for advanced prompt engineering can contact sales about our new Developer Mode to retain full access.
Extremely cringe behaviour. Raw CoTs are super useful for debugging errors in data extraction pipelines.
After Deepseek R1 I had hope that other companies would be more open about these things.
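To make the debugging point concrete, here's a minimal sketch assuming DeepSeek's OpenAI-compatible API, which (per their docs) returns the raw reasoning in a reasoning_content field next to the final answer; the extraction prompt is a made-up example.
```python
# Minimal sketch: log the raw chain of thought when an extraction fails,
# assuming an API that actually exposes it (field name per DeepSeek's docs).
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{
        "role": "user",
        "content": 'Extract {"invoice_total": number} from: "Total due: $1,204.50". Reply with JSON only.',
    }],
)

msg = resp.choices[0].message
try:
    data = json.loads(msg.content)
except json.JSONDecodeError:
    # This is exactly what a summarized CoT takes away: the model's own
    # account of how it arrived at the malformed answer.
    print("extraction failed, raw reasoning was:\n", msg.reasoning_content)
```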
- The linked blog is down. But agreed, I would especially like to see this particular thing fixed.
> Property ordering
> When you're working with JSON schemas in the Gemini API, the order of properties is important. By default, the API orders properties alphabetically and does not preserve the order in which the properties are defined (although the Google Gen AI SDKs may preserve this order). If you're providing examples to the model with a schema configured, and the property ordering of the examples is not consistent with the property ordering of the schema, the output could be rambling or unexpected.
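If anyone else is fighting this: the schema does accept an explicit ordering. A sketch with the google-genai SDK (model name and fields are illustrative; check the current SDK docs for the exact field spelling):
```python
# Sketch: pin the property order on a Gemini response schema so the JSON
# output matches your few-shot examples instead of being alphabetized.
# Model name and schema fields are illustrative.
from google import genai
from google.genai import types

client = genai.Client(api_key="...")

schema = types.Schema(
    type=types.Type.OBJECT,
    properties={
        "title": types.Schema(type=types.Type.STRING),
        "year": types.Schema(type=types.Type.INTEGER),
        "authors": types.Schema(
            type=types.Type.ARRAY,
            items=types.Schema(type=types.Type.STRING),
        ),
    },
    # Without this the API may emit keys alphabetically (authors, title, year),
    # which can clash with the ordering used in your prompt examples.
    property_ordering=["title", "year", "authors"],
)

resp = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Extract the paper metadata from the following abstract: ...",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema,
    ),
)
print(resp.text)
```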
- Actually you can't do "system" roles at all with OpenAI models now.
You can use the "developer" role which is above the "user" role but below "platform" in the hierarchy.
https://cdn.openai.com/spec/model-spec-2024-05-08.html#follo...
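The swap itself is trivial; a minimal sketch with the Python SDK (model name illustrative):
```python
# Minimal sketch: send instructions as a "developer" message instead of
# "system". Model name is illustrative.
from openai import OpenAI

client = OpenAI(api_key="...")

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "developer", "content": "Answer only in JSON."},
        {"role": "user", "content": "List three prime numbers."},
    ],
)
print(resp.choices[0].message.content)
```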
I assume it has something to do with the underlying constraint grammar/token masks becoming too long/taking too long to compute. But as end users we have no way of figuring out what the actual limits are.
OpenAI has more generous limits on the schemas and clearer docs. https://platform.openai.com/docs/guides/structured-outputs#s....
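For comparison, the OpenAI version of the same structured-output setup, which is what I mean by clearer docs (schema and model name illustrative):
```python
# Sketch of an OpenAI structured-output call: strict mode constrains the
# reply to the declared JSON schema. Schema and model name are illustrative.
from openai import OpenAI

client = OpenAI(api_key="...")

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract the event: 'Standup at 9am Friday'."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "day": {"type": "string"},
                    "time": {"type": "string"},
                },
                "required": ["name", "day", "time"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)
```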
You guys closed this issue for no reason: https://github.com/googleapis/python-genai/issues/660
Other than that, good work! I love how fast the Gemini models are. The current API is significantly less of a shitshow compared to last year with property ordering etc.