
msp26 (841 karma)

  1. Hi, if the Gemini API team is reading this: can you please be more transparent about 'The specified schema produces a constraint that has too many states for serving. ...' when using Structured Outputs?

    I assume it has something to do with the underlying constraint grammar/token masks becoming too long/taking too long to compute. But as end users we have no way of figuring out what the actual limits are.

    OpenAI has more generous limits on the schemas and clearer docs. https://platform.openai.com/docs/guides/structured-outputs#s....

    You guys closed this issue for no reason: https://github.com/googleapis/python-genai/issues/660

    Other than that, good work! I love how fast the Gemini models are. The current API is significantly less of a shitshow compared to last year with property ordering etc.
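    For anyone hitting the same wall: the error seems tied to schema complexity rather than raw size, and nesting alone can trigger it. A toy sketch with plain-dict OpenAPI-style schemas (field names and depths are made up; the actual state limits are undocumented):

```python
# Hypothetical illustration: each nesting level multiplies the states of
# the constrained-decoding automaton. The real Gemini limit is
# undocumented, so the depths below are arbitrary.
def make_schema(depth: int) -> dict:
    """Build an OpenAPI-style object schema nested `depth` levels deep."""
    schema = {"type": "string", "enum": ["a", "b", "c"]}
    for _ in range(depth):
        schema = {
            "type": "object",
            "properties": {"child": schema, "label": {"type": "string"}},
            "required": ["child", "label"],
        }
    return schema

shallow = make_schema(2)   # schemas like this are typically accepted
deep = make_schema(40)     # deeply nested schemas are the kind that get rejected
```

    Flattening the nesting (or replacing enums with free-form strings) is the usual workaround, but without documented limits it's trial and error.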

  2. The new large model uses the DeepSeek-V2 architecture. Zero mention of it on the page lol.

    It's a good thing that open source models use the best arch available. K2 does the same but at least mentions "Kimi K2 was designed to further scale up Moonlight, which employs an architecture similar to DeepSeek-V3".

    ---

    vllm/model_executor/models/mistral_large_3.py

    ```python
    from vllm.model_executor.models.deepseek_v2 import DeepseekV3ForCausalLM


    class MistralLarge3ForCausalLM(DeepseekV3ForCausalLM):
        ...
    ```

    "Science has always thrived on openness and shared discovery." btw

    Okay I'll stop being snarky now and try the 14B model at home. Vision is good additional functionality on Large.

  3. K2 Thinking has immaculate vibes. Minimal sycophancy and a pleasant writing style while being occasionally funny.

    If it had vision and was better on long context I'd use it so much more.

  4. Because it's not a software issue, it's a human social cooperation issue.

    Companies don't want to support useful APIs for interoperability, so it's just easier to have an LLM brute-force problems using the same interface that humans use.

  5. really nice post, will share!
  6. Is flash/flash lite releasing alongside pro? Those two tiers have been incredible for the price since 2.0, absolute workhorses. Can't wait for 3.0.
  7. https://saucenao.blogspot.com/2021/04/recent-events.html

    Mildly related incident where a Canadian child protection agency uploaded CSAM to a reverse image search engine and then reported the site for the temporarily stored images.

  8. > I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.

    For imagegen, agreed. But for textgen, Kimi K2 Thinking is by far the best chat model at the moment from my experience so far. Not even "one of the best", the best.

    It has frontier level capability and the model was made very tastefully: it's significantly less sycophantic and more willing to disagree in a productive, reasonable way rather than immediately shutting you out. It's also way more funny at shitposting.

    I'll keep using Claude a lot for multimodality and artifacts, but much of my usage has shifted to K2. Claude's sycophancy in particular is tiresome. I don't use ChatGPT/Gemini because they hide the raw thinking tokens, which is really cringe.

  9. Groq does quantise. Look at this benchmark from moonshotai for K2 where they compare their official implementation to third party providers.

    https://github.com/MoonshotAI/K2-Vendor-Verifier

    Groq is one of the lowest-rated providers in that table.

  10. Rumour is a release on the 22nd, I believe.
  11. Accessing services from the UK without handing over your personal ID to a service that will inevitably get hacked.

    This happened to Discord literally a few days ago.

  12. The voice quality in the generated vids is surprisingly awful.
  13. Can you provide an example please? The docs suggest that propertyOrdering can only be a list[str].
  14. That only works for the outer level, not for any nested fields.
  15. What the fuck is this slop? Don't name your shit grifts after (the codenames of) actual highly anticipated models.
  16. The French comic pirate scene has an interesting rule where they keep a ~6 month time lag on what they release. The scene is small enough that the rule generally works.

    It's a really good trade-off. I would never have gotten into these comics without piracy but now if something catches my eye, I don't mind buying on release (and stripping the DRM for personal use).

    Most of my downloading is closer to collecting/hoarding/cataloguing behaviour but if I fully read something I enjoy, I'll support the author in some way.

  17. Yeah that was the only exciting part of the announcement for me haha. Can't wait to play around with it.

    I'm already running into a bunch of issues with the structured output APIs from other companies, even though Google and OpenAI have been doing a great job on this front.

  18. OpenAI's work on Dota was also very important for funding
  19. Are us plebs allowed to monitor the CoT tokens we pay for, or will that continue to be hidden on most providers?
  20. Merge comments? https://www.hackerneue.com/item?id=44331150

    I'm really getting bored of Anthropic's whole song and dance with 'alignment'. Krackers in the other thread puts it in better words than I can.

  21. Agree completely. When I read the Gemma 3 paper (https://arxiv.org/html/2503.19786v1) and saw an entire section dedicated to measuring and reducing the memorisation rate, I was annoyed. How does this benefit end users at all?

    I want the language model I'm using to have knowledge of cultural artifacts. Gemma 3 27B was useless at a question about grouping Berserk characters by potential Baldur's Gate 3 classes; Claude did fine. The methods used to reduce the memorisation rate probably also deteriorate performance in other ways that don't show up on benchmarks.

  22. > 12GB vram

    Waste of effort; why would you go through the trouble of building + blogging for this?

  23. > Finally, we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full. Users requiring raw chains of thought for advanced prompt engineering can contact sales about our new Developer Mode to retain full access.

    Extremely cringe behaviour. Raw CoTs are super useful for debugging errors in data extraction pipelines.

    After Deepseek R1 I had hope that other companies would be more open about these things.

  24. Fantastic. I wonder how much random technical info is buried in these servers. I hate what it's done for game modding.
  25. Brand safety. Journalists would write articles about the models being 'dangerous'.
  26. The linked blog is down. But agreed, I would especially like to see this particular thing fixed.

    > Property ordering

    > When you're working with JSON schemas in the Gemini API, the order of properties is important. By default, the API orders properties alphabetically and does not preserve the order in which the properties are defined (although the Google Gen AI SDKs may preserve this order). If you're providing examples to the model with a schema configured, and the property ordering of the examples is not consistent with the property ordering of the schema, the output could be rambling or unexpected.
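    To make the quoted doc concrete: propertyOrdering is just a list[str] of sibling property names, and (per the exchange above) it only applies at the level where it's declared, so each nested object needs its own entry. A sketch with made-up field names:

```python
# Gemini-style response schema with propertyOrdering declared at both
# levels. Field names here are illustrative; propertyOrdering must list
# the sibling property names at that level.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
            },
            "propertyOrdering": ["street", "city"],  # nested level needs its own
        },
    },
    "propertyOrdering": ["name", "address"],  # outer level only orders top-level keys
}
```

    Keeping these lists consistent with any few-shot examples in the prompt is what the docs are warning about.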

  27. Thanks for the detailed comment.

    I had no idea that fine-tuning for adding information is viable now. Last I checked (a year+ back), it didn't seem to work well.

  28. Actually you can't do "system" roles at all with OpenAI models now.

    You can use the "developer" role which is above the "user" role but below "platform" in the hierarchy.

    https://cdn.openai.com/spec/model-spec-2024-05-08.html#follo...
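    In practice this is just a different role string in the messages array. A minimal sketch (the model name is a placeholder, not an endorsement of any particular model):

```python
# Per the linked model spec, "developer" sits between "platform" and
# "user" in the instruction hierarchy, taking over what "system"
# messages used to do in requests.
messages = [
    {"role": "developer", "content": "Respond only in JSON."},
    {"role": "user", "content": "List three prime numbers."},
]
request_body = {"model": "gpt-4o", "messages": messages}  # model name illustrative
```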

    It's a race to the bottom for pricing. They can't do shit. Even if the American companies colluded to stop competing and raise prices, Chinese providers would undermine it.

    There is no moat. Most of these AI APIs and products are interchangeable.

  30. I haven't bothered with video gen because I'm too impatient but isn't Wan pretty good too on regular hardware?

