
> When it's already faster than I can absorb the response

Streaming a response from a chatbot is only one use case of LLMs.

I would argue the most interesting applications do not fall into this category.

On the number of different use cases (categories), I'd agree; I'm not so sure about usage (volume)…

…not yet anyway. It's a fast-moving area, with lots of blue water outside the chat interface.

Can you name one use case where there is a difference between latency at 200 t/s (fireworks.ai Mixtral) and 500 t/s (Groq Mixtral)? Not throughput, and not time to first token, but latency.

The Groq model shines at latency, not at the other two (throughput and time to first token).
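To pin down what the thread is comparing, here is a minimal sketch that times a single streamed reply and separates the three quantities: time to first token, single-stream decode speed (the t/s figures quoted above), and end-to-end latency. `measure_stream` is a hypothetical helper; `token_stream` stands in for any iterable of streamed chunks, e.g. from an OpenAI-compatible client.

```python
import time

def measure_stream(token_stream):
    """Time one streamed reply, separating time to first token (TTFT),
    decode speed (tokens/second), and end-to-end latency."""
    start = time.perf_counter()
    first_at = None
    n_tokens = 0
    for _ in token_stream:  # consume tokens as they arrive
        if first_at is None:
            first_at = time.perf_counter()  # TTFT boundary
        n_tokens += 1
    end = time.perf_counter()
    if n_tokens == 0:
        raise ValueError("stream produced no tokens")
    ttft = first_at - start
    # Decode speed is measured from the first token onward, so it is
    # not contaminated by queueing or prefill time.
    tps = (n_tokens - 1) / (end - first_at) if n_tokens > 1 else float("nan")
    latency = end - start  # what the user actually waits for, end to end
    return ttft, tps, latency
```

Using the thread's own numbers: at 200 t/s a 500-token reply finishes roughly 2.5 s after the first token; at 500 t/s, roughly 1.0 s. The question above is whether any use case is sensitive to that gap.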
