Thanks! I did notice the queue count showing up occasionally, but not every time. Maybe someone who has access without the queue could repeat the test, so we can get a sense of the potential latency once it's scaled and geo-distributed. What I'm really trying to understand is whether the time to first token is actually faster than GPT 3.5 via API, or just the rate of token output once it begins.

I don't know about GPT 3.5 specifically, but on this independent benchmark (LLMPerf) Groq's time to first token is also lowest:

https://github.com/ray-project/llmperf-leaderboard?tab=readm...
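The two metrics being discussed are easy to conflate, so here is a minimal sketch of how you'd separate them when consuming any streamed API response: time to first token (TTFT) is measured from request start to the first chunk, while throughput counts only the tokens after the first one. The `fake_stream` generator and its timings are hypothetical stand-ins for a real streaming endpoint, not any specific provider's API.

```python
import time
from typing import Iterable, Tuple

def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Return (TTFT seconds, tokens/sec after first token, token count)
    for any streaming token source, e.g. an API's streamed chunks."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in tokens:
        now = time.perf_counter()
        if first is None:
            first = now  # arrival of the first token
        count += 1
    end = time.perf_counter()
    ttft = (first - start) if first is not None else float("nan")
    # "Rate of token output once it begins": tokens after the first,
    # divided by the time elapsed since the first token arrived.
    rate = (count - 1) / (end - first) if count > 1 and end > first else 0.0
    return ttft, rate, count

def fake_stream(first_delay: float, inter_delay: float, n: int):
    """Stand-in for a streamed response (hypothetical timings)."""
    time.sleep(first_delay)
    yield "tok0"
    for i in range(1, n):
        time.sleep(inter_delay)
        yield f"tok{i}"
```

A provider can look fast on one metric and ordinary on the other: a long `first_delay` with a short `inter_delay` gives high throughput but poor TTFT, which is exactly the ambiguity the parent comment is asking about.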
