It's not much faster for input, but it's something like 10x faster for output (Mixtral vs. GPT-3.5). This could enable completely new modes of interaction with LLMs, e.g. agents.
In most cases, overall response time is dominated by output, since generating output is ~100x slower per token than processing input.
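A back-of-envelope calculation makes the point. The throughput numbers below are illustrative assumptions, not measurements, but the ~100x gap between prefill and decode is what makes output dominate:

```python
# Hypothetical throughputs: input (prefill) is processed largely in
# parallel, while output tokens are generated one at a time.
INPUT_TOK_PER_S = 5000.0   # assumed prefill throughput
OUTPUT_TOK_PER_S = 50.0    # assumed decode throughput, ~100x slower per token

def response_time(n_in: int, n_out: int) -> float:
    """Rough total latency for a request, in seconds."""
    return n_in / INPUT_TOK_PER_S + n_out / OUTPUT_TOK_PER_S

# 1000 input tokens + 300 output tokens: output dominates the total.
total = response_time(1000, 300)
print(f"{total:.2f}s total, {300 / OUTPUT_TOK_PER_S:.2f}s of it on output")
# → 6.20s total, 6.00s of it on output
```

Even with 3x more input tokens than output tokens, the decode phase accounts for nearly all of the wall-clock time, so a 10x speedup on output translates almost directly into a 10x faster response.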