It's not much faster for input, but it's something like 10x faster for output (Mixtral vs. GPT-3.5). This could enable completely new modes of interaction with LLMs, e.g. agents.
In most cases, overall response time is dominated by output, since generating output is ~100x slower per token than processing input.
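A back-of-envelope calculation makes the point. The throughput numbers below are illustrative assumptions, not measurements, but the ~100x gap between prefill and decode is what makes output dominate:

```python
# Hypothetical throughputs: input (prefill) is processed largely in
# parallel, while output tokens are generated one at a time.
INPUT_TOK_PER_S = 5000.0   # assumed prefill throughput
OUTPUT_TOK_PER_S = 50.0    # assumed decode throughput, ~100x slower per token

def response_time(n_in: int, n_out: int) -> float:
    """Rough total latency for a request, in seconds."""
    return n_in / INPUT_TOK_PER_S + n_out / OUTPUT_TOK_PER_S

# 1000 input tokens + 300 output tokens: output dominates the total.
total = response_time(1000, 300)
print(f"{total:.2f}s total, {300 / OUTPUT_TOK_PER_S:.2f}s of it on output")
# → 6.20s total, 6.00s of it on output
```

Even with 3x more input tokens than output tokens, the decode phase accounts for nearly all of the wall-clock time, so a 10x speedup on output translates almost directly into a 10x faster response.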