terhechte
Would this also be possible with other LLM engines / GPUs? E.g. Llama / Apple Silicon or Radeon?

saagarjha
Yeah, none of this is specific to CUDA (though the relative latencies might be different).
