Preferences

The big players use parallel processing of multiple users to keep the GPUs and memory filled as much as possible during the inference they are providing to users. They can make use of the fact that they have a fairly steady stream of requests coming into their data centers at all times. This article describes some of how this is accomplished.

https://www.infracloud.io/blogs/inference-parallelism/


This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal